You're asking multiple questions, but I think the main one is: “Is there any standard dictating what it must contain?”
To my knowledge, no.
Given that, your related questions, “How is this list generated? Are its contents the same across different Unices?”, can only be answered with “it depends on the particular Unix”.
The convention of including a word list as part of the operating system comes from the spell(1) utility, which uses it for a primitive spell-checking procedure.
That spell-checking procedure is described in the academic paper “Development of a Spelling List”, by M. D. McIlroy of Bell Labs, 1982.
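To give a feel for how a word list gets used this way, here is a minimal sketch of a lookup-only checker. It is not the algorithm from McIlroy's paper and not what any particular spell(1) implementation does; it only reports words from a text that are absent from a list. The function name unknown_words is made up for this illustration, and the sketch treats anything other than ASCII letters as a word separator, so “don't” is checked as “don” and “t”.

```shell
# unknown_words WORDLIST TEXTFILE   (hypothetical helper, not a standard tool)
# Print each word of TEXTFILE that has no case-insensitive match in WORDLIST.
unknown_words() (
    # Run in a subshell with the C locale so tr, sort and comm agree on ordering.
    export LC_ALL=C
    wordlist=$1
    textfile=$2

    # Lowercase and sort the word list once; comm needs sorted input.
    sorted_list=$(mktemp) || exit 1
    tr 'A-Z' 'a-z' < "$wordlist" | sort -u > "$sorted_list"

    # One word per line, lowercased, de-duplicated; keep lines found only in the text.
    tr -cs 'A-Za-z' '\n' < "$textfile" |
        tr 'A-Z' 'a-z' |
        grep -v '^$' |
        sort -u |
        comm -23 - "$sorted_list"
    status=$?

    rm -f "$sorted_list"
    exit "$status"
)
```

On a system that has the file, it could be called as unknown_words /usr/share/dict/words draft.txt, where draft.txt stands for whatever text you want to check.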
You should check your operating system's package manager for where the spelling list comes from, how it is generated, and what alternatives are available.
On Debian GNU+Linux, for example:
- The /usr/share/dict/words file is a symbolic link managed using the Debian “alternatives” system.
- A common word list package providing that link is the wamerican package.
- The package documentation for wamerican states its word list comes from the SCOWL (Spell Checker Oriented Word Lists) project.
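If you want to trace this on your own machine, something like the following sketch may help. It assumes a readlink that supports -f (GNU coreutils does) and, for the second function, a dpkg-based system; the function names resolve_wordlist and wordlist_owner are made up for this illustration.

```shell
# resolve_wordlist [PATH]   (hypothetical helper)
# Follow the whole symbolic link chain and print the real file behind the word list.
# Defaults to /usr/share/dict/words when no path is given.
resolve_wordlist() {
    readlink -f "${1:-/usr/share/dict/words}"
}

# wordlist_owner [PATH]   (hypothetical helper, Debian-style systems only)
# Ask dpkg which installed package owns the resolved file.
wordlist_owner() {
    dpkg -S "$(resolve_wordlist "$1")"
}
```

Running resolve_wordlist with no arguments should print the file the link finally points to, and wordlist_owner should then name the installed package that ships that file. What each prints depends on which word list packages you have installed.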
Many other word list packages can be installed; they each have the “Provides: wordlist” field:
$ aptitude search '?provides(wordlist)' | wc -l
34
On other Unices, you'll need to consult that system's package manager and documentation to learn where its word list comes from and what alternatives are available.