Moby Project

The Moby Project is a collection of lexical resources created by Grady Ward. The resources were dedicated to the public domain and are now mirrored at Project Gutenberg. As of 2007, it contains the largest free phonetic database, with 177,267 words and corresponding pronunciations.

Hyphenator

The Moby Hyphenator II contains 187,175 hyphenated words, of which 9,752 are marked as not to be hyphenated. Each hyphenation point is marked with the character value 165 (hex A5).
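
As a minimal sketch of how the hyphenation marker could be handled, the following Python function splits one raw entry at each byte 0xA5. The function name, the sample entries, and the choice to decode the pieces as Latin-1 are assumptions made for illustration; they are not taken from the distribution itself.

```python
# Byte value 165 (hex A5) marks each hyphenation point, per the description above.
HYPHEN_MARK = b"\xa5"


def split_hyphenation(raw_line):
    """Split one raw entry into its hyphenation segments.

    The entry is handled as bytes so the marker is matched exactly.
    Decoding the pieces as Latin-1 is an assumption of this sketch.
    """
    parts = raw_line.rstrip(b"\r\n").split(HYPHEN_MARK)
    return [part.decode("latin-1") for part in parts]


# Illustrative entries, not copied from the actual file.
print(split_hyphenation(b"hy\xa5phen\xa5ate\n"))
print(split_hyphenation(b"through\n"))
```

An entry without any marker comes back as a single segment, which is how this sketch represents a word that should not be hyphenated.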

Language

Moby Language II contains word lists for five languages (French, German, Italian, Japanese, and Spanish):

Language    Words      Size (in bytes)
French      138,257    1,524,757
German      159,809    2,055,986
Italian     60,453     561,981
Japanese    115,523    934,783
Spanish     86,059     850,523
Total       560,101    5,928,030

However, some of the lists are contaminated. For example, the Japanese list contains English words such as abnormal and non-words such as abcdefgh and m,./.

Part-of-Speech

Moby Part-of-Speech contains 233,356 words fully described by part(s) of speech, listed in priority order. The format of the file is word\parts-of-speech, with the following parts of speech being identified:

Part-of-speech               Code
Noun                         N
Plural                       p
Noun phrase                  h
Verb (usually participle)    V
Transitive verb              t
Intransitive verb            i
Adjective                    A
Adverb                       v
Conjunction                  C
Preposition                  P
Interjection                 !
Pronoun                      r
Definite article             D
Indefinite article           I
Nominative                   o
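
The word\parts-of-speech format described above can be decoded with a small lookup table. The sketch below is illustrative only: the function name and the sample line are hypothetical, and the backslash delimiter is taken from the format description rather than verified against the file, so it is left as a parameter.

```python
# Single-character codes from the table above (case-sensitive).
POS_CODES = {
    "N": "noun",
    "p": "plural",
    "h": "noun phrase",
    "V": "verb (usually participle)",
    "t": "transitive verb",
    "i": "intransitive verb",
    "A": "adjective",
    "v": "adverb",
    "C": "conjunction",
    "P": "preposition",
    "!": "interjection",
    "r": "pronoun",
    "D": "definite article",
    "I": "indefinite article",
    "o": "nominative",
}


def parse_pos_line(line, delimiter="\\"):
    """Return (word, [part-of-speech names]) in the order listed on the line."""
    # Split at the last delimiter, so everything before it is the word.
    word, sep, codes = line.rstrip("\r\n").rpartition(delimiter)
    if not sep:
        raise ValueError("no delimiter found in line: %r" % line)
    # Codes missing from the table are reported rather than silently dropped.
    return word, [POS_CODES.get(c, "unknown code %r" % c) for c in codes]


# Hypothetical sample line, not copied from the actual file.
print(parse_pos_line("close\\Av"))
```

Because the codes are listed in priority order, the returned list preserves that order.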

Pronunciator

The Moby Pronunciator II contains 177,267 words with corresponding pronunciations. The Project Gutenberg distribution also contains a copy of cmudict v0.3. Each line of the file follows the format word[/part-of-speech] pronunciation. The part-of-speech field disambiguates 770 words whose pronunciation depends on their part of speech. For example, for the spelling close, the verb is pronounced /ˈkloʊz/, whereas the adjective is pronounced /ˈkloʊs/. The parts of speech are assigned the following codes:

Part-of-speech    Code
Noun              n
Verb              v
Adjective         aj
Adverb            av
Interjection      interj
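
A minimal sketch of splitting one entry of the form word[/part-of-speech] pronunciation into its fields is shown below. It assumes that the word field contains no spaces and that a single space separates it from the pronunciation; the function name and the sample lines are hypothetical.

```python
# Part-of-speech codes from the table above.
PRON_POS_CODES = {
    "n": "noun",
    "v": "verb",
    "aj": "adjective",
    "av": "adverb",
    "interj": "interjection",
}


def parse_pron_entry(line):
    """Return (word, part_of_speech or None, pronunciation string)."""
    # Assumption: the first space ends the word[/part-of-speech] field.
    head, _, pronunciation = line.strip().partition(" ")
    word, _, pos_code = head.partition("/")
    # Unrecognised codes are passed through unchanged.
    pos = PRON_POS_CODES.get(pos_code, pos_code) if pos_code else None
    return word, pos, pronunciation.strip()


# Hypothetical sample lines, not copied from the actual file.
print(parse_pron_entry("close/v 'k/l/oU/z"))
print(parse_pron_entry("cat k/&/t"))
```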

The pronunciation follows the word and its optional part-of-speech code. Several special symbols are present:

Symbol    Meaning
/         Used to separate phonemes
_         Used to separate words
'         Primary stress on the following syllable
,         Secondary stress on the following syllable

The rest of the symbols are used to represent IPA characters, according to the following table:

Symbol    IPA
&         æ
-         ə
@         ʌ, ə
@r        ɜr, ər
A         ɑː
aI        aɪ
Ar        ɑr
AU        aʊ
b         b
d         d
D         ð
dZ        dʒ
E         ɛ
eI        eɪ
f         f
g         ɡ
h         h
hw        hw
i         iː
I         ɪ
j         j
k         k
l         l
m         m
n         n
N         ŋ
O         ɔː
Oi        ɔɪ
oU        oʊ
p         p
r         r
s         s
S         ʃ
t         t
T         θ
tS        tʃ
u         uː
U         ʊ
v         v
w         w
z         z
Z         ʒ
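
To illustrate how the separators, stress marks, and symbol table fit together, here is a rough sketch that converts a pronunciation string to IPA. The lookup table is deliberately partial: it holds only symbols with a single unambiguous IPA value, while symbols that map to themselves (b, d, k, l, and so on) and ambiguous ones such as @ pass through unchanged. The oU entry is an assumption inferred from the /ˈkloʊz/ example given earlier, and the function name and sample strings are hypothetical.

```python
# Partial symbol table: only entries with one unambiguous IPA value.
MOBY_TO_IPA = {
    "&": "æ", "-": "ə", "A": "ɑː", "Ar": "ɑr", "E": "ɛ", "I": "ɪ",
    "O": "ɔː", "Oi": "ɔɪ", "U": "ʊ", "D": "ð", "N": "ŋ", "S": "ʃ",
    "T": "θ", "Z": "ʒ", "g": "ɡ",
    "oU": "oʊ",  # assumption, inferred from the "close" example
}

# Stress marks precede the syllable they apply to.
STRESS = {"'": "ˈ", ",": "ˌ"}


def to_ipa(pronunciation):
    """Convert a Moby-style pronunciation string to an IPA string."""
    ipa_words = []
    for word in pronunciation.split("_"):      # "_" separates words
        out = []
        for token in word.split("/"):          # "/" separates phonemes
            # Peel off any leading stress marks before the lookup.
            while token and token[0] in STRESS:
                out.append(STRESS[token[0]])
                token = token[1:]
            # Symbols missing from the table are kept as-is.
            out.append(MOBY_TO_IPA.get(token, token))
        ipa_words.append("".join(out))
    return " ".join(ipa_words)


# Hypothetical encodings of the verb and adjective "close".
print(to_ipa("'k/l/oU/z"))
print(to_ipa("'k/l/oU/s"))
```

Leaving ambiguous symbols untouched is a design choice of this sketch: choosing between ʌ and ə for @ would need information that the symbol table alone does not provide.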

Shakespeare

Moby Shakespeare contains the complete unabridged works of Shakespeare. This specific resource is not available from Project Gutenberg.

Thesaurus

The Moby Thesaurus II contains 30,260 root words with 2,520,264 synonyms and related terms, an average of 83.3 per root word. Each line is a list of comma-separated values: the first term is the root word, and all following terms are related to it.
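
The line format lends itself to a very small parser. The sketch below is illustrative: the function name and the sample line are hypothetical, and stripping whitespace around each term is an assumption. The last line simply re-derives the stated average from the two totals above.

```python
def parse_thesaurus_line(line):
    """Return (root_word, [related terms]) for one comma-separated line."""
    terms = [term.strip() for term in line.rstrip("\r\n").split(",")]
    return terms[0], terms[1:]


# Hypothetical sample line, not copied from the actual file.
root, related = parse_thesaurus_line("happy,cheerful,content,glad\n")
print(root, related)

# Average number of related terms per root word, from the totals above.
print(round(2_520_264 / 30_260, 1))
```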

Grady Ward placed this thesaurus in the public domain in 1996. It is also available as a Debian package.

Words

Moby Words II is the largest wordlist in the world.[1] The distribution consists of the following 16 files:

Filename        Words      Description
ACRONYMS.TXT    6,213      Common acronyms and abbreviations
COMMON.TXT      74,550     Common words present in two or more published dictionaries
COMPOUND.TXT    256,772    Phrases, proper nouns, and acronyms not included in the common words file
CROSSWD.TXT     113,809    Words included in the first edition of the Official Scrabble Players Dictionary
CRSWD-D.TXT     4,160      Additions to the Official Scrabble Players Dictionary in the second edition
FICTION.TXT     467        A list of the most commonly occurring substrings in the book The Joy Luck Club
FREQ.TXT        1,000      Most frequently occurring words in the English language, listed in descending order
FREQ-INT.TXT    1,000      Most frequently occurring words on Usenet in 1992, listed with corresponding percentage in decreasing order
KJVFREQ.TXT     1,185      Most frequently occurring substrings in the King James Version of the Bible, listed in descending order
NAMES.TXT       21,986     Most common names used in the USA and Great Britain
NAMES-F.TXT     4,946      Common English female names
NAMES-M.TXT     3,897      Common English male names
OFTENMIS.TXT    366        Most common misspelled English words
PLACES.TXT      10,196     Place names in the USA
SINGLE.TXT      354,984    Single words excluding proper nouns, acronyms, compound words and phrases, but including archaic words and significant variant spellings
USACONST.TXT    7,618      United States Constitution including all amendments current to 1993
Total           863,149
