| Metric | Original | Pidgin | |
|---|---|---|---|
| Volume | | | |
| Total Words (space-delimited tokens) | 67,707 | ~11,200 | |
| Total Characters (incl. spaces & punctuation) | 363,955 | ~63,000 | |
| Total Characters (letters only, no spaces) | 296,660 | ~51,800 | |
| Total Syllables (vowel-cluster algorithm, ±2%) | 89,293 | ~18,200 | |
| Density | | | |
| Avg Syllables / Word | 1.319 | 1.627 | |
| Avg Letters / Word (letter characters only) | 4.38 | 4.62 | |
| Structure | | | |
| Books | 3 | 3 | — |
| Chapters | 19 | 19 | — |

Original text: Hemingway, Ernest. The Sun Also Rises. New York: Charles Scribner's Sons, 1926. Text via Project Gutenberg Canada #1257. The word count of 67,707 is the standard verified count across Scribner's and Gutenberg reproductions (range across counters: 66,800–68,200). Character and syllable counts were computed from a calibrated multi-chapter sample (n=1,148 tokens).
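
The volume and density metrics in the table can be sketched in a few lines of Python. This is a minimal sketch, not the counter used for the published figures: it assumes "vowel-cluster" means counting maximal runs of the letters a, e, i, o, u, y, with no silent-e correction, and the names `count_syllables` and `text_stats` and the sample sentence are hypothetical.

```python
import re

def count_syllables(word: str) -> int:
    """Count maximal runs of vowel letters (assumed rule; no silent-e handling).

    Every token scores at least 1, so vowel-less tokens still count once.
    """
    clusters = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(clusters))

def text_stats(text: str) -> dict:
    """Compute the volume and density metrics for a piece of text."""
    words = text.split()                          # space-delimited tokens
    letters = sum(ch.isalpha() for ch in text)    # letters only, no spaces
    syllables = sum(count_syllables(w) for w in words)
    return {
        "words": len(words),
        "chars": len(text),                       # incl. spaces & punctuation
        "letters": letters,
        "syllables": syllables,
        "syllables_per_word": syllables / len(words),
        "letters_per_word": letters / len(words),
    }

# Made-up sample sentence, not a quotation from the novel.
stats = text_stats("He was old and he fished alone.")
print(stats)
```

A heuristic this simple overcounts words such as "fished" (two vowel runs, one spoken syllable), which is one reason a counter of this kind carries an error margin.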

Pidgin text: Newlang pidgin compression of the full novel, covering all 19 chapters. Statistics are estimated from a calibrated sample (n=826 tokens) and extrapolated to the full document (~63,000 characters). Syllables were computed with the same vowel-cluster algorithm as for the original; the margin of error on projected totals is ±3%.
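
The extrapolation step can be illustrated as a simple proportional scale-up. This is a sketch under stated assumptions: the source gives only the sample size (826 tokens) and the full-document length (~63,000 characters), so the sample's character and syllable counts below are hypothetical placeholders chosen for illustration, not the author's measurements. The function name `extrapolate` is also hypothetical.

```python
def extrapolate(sample_counts: dict, sample_chars: int,
                total_chars: int, margin: float = 0.03) -> dict:
    """Scale sample counts to the full document by character length.

    Returns (low, estimate, high) per metric, using a symmetric margin.
    """
    scale = total_chars / sample_chars
    projected = {}
    for name, count in sample_counts.items():
        estimate = count * scale
        projected[name] = (
            round(estimate * (1 - margin)),
            round(estimate),
            round(estimate * (1 + margin)),
        )
    return projected

SAMPLE_TOKENS = 826        # from the source
TOTAL_CHARS = 63_000       # from the source (approximate)
SAMPLE_CHARS = 4_646       # HYPOTHETICAL placeholder, not a reported figure
SAMPLE_SYLLABLES = 1_344   # HYPOTHETICAL placeholder, not a reported figure

projection = extrapolate(
    {"words": SAMPLE_TOKENS, "syllables": SAMPLE_SYLLABLES},
    SAMPLE_CHARS,
    TOTAL_CHARS,
)
for name, (low, mid, high) in projection.items():
    print(f"{name}: ~{mid} (range {low} to {high})")
```

Scaling by character length assumes the sample is representative of the whole text; a sample drawn from unusually dense or sparse chapters would bias every projected total in the same direction.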

Density inversion: Despite the overall compression, average syllables per word is about 23% higher in the pidgin (1.627 vs. 1.319). This is structurally expected: the pidgin drops short English function words (the, a, of, in, and) or replaces them with one- or two-letter particles (e, a, na, de), which still score as one-syllable words but are as short or shorter in character count, while it retains polysyllabic content words (proper nouns, verbs, nouns). The phonological load per retained word is therefore higher even as total volume collapses.
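
The arithmetic behind the density inversion can be checked directly from the table figures. The first part below recomputes the two densities and their relative difference; the second part is a toy illustration of the mechanism, using a made-up phrase with hand-assigned syllable counts (an assumption, not output from the vowel-cluster counter).

```python
# Part 1: recompute the densities from the totals in the table.
orig_words, orig_syllables = 67_707, 89_293
pidgin_words, pidgin_syllables = 11_200, 18_200   # approximate totals

orig_density = orig_syllables / orig_words        # about 1.319
pidgin_density = pidgin_syllables / pidgin_words  # about 1.625
increase = pidgin_density / orig_density - 1      # about 0.23

print(f"original: {orig_density:.3f} syllables/word")
print(f"pidgin:   {pidgin_density:.3f} syllables/word")
print(f"relative increase: {increase:.1%}")

# Part 2: toy mechanism demo with hand-assigned syllable counts.
def avg_syllables(tokens):
    """Average syllables per word for a list of (word, syllables) pairs."""
    return sum(s for _, s in tokens) / len(tokens)

english = [("the", 1), ("old", 1), ("fisherman", 3),
           ("of", 1), ("the", 1), ("harbor", 2)]
# Articles dropped, "of" replaced by the particle "de" (still 1 syllable).
compressed = [("old", 1), ("fisherman", 3), ("de", 1), ("harbor", 2)]

print(avg_syllables(english), avg_syllables(compressed))
```

Fewer one-syllable words in the denominator raise the average even though the total syllable count falls, which is the effect the table shows at full scale.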

TJID3 Research · Cleveland, Ohio · T.M. Jones, Ph.D. · ORCID 0000-0001-7372-6345

| Letter | IPA | Sources | Notes |
|---|---|---|---|
| a | /a/ | universal | Open central |
| e | /e/ | romance, germanic, slavic | Mid front unrounded |
| i | /i/ | universal | High front |
| o | /o/ | universal | Mid back rounded |
| u | /u/ | universal | High back rounded |
| u· | /y/ | germanic, romance, mandarin | High front rounded (optional) |

| Particle | Function | Source | Example | Gloss |
|---|---|---|---|---|
| wa | copula / completed past / assertion | slavic | ta wa lao | he was old |
| shi | copula / present state | mandarin | ta shi lao | he is old |
| le | perfective aspect | mandarin | ta le pesk | he has fished |
| ma | question / uncertainty | mandarin, arabic | ta ma shi lao? | is he old? |
| bi | duration / sustained state | slavic | ta bi sol | he has long been alone |
| li | conditional / softening | romance | ta li ven | he might come |
| da | emphasis / contradiction | slavic, arabic | da ta sol | he is alone |
| om | inevitability / grief / sea | hindi, arabic | om golf strem | the Gulf Stream (as fate) |
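
The particle table above maps naturally onto a lookup structure. The sketch below is illustrative only: it copies the eight particles and their functions from the table and tags each token of an example sentence. The function name `annotate` and the fallback label for unlisted tokens are assumptions, and the code does not attempt to model word order or any grammar beyond the lookup.

```python
# Particles and functions copied from the table above.
ASPECT_PARTICLES = {
    "wa": "copula / completed past / assertion",
    "shi": "copula / present state",
    "le": "perfective aspect",
    "ma": "question / uncertainty",
    "bi": "duration / sustained state",
    "li": "conditional / softening",
    "da": "emphasis / contradiction",
    "om": "inevitability / grief / sea",
}

def annotate(sentence: str) -> list:
    """Tag each token with its particle function, if it is a listed particle."""
    tagged = []
    for raw in sentence.lower().split():
        token = raw.strip("?.,!")            # ignore trailing punctuation
        label = ASPECT_PARTICLES.get(token, "(not a listed particle)")
        tagged.append((token, label))
    return tagged

for example in ("ta wa lao", "ta ma shi lao?", "om golf strem"):
    print(example)
    for token, label in annotate(example):
        print(f"  {token:6} {label}")
```

A plain dictionary is enough here because each particle is a single unambiguous token; a language with multi-word particles would need a longest-match tokenizer instead.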

| Particle | Function | Source |
|---|---|---|
| ke | relative clause marker | arabic, mandarin |
| na | locative (in/on/at) | slavic, swahili |
| de | genitive / of | romance |
| e | and / also | romance |
| bez | without / negation | slavic |
| ima | now / current | japanese |
| cho | who / that (relative) | slavic |
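
To show how these grammatical particles could stand in for English function words, here is a rough sketch of a substitution pass. The mapping uses only the glosses given in the table (de for "of", e for "and/also", na for "in/on/at", bez for "without", ima for "now"). Dropping articles outright is an assumption, content words are left in English rather than translated, and the function name `substitute` is hypothetical. It is not the actual Newlang compression procedure.

```python
# English function word -> particle, taken from the glosses in the table.
ENGLISH_TO_PARTICLE = {
    "of": "de",
    "and": "e", "also": "e",
    "in": "na", "on": "na", "at": "na",
    "without": "bez",
    "now": "ima",
}

# ASSUMPTION: articles are dropped entirely rather than mapped to a particle.
DROPPED_ARTICLES = {"the", "a", "an"}

def substitute(text: str) -> str:
    """Drop articles and swap listed function words for particles.

    Content words pass through unchanged (no vocabulary translation).
    """
    out = []
    for token in text.lower().split():
        if token in DROPPED_ARTICLES:
            continue
        out.append(ENGLISH_TO_PARTICLE.get(token, token))
    return " ".join(out)

print(substitute("the old man and the sea"))
print(substitute("wine without bread in the morning"))
```

Even this crude pass shortens the token count while keeping every content word, which is the pattern the density statistics describe.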