Tabula · translator

Tabula · English Canon

T. M. Jones, Ph.D. · DOI: 10.5281/zenodo.19039226

Tone particles: wa past · shi present · le perfective · ma question · bi duration · li conditional · da emphasis · om fate
Grammar particles: ke relative · na locative · de genitive · e and · bez without · ima now · cho that
Pidgin v2 · no diacritics · preverbal particles

canon benchmark · C. Dickens - A Tale of Two Cities

Dickens chars · Dickens sylls · Pidgin chars · Pidgin sylls
# | Dickens | Pidgin | Chars (E) | Chars (P) | Sylls (E) | Sylls (P) | ΔC | ΔS

canon benchmark · H. Melville - Moby-Dick, Chapter 1: Loomings

Melville chars · Melville sylls · Pidgin chars · Pidgin sylls
# | Melville | Pidgin | Chars (M) | Chars (P) | Sylls (M) | Sylls (P) | ΔC | ΔS

canon benchmark · K. Vonnegut - Slaughterhouse-Five, Chapter 1

Vonnegut chars · Vonnegut sylls · Pidgin chars · Pidgin sylls
# | Vonnegut | Pidgin | Chars (V) | Chars (P) | Sylls (V) | Sylls (P) | ΔC | ΔS

canon benchmark · C. Dickens - A Tale of Two Cities: vocabulary frequency

Dickens English · Pidgin
English word | Pidgin word | E freq | P freq | Chars E / P | Sylls E / P | Δ chars | Δ sylls

The Sun Also Rises — E. Hemingway (1926)

Metric | Hemingway (original) | Pidgin | Compression (Δ / retention)
Volume
Total words (space-delimited tokens) | 67,707 | ~11,200 | ~16.5%
Total characters (incl. spaces & punctuation) | 363,955 | ~63,000 | ~17.3%
Total characters (letters only, no spaces) | 296,660 | ~51,800 | ~17.5%
Total syllables (vowel-cluster algorithm, ±2%) | 89,293 | ~18,200 | ~20.4%
Density
Avg syllables / word | 1.319 | 1.627 | +23%
Avg letters / word (letter characters only) | 4.38 | 4.62 | +5%
Structure
Books | 3 | 3 |
Chapters | 19 | 19 |
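Word count retained: ~17% of the original (a reduction of about 83%)
Syllable count retained: ~20% of the original (a reduction of about 80%)
Character count retained: ~17% of the original (a reduction of about 83%)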
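
The percentage column can be rechecked from the totals in the table. The sketch below is only an arithmetic check, not the tool's own code. The names `TOTALS`, `retention`, and `per_word` are illustrative, and the pidgin totals are the approximate figures quoted above.

```python
# Totals copied from the table: (original, pidgin estimate).
TOTALS = {
    "words": (67_707, 11_200),
    "characters": (363_955, 63_000),
    "letters": (296_660, 51_800),
    "syllables": (89_293, 18_200),
}


def retention(original: int, pidgin: int) -> float:
    """Share of the original volume that survives, as a percentage."""
    return 100.0 * pidgin / original


def per_word(total: int, words: int) -> float:
    """Average units (syllables or letters) per word."""
    return total / words


for name, (orig, pid) in TOTALS.items():
    kept = retention(orig, pid)
    print(f"{name:<11} retained {kept:5.1f}%   reduced {100 - kept:5.1f}%")

# Density: syllables per word before and after compression.
orig_density = per_word(TOTALS["syllables"][0], TOTALS["words"][0])
pid_density = per_word(TOTALS["syllables"][1], TOTALS["words"][1])
change = 100.0 * (pid_density / orig_density - 1.0)
print(f"syllables/word {orig_density:.3f} -> {pid_density:.3f} ({change:+.0f}%)")
```

Because the pidgin totals are rounded estimates, recomputed averages can differ from the table in the third decimal place.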

Original text: Hemingway, Ernest. The Sun Also Rises. New York: Charles Scribner's Sons, 1926. Text via Project Gutenberg Canada #1257. The word count of 67,707 is the standard verified count across all Scribner's / Gutenberg reproductions (range across counters: 66,800–68,200). Character and syllable counts were computed from a calibrated multi-chapter sample (n=1,148 tokens).

Pidgin text: Newlang pidgin compression of the full novel, covering all 19 chapters. Statistics were estimated from a calibrated sample (n=826 tokens) and extrapolated to the full document (~63,000 chars). Syllables were computed with the same vowel-cluster algorithm as for the original; the margin of error on projected totals is ±3%.
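
The notes name a vowel-cluster algorithm but do not spell it out. What follows is one minimal way such a counter could look, offered as a sketch under assumptions rather than the method actually used. The function names, the treatment of `y` as a vowel, and the English silent-e adjustment are all assumptions.

```python
import re

VOWEL_CLUSTER = re.compile(r"[aeiouy]+")
SILENT_E = re.compile(r"[^aeiouy]e$")      # consonant + final e ("alone")
SYLLABIC_LE = re.compile(r"[^aeiouy]le$")  # consonant + le ("table")


def count_syllables(word: str) -> int:
    """Estimate syllables as the number of vowel clusters in a word.

    Assumptions: 'y' counts as a vowel, a final silent 'e' is discounted,
    and any word containing letters has at least one syllable.
    """
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return 0
    n = len(VOWEL_CLUSTER.findall(w))
    if n > 1 and SILENT_E.search(w) and not SYLLABIC_LE.search(w):
        n -= 1
    return max(n, 1)


def syllable_total(text: str) -> int:
    """Sum the per-word estimates over space-delimited tokens."""
    return sum(count_syllables(tok) for tok in text.split())


print(count_syllables("alone"), count_syllables("skiff"))
print(syllable_total("ta wa lao"))
```

A heuristic like this miscounts some English words (for example, past-tense forms such as "fished"), which is consistent with the totals being quoted with a margin of error.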

Density inversion: Despite the overall compression, average syllables per word is higher in the pidgin (+23%). This is structurally expected. The pidgin drops short English function words (the, a, of, in, and) or replaces them with one- or two-character particles (e, a, na, de), which still count as one-syllable words but use fewer characters, while it retains polysyllabic content words (proper nouns, verbs, nouns). The phonological load per retained word is therefore higher even as total volume collapses.

TJID3 Research · Cleveland, Ohio · T.M. Jones, Ph.D. · ORCID 0000-0001-7372-6345

Phoneme Table

v2 design changes: Diacritics are removed entirely; á à ā ã ạ are all retired. Tone and register are now encoded by preverbal particles (see § Tone Particles below). The Oceanic (Māori, Hawaiian) and Australian Aboriginal (Warlpiri) sources are removed, and the inventory stabilizes. The phoneme count is reduced by 9, and syllable-onset clusters are simplified. The Latin script now reads cleanly without special keyboard support.

Vowels

Letter | IPA | Sources | Notes
a | /a/ | universal | Open central; the bedrock vowel across all retained families
e | /e/ | romance, germanic, slavic | Mid front unrounded
i | /i/ | universal | High front; appears in all retained source families
o | /o/ | universal | Mid back rounded
u | /u/ | universal | High back rounded
u with raised-dot suffix | /y/ | germanic, romance, mandarin | High front rounded (German ü, French u, Mandarin ü). The raised-dot suffix replaces the umlaut. Optional; yields to plain u in fast speech.

Consonants — Stops & Affricates

Letter | IPA | Sources | Notes
p | /p/ | universal | Bilabial voiceless
b | /b/ | universal | Bilabial voiced
t | /t/ | universal | Alveolar voiceless
d | /d/ | universal | Alveolar voiced
k | /k/ | universal | Velar voiceless; replaces both c/k variants from v1
g | /g/ | universal | Velar voiced
ch | /tʃ/ | slavic, arabic, swahili | Affricate; digraph, no new letter needed

Consonants — Fricatives

Letter | IPA | Sources | Notes
f | /f/ | germanic, arabic, romance | Labiodental voiceless
v | /v/ | slavic, germanic, romance | Labiodental voiced
s | /s/ | universal | Alveolar sibilant voiceless
z | /z/ | slavic, arabic, germanic | Alveolar sibilant voiced
sh | /ʃ/ | arabic, slavic, germanic | Postalveolar voiceless; digraph replaces š from v1
zh | /ʒ/ | slavic, romance | Postalveolar voiced (French je, Russian ж); digraph replaces ž
kh | /x/ | arabic, slavic, germanic | Velar fricative (Arabic خ, Ukrainian х, German ch). Digraph replaces x/χ from v1.
h | /h/ | arabic, germanic, japanese | Glottal fricative

Consonants — Nasals, Liquids, Approximants

Letter | IPA | Sources | Notes
m | /m/ | universal | Bilabial nasal
n | /n/ | universal | Alveolar nasal
ng | /ŋ/ | swahili, mandarin, japanese | Velar nasal; word-final only. Digraph.
l | /l/ | universal | Lateral approximant
r | /r/ | slavic, romance, arabic | Trill or tap; realization varies by speaker community
y | /j/ | germanic, slavic, quechua | Palatal approximant (German j, Slavic й)
w | /w/ | swahili, arabic, ojibwe | Labial-velar approximant
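
Since several phonemes are written as digraphs, a word has to be split into letters with some care. The sketch below shows one possible splitter over the inventory in the tables above. It assumes greedy matching of digraphs, treats `ng` as a digraph only at the end of a word as the table states, and leaves out the optional /y/ vowel for simplicity. The name `tokenize` is illustrative and not part of the tool.

```python
DIGRAPHS = {"ch", "sh", "zh", "kh"}  # valid anywhere in a word
FINAL_DIGRAPH = "ng"                 # word-final only, per the table
SINGLES = set("aeiou" "pbtdkg" "fvszh" "mnlr" "yw")


def tokenize(word: str) -> list[str]:
    """Split a word into v2 letters, matching digraphs greedily."""
    w = word.lower()
    letters = []
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        if pair in DIGRAPHS or (pair == FINAL_DIGRAPH and i + 2 == len(w)):
            letters.append(pair)
            i += 2
        elif w[i] in SINGLES:
            letters.append(w[i])
            i += 1
        else:
            raise ValueError(f"{w[i]!r} is not in the v2 inventory")
    return letters


print(tokenize("shkaf"))  # ['sh', 'k', 'a', 'f']
print(tokenize("chit"))   # ['ch', 'i', 't']
```

Greedy matching means a sequence such as k + h is always read as the digraph kh. Hyphenated forms like pesk-a would need the hyphen removed before tokenizing.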

Tone & Register — Particle System

Design rationale: Version 1 encoded tone via diacritics (à è ì = falling/assertion; á é í = rising/question; ā ē ī = duration; etc.). This required keyboard support and burdened every content word. Version 2 moves all register and modality marking to preverbal particles — a single invariant word placed before the predicate. The content words are now unmarked Latin. A speaker types or writes with a standard keyboard; a reader processes register from position, not from marks above letters.

Precedent: Mandarin aspect particles (了 le, 过 guo), Slavic aspect pairs, Japanese sentence-final particles (か ka, ね ne, よ yo). This pidgin follows the same logic but moves particles to preverbal position for cross-family legibility.
Particle | Function | Source | Replaces (v1) | Example
wa | copula / completed past / assertion | slavic | wà (falling) | ta wa lao = he was old
shi | copula / present state | mandarin | shi (flat) | ta shi lao = he is old
le | perfective aspect marker | mandarin | lé (flat) | ta le pesk = he has fished / finished fishing
ma | question / uncertainty / rising | mandarin, arabic | rising diacritic | ta ma shi lao? = is he old?
bi | duration / sustained state / long | slavic | macron diacritic | ta bi sol = he has long been alone
li | conditional / hesitation / softening | romance | dipping diacritic | ta li ven = he might come
da | emphasis / contradiction / creaky | slavic, arabic | tilde diacritic | da ta sol = he is alone (emphatic)
om | inevitability / grief / the sea / heavy finality | hindi, arabic | heavy dot diacritic | om golf strem = the Gulf Stream (as fate)
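
To illustrate the move from v1 diacritics to v2 particles, the sketch below strips a tone mark from a word and returns the particle that replaces it. It is a hypothetical helper, not the project's converter. It assumes the pairings in the table above for the five diacritics the text names by glyph (grave, acute, macron, tilde, dot below), at most one tone mark per word, and made-up v1 spellings as inputs. The dipping diacritic is left out because its glyph is not given.

```python
import unicodedata
from typing import Optional

# Combining marks for the v1 diacritics, mapped to their v2 particles.
MARK_TO_PARTICLE = {
    "\u0300": "wa",  # grave: falling / assertion
    "\u0301": "ma",  # acute: rising / question
    "\u0304": "bi",  # macron: duration
    "\u0303": "da",  # tilde: emphasis
    "\u0323": "om",  # dot below: heavy finality
}


def v1_to_v2(word: str) -> tuple[Optional[str], str]:
    """Return (particle, plain_word) for a v1 word with a tone diacritic.

    The particle is None when the word carries no recognized mark.
    """
    particle = None
    plain = []
    # NFD splits each accented letter into base letter + combining mark.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            if particle is None:
                particle = MARK_TO_PARTICLE.get(ch)
        else:
            plain.append(ch)
    return particle, "".join(plain)


print(v1_to_v2("lào"))  # ('wa', 'lao')
print(v1_to_v2("sōl"))  # ('bi', 'sol')
```

The caller would still decide where the particle goes in the clause, since v2 places it before the predicate rather than next to the word that used to carry the mark.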

Grammar Particles

Particle | Function | Source
ke | relative clause marker | arabic, mandarin
na | locative (in / on / at) | slavic, swahili
de | genitive / of | romance
e | and / also | romance
bez | without / negation | slavic
ima | now / current | japanese
cho | who / that (relative) | slavic
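
Because every particle is a fixed word, a first-pass word breakdown can be done by dictionary lookup. The sketch below tags each token as a tone particle, a grammar particle, or a content word, using the two particle tables above. The function name `tag_tokens` and the short gloss strings are illustrative choices, and the lookup ignores position, so it cannot tell a particle from a homophonous content word.

```python
TONE_PARTICLES = {
    "wa": "copula / completed past",
    "shi": "copula / present",
    "le": "perfective",
    "ma": "question",
    "bi": "duration",
    "li": "conditional",
    "da": "emphasis",
    "om": "fate",
}
GRAMMAR_PARTICLES = {
    "ke": "relative",
    "na": "locative",
    "de": "genitive",
    "e": "and",
    "bez": "without",
    "ima": "now",
    "cho": "who / that",
}


def tag_tokens(sentence: str) -> list[tuple[str, str, str]]:
    """Return (token, kind, gloss) triples; content words get an empty gloss."""
    tagged = []
    for raw in sentence.split():
        token = raw.strip(".,;:?!\"'").lower()
        if not token:
            continue
        if token in TONE_PARTICLES:
            tagged.append((token, "tone", TONE_PARTICLES[token]))
        elif token in GRAMMAR_PARTICLES:
            tagged.append((token, "grammar", GRAMMAR_PARTICLES[token]))
        else:
            tagged.append((token, "content", ""))
    return tagged


for token, kind, gloss in tag_tokens("Ta wa lao vir ke pesk-a sol na shkaf"):
    print(f"{token:<8} {kind:<8} {gloss}")
```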

Calibration Sample

E. Hemingway — The Old Man and the Sea (opening)
"He was an old man who fished alone in a skiff in the Gulf Stream, and he had gone eighty-four days now without taking a fish."
Ta wa lao vir ke pesk-a sol na shkaf na Golf Strem, e le bi chit okto-das-arba yaum bez chap pesk.
ta = he · wa = copula/past · lao = old (Mandarin 老) · vir = man (Slavic) · ke = who/relative · pesk-a = fished (Romance pescar) · sol = alone (Romance) · na = in/locative · shkaf = skiff · Golf Strem = proper noun · e = and · le = perfective · bi chit = gone/duration · okto-das-arba = 84 (eight-ten-four) · yaum = days (Arabic يوم) · bez = without · chap = taking · pesk = fish
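
The gloss reads okto-das-arba as eight-ten-four, that is 8 × 10 + 4 = 84. The sketch below turns that single attested pattern into code. It assumes a digit before das multiplies ten and a digit after it is added, a rule inferred from this one example only. Only the three number words in the gloss are included, and the name `compound_value` is hypothetical.

```python
# Number words attested in the gloss; nothing else is assumed.
DIGITS = {"okto": 8, "arba": 4}
TEN = "das"


def compound_value(numeral: str) -> int:
    """Evaluate a hyphenated numeral of the form [digit-]das[-digit]."""
    parts = numeral.lower().split("-")
    if TEN not in parts:
        if len(parts) != 1:
            raise ValueError(f"unsupported numeral: {numeral!r}")
        return DIGITS[parts[0]]
    idx = parts.index(TEN)
    before, after = parts[:idx], parts[idx + 1:]
    if len(before) > 1 or len(after) > 1:
        raise ValueError(f"unsupported numeral: {numeral!r}")
    tens = DIGITS[before[0]] if before else 1  # bare "das" is ten
    units = DIGITS[after[0]] if after else 0
    return tens * 10 + units


print(compound_value("okto-das-arba"))  # 84
```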
With tone particles active
"Is he still alone out there?"
Ma ta bi sol na da?
ma = question particle · ta = he · bi = duration/still · sol = alone · na = out there/locative · da = emphasis