Topological Atlas of Human Language

Irreversibility depth D* and harmonic cycles for 49 languages, measured via scalar Hodge decomposition on byte-level Markov-shift graphs. Data: Wikipedia (~300k bytes per language). Method: Jiang et al. 2011.
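The scalar Hodge decomposition of Jiang et al. (2011) splits an edge flow on a graph into two parts. The first is a gradient part: the differences s(v) − s(u) of a node potential s fitted by least squares. The second is a residual that no potential can explain. Below is a minimal NumPy sketch under several assumptions.

- The flow on each edge is already given. For a Markov-shift graph, one plausible choice is the antisymmetric part of the transition counts, c(u→v) − c(v→u). How the atlas actually builds its flows is not stated here.
- The fit is unweighted, although HodgeRank also allows edge weights.
- No triangles are filled in, so the whole residual counts as "harmonic".
- The harmonic fraction is read as the residual's share of the flow's squared norm.

The sketch also reports the graph's number of independent cycles, E − rank(B). Whether the table's cycles column counts exactly this is an assumption. The function names are hypothetical.

```python
import numpy as np

def incidence_matrix(edges, n_nodes):
    """Edge-node incidence B, so that (B @ s)[e] = s[v] - s[u] for edge e = (u, v)."""
    B = np.zeros((len(edges), n_nodes))
    for e, (u, v) in enumerate(edges):
        B[e, u] = -1.0
        B[e, v] = 1.0
    return B

def hodge_split(edges, flow, n_nodes):
    """Split an edge flow into a gradient part and a non-gradient residual.

    The potential s minimizes ||B s - flow||^2 (unweighted least squares).
    With no triangles filled in, the residual is treated as the harmonic part.
    """
    B = incidence_matrix(edges, n_nodes)
    y = np.asarray(flow, dtype=float)
    s, *_ = np.linalg.lstsq(B, y, rcond=None)
    gradient = B @ s
    harmonic = y - gradient
    f_harm = float(harmonic @ harmonic / (y @ y))
    # First Betti number of the graph: number of independent cycles, E - rank(B).
    betti_1 = len(edges) - np.linalg.matrix_rank(B)
    return s, gradient, harmonic, f_harm, betti_1

edges = [(0, 1), (1, 2), (2, 0)]
# A pure circulation around the triangle: no potential explains any of it.
_, _, _, f_cycle, b1 = hodge_split(edges, [1.0, 1.0, 1.0], 3)
# Differences of the potential s = [0, 1, 3]: entirely gradient.
_, _, _, f_grad, _ = hodge_split(edges, [1.0, 2.0, -3.0], 3)
print(round(f_cycle, 6), round(f_grad, 6), b1)  # expect f_cycle ≈ 1, f_grad ≈ 0, one cycle
```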

Harmonic decay f(D)

The harmonic fraction f(D) decays exponentially with context depth D. All languages share the same decay rate and differ only in their starting level, so on a log scale their curves run parallel.

Figure: f(D) decay curves for 10 selected languages.
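The following sketch illustrates what "same rate, different starting level" means. It assumes a pure exponential f(D) = A·exp(−λD) with no floor, which goes beyond what the source specifies. Taking logs gives log f = log A − λD. A single least-squares fit with one shared slope λ and one intercept per language then encodes the claim. The function name fit_shared_decay and the synthetic curves are hypothetical, not the atlas's data or code.

```python
import numpy as np

def fit_shared_decay(depths, curves):
    """Fit log f_i(D) = a_i - lam * D with one decay rate lam shared by all curves.

    depths: context depths D at which every curve is sampled.
    curves: dict mapping a language label to its f(D) values at those depths.
    Returns (lam, {label: A_i}) where A_i = exp(a_i) is that curve's starting level.
    """
    D = np.asarray(depths, dtype=float)
    labels = list(curves)
    n, k = len(labels), len(D)
    # Design matrix: one intercept column per language plus a shared -D column.
    X = np.zeros((n * k, n + 1))
    y = np.zeros(n * k)
    for i, lab in enumerate(labels):
        rows = slice(i * k, (i + 1) * k)
        X[rows, i] = 1.0
        X[rows, n] = -D
        y[rows] = np.log(np.asarray(curves[lab], dtype=float))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    lam = float(coef[n])
    return lam, {lab: float(np.exp(coef[i])) for i, lab in enumerate(labels)}

# Synthetic curves (NOT measured data): same rate 0.25, different starting levels.
D = np.arange(1, 6)
curves = {"lang_a": 0.9 * np.exp(-0.25 * D), "lang_b": 0.6 * np.exp(-0.25 * D)}
lam, levels = fit_shared_decay(D, curves)
print(round(lam, 3), {k: round(v, 3) for k, v in levels.items()})
```

On real data, comparing this fit against one with a separate λ per language would test the shared-rate claim directly.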

Running coupling g(D)

g(D) = f_harm/(1 − f_harm), the ratio of the harmonic fraction to the remainder, plotted on a log scale. Grey: all 49 languages. Colored: selected languages. Dashed red: the g = 1 crossover, where f_harm = 0.5.

Figure: running coupling g(D) for 49 languages.
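Here is a sketch of the coupling and its crossover. It assumes that D* is the depth at which g(D) falls through 1, equivalently f(D) = 0.5. The source does not define D* explicitly, though the table is suggestive: Chinese, with f(3) = 0.502, has D*_b = 3.0.

Between measured depths, the sketch interpolates log g linearly, which is one reasonable choice among several. A return value of None stands for a crossover beyond the measured range, analogous to the table's ">5". The function names are hypothetical.

```python
import math

def coupling(f):
    """Running coupling g = f / (1 - f) for a harmonic fraction 0 < f < 1."""
    return f / (1.0 - f)

def crossover_depth(depths, f_values):
    """Depth where g(D) first reaches 1 (f(D) = 0.5), interpolating log g linearly.

    Assumption: this crossover is what D* measures.
    Returns None if g stays above 1 at every measured depth.
    """
    log_g = [math.log(coupling(f)) for f in f_values]
    if log_g[0] <= 0.0:
        return float(depths[0])
    for (d0, l0), (d1, l1) in zip(zip(depths, log_g), zip(depths[1:], log_g[1:])):
        if l1 <= 0.0:
            # l0 > 0 >= l1 here, so the zero of log g lies in [d0, d1].
            return d0 + (d1 - d0) * l0 / (l0 - l1)
    return None

print(round(crossover_depth([1, 2, 3], [0.70, 0.60, 0.45]), 3))  # 2.669
print(crossover_depth([1, 2, 3], [0.80, 0.70, 0.60]))            # None
```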

D* by word order

SOV languages resolve directional asymmetry at a shallower context depth than SVO languages: on average, their D*_c is about 1.3 characters lower.

Figure: D* by word order typology.
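Comparing word-order groups requires a decision about the ">5" entries, which are lower bounds rather than values. The source does not say how they enter the ~1.3 figure.

The sketch below uses a handful of rows copied from the summary table. It simply excludes ">5" entries from the mean and reports how many were dropped. Other treatments, such as capping at 5 or censored-data methods, would give different gaps, so this shows only the bookkeeping, not a reproduction of the quoted number. The helper names are hypothetical.

```python
from statistics import mean

def parse_dstar(cell):
    """Parse a D* table cell: '4.7' -> (4.7, False); '>5' -> (5.0, True), a censored lower bound."""
    if cell.startswith(">"):
        return float(cell[1:]), True
    return float(cell), False

def group_summary(rows, order):
    """Mean D*_c over finite entries of one word-order class, plus the count of '>5' entries."""
    parsed = [parse_dstar(d) for (_, o, d) in rows if o == order]
    finite = [v for v, censored in parsed if not censored]
    n_censored = sum(1 for _, censored in parsed if censored)
    return (mean(finite) if finite else None), n_censored

# (code, word order, D*_c) for a few rows of the summary table.
rows = [("ja", "SOV", "1.5"), ("hi", "SOV", "1.9"), ("ko", "SOV", "2.0"), ("ta", "SOV", ">5"),
        ("ru", "SVO", "2.3"), ("fr", "SVO", "4.6"), ("en", "SVO", "4.7"), ("nl", "SVO", ">5")]
print(group_summary(rows, "SOV"))  # mean of 1.5, 1.9, 2.0 and one censored entry
print(group_summary(rows, "SVO"))  # mean of 2.3, 4.6, 4.7 and one censored entry
```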

Cross-domain formality ladder

f(5), the harmonic fraction at depth 5, orders domains by how much sequential order matters in them.

Figure: cross-domain formality ladder.

PCA on 8 topological invariants

Principal component analysis on 8 graph invariants (spectral gap, decay rate, eigenvalue ratio, Fiedler entropy, gradient entropy, bipartiteness, concentration, Betti number). PC1 (45% of variance) tracks encoding/script. PC2 (26%) tracks graph topology, a genuinely new typological dimension independent of script. Colors indicate script family.

Figure: PCA scatter plot of 49 languages, colored by script.
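Below is a minimal sketch of the PCA step. It assumes each invariant is standardized (zero mean, unit variance) before the decomposition, because the invariants live on very different scales; the source does not describe its preprocessing.

The input here is random placeholder data, not the 49 × 8 matrix behind the figure, so its variance ratios carry no meaning. With the real matrix, the first two ratios are what the 45% and 26% figures report. The function name pca is hypothetical.

```python
import numpy as np

def pca(X):
    """PCA via SVD of the column-standardized data matrix.

    X: array of shape (n_languages, n_invariants).
    Returns (scores, explained_variance_ratio), components ordered by variance.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each invariant (assumption)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    var = S ** 2
    return U * S, var / var.sum()  # U * S equals Z projected onto the principal axes

# Placeholder matrix, NOT the real invariants: 49 languages x 8 invariants.
rng = np.random.default_rng(0)
X = rng.normal(size=(49, 8))
scores, ratio = pca(X)
print(scores[:, :2].shape, ratio[:2])  # PC1/PC2 coordinates for a scatter plot
```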

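Summary (ranked by D*_c)

Columns: order = word order; morph = morphological type; script = script family; bpc = mean UTF-8 bytes per character; D*_c, D*_b = irreversibility depth D* in characters and in bytes; f(3) = harmonic fraction at depth 3; cycles = number of harmonic cycles.

# language code order morph script bpc D*_c D*_b f(3) cycles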
1 Chinese zh SVO isolating cjk 2.4 1.2 3.0 0.502 1886
2 Japanese ja SOV agglutinating cjk 2.5 1.5 3.7 0.580 844
3 Hindi hi SOV fusional devanagari 2.5 1.9 4.8 0.638 152
4 Korean ko SOV agglutinating hangul 2.1 2.0 4.2 0.569 721
5 Ukrainian uk SVO fusional cyrillic 1.7 2.2 3.8 0.561 360
6 Russian ru SVO fusional cyrillic 1.7 2.3 3.9 0.570 435
7 Mongolian mn SOV agglutinating cyrillic 1.7 2.3 4.0 0.585 210
8 Greek el SVO fusional other 1.7 2.3 4.0 0.577 382
9 Arabic ar VSO fusional arabic 1.7 2.3 4.1 0.630 140
10 Hebrew he SVO fusional other 1.7 2.4 4.0 0.602 204
11 Bulgarian bg SVO fusional cyrillic 1.7 2.7 4.5 0.593 337
12 Serbian sr free fusional cyrillic 1.5 3.0 4.4 0.582 530
13 Czech cs free fusional latin 1.1 3.3 3.6 0.540 1194
14 Slovak sk free fusional latin 1.1 3.5 3.8 0.548 1006
15 Hungarian hu SOV agglutinating latin 1.1 3.5 3.8 0.549 1122
16 Turkish tr SOV agglutinating latin 1.1 3.8 4.0 0.566 854
17 Polish pl free fusional latin 1.1 3.9 4.0 0.565 867
18 Croatian hr free fusional latin 1.0 3.9 4.0 0.568 873
19 Vietnamese vi SVO isolating latin 1.2 4.0 4.9 0.569 290
20 German de SOV fusional latin 1.0 4.2 4.3 0.580 1111
21 Latvian lv free fusional latin 1.1 4.2 4.5 0.586 807
22 Spanish es SVO fusional latin 1.0 4.2 4.3 0.565 674
23 Lithuanian lt free fusional latin 1.1 4.3 4.6 0.576 713
24 Portuguese pt SVO fusional latin 1.0 4.3 4.4 0.573 765
25 Italian it SVO fusional latin 1.0 4.5 4.5 0.571 858
26 Danish da SVO fusional latin 1.0 4.5 4.6 0.586 881
27 Estonian et SVO agglutinating latin 1.0 4.6 4.7 0.568 918
28 French fr SVO fusional latin 1.0 4.6 4.8 0.597 802
29 Romanian ro SVO fusional latin 1.0 4.6 4.8 0.575 726
30 English en SVO fusional latin 1.0 4.7 4.8 0.588 702
31 Dutch nl SVO fusional latin 1.0 >5 >5 0.601 667
32 Swedish sv SVO fusional latin 1.0 >5 >5 0.614 451
33 Norwegian no SVO fusional latin 1.0 >5 >5 0.593 620
34 Indonesian id SVO agglutinating latin 1.0 >5 >5 0.613 497
35 Malay ms SVO agglutinating latin 1.0 >5 >5 0.610 575
36 Thai th SVO isolating other 2.5 >5 >5 0.644 314
37 Swahili sw SVO agglutinating latin 1.0 >5 >5 0.621 431
38 Bengali bn SOV fusional other 2.5 >5 >5 0.629 213
39 Persian fa SOV fusional arabic 1.8 >5 >5 0.659 136
40 Finnish fi SVO agglutinating latin 1.0 >5 >5 0.590 908
41 Georgian ka SOV agglutinating other 2.3 >5 >5 0.672 186
42 Tamil ta SOV agglutinating other 2.6 >5 >5 0.697 110
43 Telugu te SOV agglutinating other 2.6 >5 >5 0.672 92
44 Burmese my SOV isolating other 2.6 >5 >5 0.669 159
45 Uzbek uz SOV agglutinating latin 1.0 >5 >5 0.599 750
46 Irish ga VSO fusional latin 1.1 >5 >5 0.587 394
47 Welsh cy VSO fusional latin 1.0 >5 >5 0.593 782
48 Tagalog tl VSO agglutinating latin 1.0 >5 >5 0.600 855
49 Latin la free fusional latin 1.0 >5 >5 0.601 664
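The two D* columns appear to be linked through the bpc column. For the rows checked below, dividing D*_b by bpc recovers D*_c to within a few percent. The residual gap is presumably due to bpc being rounded to one decimal. The source does not state this relation, so the snippet is only a consistency check on the printed values.

```python
# (code, bpc, D*_c, D*_b) copied from the summary table above.
rows = [("zh", 2.4, 1.2, 3.0), ("ja", 2.5, 1.5, 3.7), ("hi", 2.5, 1.9, 4.8),
        ("ko", 2.1, 2.0, 4.2), ("cs", 1.1, 3.3, 3.6), ("en", 1.0, 4.7, 4.8)]

for code, bpc, d_chars, d_bytes in rows:
    implied = d_bytes / bpc  # convert a depth in bytes into characters
    rel_gap = abs(implied - d_chars) / d_chars
    print(f"{code}: D*_b/bpc = {implied:.2f}, D*_c = {d_chars}, relative gap {rel_gap:.1%}")
```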

Figure: D* by word order.

Figure: D* by morphological type.