A universal beta function for natural language

Research note
Richard Hoekstra · April 2026

The claim

All 49 measured natural languages share a single beta function β(g) ≈ −0.7g in the small-coupling regime. The beta function is independent of word order (SVO, SOV, VSO, free), morphological type, and script. All languages are asymptotically free: β < 0 at every measured depth, no exceptions.

The running coupling

A byte stream at context depth D induces a de Bruijn graph whose edge field decomposes via the Hodge decomposition into exact (gradient) and harmonic (cycle) components. Define the running coupling:

g(D) = fharm(D) / (1 − fharm(D))

where fharm is the fraction of edge-field energy in cycles. The beta function is:

β(g) = dg / d log D

Main result Across 49 languages, six families, four word orders, three morphological classes: β(g) = −0.668 ± 0.174 in the small-coupling window g ∈ [0.5, 1). Extended to g ∈ [0, 50] the curve is β(g) ≈ −g log g.

The universal curve

Measured on 500 KB of Wikipedia per language, depths D ∈ {1, 2, 3, 4, 5}. Discrete beta function computed from consecutive pairs, binned by gmid:

g rangeMean βStd βn
[0.5, 1.0)−0.6680.17421
[1.0, 2.0)−1.2520.77285
[2.0, 5.0)−3.7180.66941
[5.0, 50.0)−34.7197.50049

Across four orders of magnitude in g, the mean beta scales as β(g) ~ −g log g, with sub-leading corrections below the intra-family spread. The curve is asymptotically free: g → 0 as D → ∞.

Word-order invariance

Small-coupling window g ∈ [0.5, 2.0], split by Greenberg word order:

Word orderMean βStd βn
SVO−1.1320.80052
SOV−1.2440.60328
VSO−1.0160.5738
free−1.0380.75918

All four classes agree within one standard deviation. Chinese (SVO, isolating), Japanese (SOV, agglutinative), Irish (VSO, fusional) and Latin (free, fusional) land on the same beta curve.

Two independent D* definitions agree

Two natural crossover definitions:

Across 30 languages where both land inside the measurement window:

Agreement Spearman ρ = 1.000, mean |D*(g=1) − D*(f=1/2)| = 0.014.

Two noisy spectral measurements agree to the second decimal.

The crossover ladder

Smallest D* (synthetic, morphologically heavy)

ChineseCzechSlovakJapaneseUkrainianHungarianRussianGreek
3.063.653.783.783.803.863.903.96

Largest D* (analytic or polysynthetic outliers)

GeorgianTamilBurmeseTeluguLatinUzbekWelshTagalog
>5>5>5>5>5>5>5>5

English lands at D* = 4.76. The ladder matches the FSI difficulty ranking at Spearman ρ = 0.61.

Interpretation

The beta function is the single number needed to describe how a natural language organises statistical structure across scales. It is negative everywhere: there is a unique UV-relevant operator — the symbol itself — and every language flows toward an IR fixed point where harmonic and exact content balance. The one-loop coefficient is ≈ 0.7, the same for Chinese characters, Japanese kana, Finnish case suffixes and Welsh initial mutations.

Two languages can share a beta curve and differ in everything else. Natural languages are a one-parameter family of solutions of the same flow equation.

Reproducibility

MetricValue
Data500 KB of Wikipedia per language (49 languages)
Scriptrunning_coupling.py (~140 lines)
Runtime6 s on a laptop CPU
Formal supportHodgeGraph.lean (0 sorrys)
python3 running_coupling.py
Full paper (HTML) PDF