The Hodge-Epsilon Program

Irreversibility, complexity, and self-measurement
Richard Hoekstra · April 2026

The loop

Four papers form a loop: empirical measurement of byte streams reveals algebraic structure, algebraic structure is formalized and proved, proof terms are measured for complexity, and the complexity measurement validates the empirical starting point. The connecting thread is ε² = 0.

Byte stream (empirical)
→ Hodge decomposition (D*, f(D)) [Paper 1]
→ Architecture (exact + harmonic + residual) [Paper 4]
→ Algebraic theory (ε² = 0) [Paper 3]
→ Proof terms (Lean.Expr) [Paper 3]
→ Type erasure (Lean.Expr → TLC) [Paper 2]
→ DAG-CSE (K = 5) [Paper 2]
→ The instrument measures itself (K = 83) [Paper 2]
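The DAG-CSE stage can be pictured as counting the distinct subterms that remain once repeated subterms are shared. The following is a minimal Python sketch under that assumption. The tuple encoding of terms and the names tree_size and dag_size are hypothetical and do not come from the papers, whose pipeline operates on type-erased Lean.Expr terms.

```python
from typing import Tuple, Union

# Hypothetical encoding: a term is a leaf name (str) or a tuple
# (operator, child, child, ...). This is an illustration only, not the
# representation used in the papers.
Term = Union[str, Tuple]


def tree_size(term: Term) -> int:
    """Count nodes in the expression tree, counting repeated subterms each time."""
    if isinstance(term, str):
        return 1
    return 1 + sum(tree_size(child) for child in term[1:])


def dag_size(term: Term) -> int:
    """Count distinct subterms, i.e. the nodes left after sharing duplicates."""
    seen = set()

    def visit(t: Term) -> None:
        if t in seen:
            return  # already shared, do not count again
        seen.add(t)
        if not isinstance(t, str):
            for child in t[1:]:
                visit(child)

    visit(term)
    return len(seen)


# Build ((x*x)*(x*x)) * ((x*x)*(x*x)) by squaring three times.
t: Term = "x"
for _ in range(3):
    t = ("mul", t, t)

tree_nodes = tree_size(t)  # 15 nodes as a tree
dag_nodes = dag_size(t)    # 4 nodes once identical subterms are shared
```

The ratio tree_nodes / dag_nodes is one natural way to read a compression factor. Whether this matches the exact definition behind the reported K values is an assumption here.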

One axiom in three roles

Algebraic foundation (Paper 3). The axiom forces a 17-layer chain of theorems from elementary ring theory through homological algebra to operadic Koszul duality, all machine-checked in Lean 4.
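For readers who have not met the axiom before, ε² = 0 is the defining relation of the dual numbers. The sketch below is a minimal Python illustration of that relation, not the Lean formalization from Paper 3. The class name Dual and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A dual number re + eps*ε, subject to ε² = 0 (illustrative only)."""
    re: float
    eps: float = 0.0

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.re + other.re, self.eps + other.eps)

    def __mul__(self, other: "Dual") -> "Dual":
        # (a + bε)(c + dε) = ac + (ad + bc)ε; the bd·ε² term vanishes.
        return Dual(self.re * other.re,
                    self.re * other.eps + self.eps * other.re)


EPS = Dual(0.0, 1.0)
square = EPS * EPS   # Dual(0.0, 0.0): ε² = 0

x = Dual(3.0, 1.0)
cube = x * x * x     # Dual(27.0, 27.0): value 3³ and derivative 3·3²
```

The ε coefficient of the result carries the first derivative, which is the standard reason dual numbers appear in forward-mode automatic differentiation.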

Information-theoretic optimum (Paper 2). Among all axiom systems of the form x² = c, the dual-number axiom (c = 0) has K = 5, the minimum possible. A strict gap separates K = 5 from K ≥ 7 for anything more complex. Order 2 is not a convention; it is the K-optimal algebraic structure.
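A short worked equation shows what the family x² = c looks like at the level of multiplication. This is standard algebra, not the K computation from Paper 2, and it is only suggestive of why c = 0 is the simplest member. The symbols a, b, a', b' are illustrative coefficients.

```latex
% Product of two elements of R[x]/(x^2 - c):
(a + b\,x)(a' + b'\,x) = (a a' + c\, b b') + (a b' + a' b)\,x

% c = 0  (dual numbers): the cross term c\,bb' drops out
(a + b\,\varepsilon)(a' + b'\,\varepsilon) = a a' + (a b' + a' b)\,\varepsilon

% c = -1 (complex numbers) and c = +1 (split-complex numbers) keep it
(a + b\,i)(a' + b'\,i) = (a a' - b b') + (a b' + a' b)\,i
(a + b\,j)(a' + b'\,j) = (a a' + b b') + (a b' + a' b)\,j
```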

Compression structure (Paper 1). The PPM-C escape mechanism has nilpotent structure: applying escape twice yields nothing. The compressor achieves 2.16 bits per byte (bpb) on enwik8, validated by a machine-verified roundtrip proof.
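The bpb figure and the roundtrip property can be made concrete with a small Python sketch. It uses zlib from the standard library as a stand-in compressor, not PPM-C, and a runtime equality check rather than a machine-verified proof. The function name bits_per_byte and the sample text are hypothetical.

```python
import zlib


def bits_per_byte(original: bytes, compressed: bytes) -> float:
    """Compressed size in bits divided by original size in bytes."""
    if not original:
        raise ValueError("original must be non-empty")
    return 8 * len(compressed) / len(original)


# Stand-in data; the papers report results on enwik8, which is not used here.
sample = b"the quick brown fox jumps over the lazy dog. " * 200

packed = zlib.compress(sample, 9)   # zlib stands in for PPM-C
restored = zlib.decompress(packed)

roundtrip_ok = restored == sample   # lossless roundtrip, checked at runtime
bpb = bits_per_byte(sample, packed)
```

A value of 8 bpb means no compression at all, so lower is better. The number produced here says nothing about PPM-C or enwik8.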

The four papers

| # | Title | Core result |
|---|---|---|
| 1 | Irreversibility depth | D* = 7.5, 49-language atlas |
| 2 | Kolmogorov complexity | K = 5 exact, verified K-bounds |
| 3 | The ε²=0 tower | 175 files, 17+ layers |
| 4 | Hodge language model | 2.26 bpb, 1.75× harmonic efficiency |

Key numbers

| Quantity | Value | Paper |
|---|---|---|
| D* (English Wikipedia) | 7.5 bytes | 1 |
| Languages measured | 49 | 1 |
| K(ε² = 0) | 5 (exact) | 2 |
| K(dual_mul) | 1,018 (292× compressed) | 2 |
| K(measurement instrument) | 83 | 2 |
| Total reflection cost | 136 nodes | 2 |
| Lean files in tower | 175 | 3 |
| Lines of Lean | 28,000+ | 3 |
| Best bpb (trie+MLP) | 2.26 | 4 |
| Harmonic efficiency | 1.75× | 4 |
| PPM-C (verified) | 2.16 bpb | 1, 2 |

What each paper adds to the others

Paper 1 → Paper 4: D* provides the operating point. The model optimizes at D = 4, where the coupling ratio g(D) crosses unity.

Paper 4 → Paper 1: The Hodge LM provides a constructive proof that D* is compression-relevant. The 1.75× harmonic efficiency shows the irreversible component is learnable, not a measurement artifact.

Paper 2 → Paper 3: K = 5 gives the tower a terminal measurement. Axiom selection by minimum description length (MDL) reveals that the "fundamental" theorem differs from the traditional mathematical answer.

Paper 3 → Paper 2: The tower provides the object to measure. Its structural diversity (K from 5 to 1,018, compression from 1× to 292×) exercises the full range of the measurement pipeline.

Paper 1 → Paper 2: The compressor that validates D* is the same compressor formalized in Lean to produce the first machine-verified K-bound.

The self-measurement chain

The chain terminates: the measurement instrument (K = 83) measures itself, and meta-measurement is smaller than measurement (K = 23 at Level 2). The tower stabilizes at Level 2–3 with total reflection cost 136 nodes.

No individual paper spans the full loop from byte streams through algebra to self-measurement. But each paper's results become sharper in the context of the others: D* is not just a number but the empirical manifestation of an algebraic boundary; K = 5 is not just an information-theoretic curiosity but the complexity of the axiom that forces 17 layers of mathematics; the Hodge LM is not just an architecture but a constructive proof that the algebraic decomposition has parameter-efficiency consequences.