
The Hodge-Epsilon Program

Irreversibility, complexity, and self-measurement

Richard Hoekstra · April 2026

The loop

Four papers form a loop: empirical measurement of byte streams reveals algebraic structure, algebraic structure is formalized and proved, proof terms are measured for complexity, and the complexity measurement validates the empirical starting point. The connecting thread is ε² = 0.

Byte stream (empirical)
→ Hodge decomposition (D*, f(D)) [Paper 1]
→ Architecture (exact + harmonic + residual) [Paper 4]
→ Algebraic theory (ε² = 0) [Paper 3]
→ Proof terms (Lean.Expr) [Paper 3]
→ Type erasure (Lean.Expr → TLC) [Paper 2]
→ DAG-CSE (K = 5) [Paper 2]
→ The instrument measures itself (K = 83) [Paper 2]

One axiom in three roles

Algebraic foundation (Paper 3). The axiom forces a 17-layer chain of theorems from elementary ring theory through homological algebra to operadic Koszul duality, all machine-checked in Lean 4.

Information-theoretic optimum (Paper 2). Among all axiom systems of the form x² = c, the dual-number axiom ε² = 0 has K = 5, the minimum possible. A strict gap separates K = 5 from K ≥ 7 for anything more complex. Order 2 is not a convention; it is the K-optimal algebraic structure.

Compression structure (Paper 1). The PPM-C escape mechanism has nilpotent structure: applying escape twice yields nothing. The compressor achieves 2.16 bits per byte (bpb) on enwik8, and its roundtrip (decompression recovers the input) is machine-verified.
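All three roles rest on the same axiom, ε² = 0. As a minimal sketch (not taken from the papers or the Lean development), dual numbers a + bε can be modelled in Python as pairs whose product drops the ε² term. The names Dual, EPS and ZERO are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A dual number a + b*eps, where eps*eps == 0 (illustrative helper)."""
    a: float
    b: float = 0

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.a + other.a, self.b + other.b)

    def __mul__(self, other: "Dual") -> "Dual":
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps + bd eps^2,
        # and the last term vanishes because eps^2 = 0.
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)


EPS = Dual(0, 1)   # the nilpotent element
ZERO = Dual(0, 0)

# The axiom itself: squaring eps gives zero.
eps_squared = EPS * EPS

# A generic product keeps only the constant and first-order parts.
product = Dual(2, 3) * Dual(5, 7)
```

Here eps_squared equals ZERO, and product equals Dual(10, 29), since 2·7 + 3·5 = 29 and the 3·7·ε² term is discarded.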

The four papers

#	Title	Core result
1	Irreversibility depth	D* = 7.5, 49-language atlas
2	Kolmogorov complexity	K = 5 exact, verified K-bounds
3	The ε²=0 tower	175 files, 17+ layers
4	Hodge language model	2.26 bpb, 1.75× harmonic efficiency

Key numbers

Quantity	Value	Paper
D* (English Wikipedia)	7.5 bytes	1
Languages measured	49	1
K(ε² = 0)	5 (exact)	2
K(dual_mul)	1,018 (292× compressed)	2
K(measurement instrument)	83	2
Total reflection cost	136 nodes	2
Lean files in tower	175	3
Lines of Lean	28,000+	3
Best bpb (trie+MLP)	2.26	4
Harmonic efficiency	1.75×	4
PPM-C (verified)	2.16 bpb	1, 2
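To make the bpb figures concrete, the sketch below converts between bits per byte and compressed size. It assumes the standard definition (compressed bits divided by original bytes) and the standard enwik8 size of 100,000,000 bytes; the function names are illustrative, not from the papers.

```python
ENWIK8_BYTES = 100_000_000  # enwik8 is the first 10^8 bytes of an English Wikipedia dump


def bits_per_byte(compressed_bytes: int, original_bytes: int) -> float:
    """Compressed bits spent per byte of original input."""
    return 8 * compressed_bytes / original_bytes


def compressed_size(bpb: float, original_bytes: int) -> float:
    """Compressed size in bytes implied by a given bpb figure."""
    return bpb * original_bytes / 8


# Size implied by the 2.16 bpb figure on enwik8 (about 27 MB).
ppmc_bytes = compressed_size(2.16, ENWIK8_BYTES)

# Going the other way recovers the bpb figure.
ppmc_bpb = bits_per_byte(27_000_000, ENWIK8_BYTES)
```

Under these assumptions, 2.16 bpb on enwik8 corresponds to roughly 27,000,000 compressed bytes.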

What each paper adds to the others

Paper 1 → Paper 4: D* provides the operating point. The model optimizes at D = 4, where the coupling ratio g(D) crosses unity.

Paper 4 → Paper 1: The Hodge LM provides a constructive proof that D* is compression-relevant. The 1.75× harmonic efficiency shows the irreversible component is learnable, not a measurement artifact.

Paper 2 → Paper 3: K = 5 gives the tower a terminal measurement. Axiom selection by minimum description length (MDL) reveals that the "fundamental" theorem differs from the traditional mathematical answer.

Paper 3 → Paper 2: The tower provides the object to measure. Its structural diversity (K from 5 to 1,018, compression from 1× to 292×) exercises the full range of the measurement pipeline.

Paper 1 → Paper 2: The compressor that validates D* is the same compressor formalized in Lean to produce the first machine-verified K-bound.
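The compression ratios quoted above (1× to 292×) compare a term's size before and after sharing. The sketch below illustrates the general idea of DAG-based common-subexpression elimination on a toy expression. It assumes that size means node count and that terms are nested tuples of the form (label, child, ...); the actual term encoding and counting rules of Paper 2 are not reproduced here, and the function names are hypothetical.

```python
def tree_size(expr) -> int:
    """Count nodes of the expression as a tree, counting repeats every time."""
    if not isinstance(expr, tuple):
        return 1  # a leaf
    return 1 + sum(tree_size(child) for child in expr[1:])


def dag_size(expr) -> int:
    """Count distinct subterms, i.e. nodes once identical subtrees are shared."""
    seen = set()

    def visit(e) -> None:
        if e in seen:
            return  # already shared, costs nothing extra
        seen.add(e)
        if isinstance(e, tuple):
            for child in e[1:]:
                visit(child)

    visit(expr)
    return len(seen)


# Toy term: (a*b + c) * (a*b + c), with the factor repeated.
s = ("mul", "a", "b")
t = ("add", s, "c")
u = ("mul", t, t)

tree_nodes = tree_size(u)   # 11: the repeated factor is counted twice
dag_nodes = dag_size(u)     # 6: u, t, s, a, b, c
ratio = tree_nodes / dag_nodes
```

In this toy case sharing reduces 11 tree nodes to 6 DAG nodes. A term with no repeated subterms has ratio 1, which matches the low end of the range quoted above.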

The self-measurement chain

The chain terminates: the measurement instrument (K = 83) measures itself, and meta-measurement is smaller than measurement (K = 23 at Level 2). The tower stabilizes at Level 2–3 with total reflection cost 136 nodes.

No individual paper spans the full loop from byte streams through algebra to self-measurement. But each paper's results become sharper in the context of the others: D* is not just a number but the empirical manifestation of an algebraic boundary; K = 5 is not just an information-theoretic curiosity but the complexity of the axiom that forces 17 layers of mathematics; the Hodge LM is not just an architecture but a constructive proof that the algebraic decomposition has parameter-efficiency consequences.