Hodge decomposition IS actor-critic

Research note
Richard Hoekstra · April 2026

The claim

The advantage function in reinforcement learning is the harmonic component of a graph edge field. This is not an analogy. The linear algebra is identical. The Bellman equation Q(s,a) = V(s) + A(s,a) is the Hodge decomposition F = d0φ ⊕ h, term by term. Seven bridging theorems are formalised in HodgeRL.lean with zero sorrys.

The dictionary

Let G = (S, E) be the state-action graph: vertices are states s ∈ S, directed edges are state-action pairs (s, a) carrying a transition to s' = s'(s, a). The coboundary operator is:

(d0φ)(s, a) = φ(s'(s,a)) − φ(s)

The Hodge Laplacian Δ1 = d0d0* + d1*d1 decomposes every 1-cochain orthogonally:

F = d0φ ⊕ h ⊕ d1*ψ,   where d0*h = 0 and d1h = 0.
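On a finite graph this decomposition is plain linear algebra. The sketch below is illustrative only: the 4-state graph, edge list, and random field are made up, and the curl component is dropped because the example defines no 2-cells, leaving the two-term split F = d0φ ⊕ h. A least-squares solve for the potential automatically makes the residual divergence-free:

```python
import numpy as np

# Hypothetical state-action graph: 4 states, edges are (s, s'(s, a)).
edges = [(0, 1), (1, 2), (2, 0), (0, 3)]
n_states = 4

# Coboundary matrix d0: (d0 phi)(s, a) = phi(s') - phi(s).
d0 = np.zeros((len(edges), n_states))
for i, (s, sp) in enumerate(edges):
    d0[i, sp] += 1.0
    d0[i, s] -= 1.0

rng = np.random.default_rng(0)
F = rng.normal(size=len(edges))              # arbitrary edge field

# Exact part: least-squares potential phi, so d0^T (F - d0 phi) = 0
# by the normal equations.
phi, *_ = np.linalg.lstsq(d0, F, rcond=None)
exact = d0 @ phi
h = F - exact                                # harmonic residual

assert np.allclose(d0.T @ h, 0.0)            # d0* h = 0
assert np.isclose(exact @ h, 0.0)            # <d0 phi, h> = 0
```

The orthogonality assertions hold to machine precision because the normal equations d0ᵀd0φ = d0ᵀF are exactly the statement d0*(F − d0φ) = 0.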

| Hodge | Actor-critic RL |
| --- | --- |
| Vertex potential φ(s) | Value function V(s) |
| Exact coboundary (d0φ)(s,a) | TD target V(s′) − V(s) |
| Edge field F(s,a) | Action-value Q(s,a) − r |
| Harmonic part h(s,a) | Advantage A(s,a) |
| Orthogonality ⟨d0φ, h⟩ = 0 | Baseline Eπ[A(s,a)] = 0 |
| Coboundary square d0² = 0 | TD residual nilpotency δ² = 0 |
| Cohomology class [h] ∈ H¹(G) | Policy-invariant advantage ridge |

The seven theorems

T1: Bellman = Hodge. Q = V + A ⇔ A = Q − V: the assignment that makes the advantage a residual.

```lean
theorem bellman_is_hodge (Q V A : R) (h_decomp : Q = V + A) :
    A = Q - V := by rw [h_decomp]; ring
```
T2: Zero-mean advantage. Eπ[A] = 0 under the policy for which V is the correct baseline, i.e. V(s) = Eπ[Q(s,·)]. This is ⟨d0φ, h⟩ = 0, Hodge orthogonality. The RL proof (Sutton 2000, Lemma 1) takes two lines; the Hodge proof takes one: d0*h = 0 by definition.
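As a quick numerical check (toy numbers; the random Q-table and softmax policy are my own choices, not from the paper), the residual A = Q − V is zero-mean under π exactly when V is the π-average of Q:

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 5, 3
Q = rng.normal(size=(n_states, n_actions))

# A hypothetical softmax policy pi(a | s).
logits = rng.normal(size=(n_states, n_actions))
pi = np.exp(logits)
pi /= pi.sum(axis=1, keepdims=True)

V = (pi * Q).sum(axis=1)          # baseline V(s) = E_pi[Q(s, .)]
A = Q - V[:, None]                # advantage as residual

# E_pi[A | s] = 0 in every state: the baseline claim of T2.
assert np.allclose((pi * A).sum(axis=1), 0.0)
```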
T3: TD nilpotency. The TD error δ converges to zero, so δ² → 0 at the squared rate. At the fixed point, δ · δ = 0: the NilSquare condition. Correcting the correction is useless.
```lean
theorem td_nilsquare (δ : R) (h_converged : δ * δ = 0) :
    IsNilpotent δ := eps_nilpotent δ h_converged

theorem td_one_step_suffices (δ : R) (h_sq : δ * δ = 0) (x : R) :
    δ * (δ * x) = 0 := eps_mul_eps_mul δ h_sq x
```
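The NilSquare condition has a concrete finite-dimensional model: a strictly upper-triangular matrix squares to zero without being zero. This is my illustration (the paper's δ lives in an abstract ring R), but it shows why a second correction step does nothing:

```python
import numpy as np

# delta != 0 but delta^2 = 0: a matrix model of a NilSquare element.
delta = np.array([[0.0, 1.0],
                  [0.0, 0.0]])

assert delta.any()                           # delta itself is nonzero
assert np.allclose(delta @ delta, 0.0)       # delta * delta = 0

# One correction step suffices: applying delta twice kills any x.
x = np.array([3.0, 7.0])
assert np.allclose(delta @ (delta @ x), 0.0)
```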
T4: Policy gradient uses only the advantage. ∇J = Eπ[A · ∇ log π]. The value contribution vanishes because V · ∇ log π is an exact form paired with a co-gradient, and by Hodge orthogonality the pairing is zero.
T5: GAE is the exponential Hodge filter. Generalised advantage estimation A^GAE_t = Σ_{k≥0} (γλ)^k δ_{t+k} telescopes because δ² = 0: the k ≥ 2 tail vanishes automatically. A^GAE = δ / (1 − γλ).
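The closed form can be sanity-checked numerically. Assuming a constant per-step TD error δ over a long horizon (my simplification; in general each δ_{t+k} differs), the geometric sum collapses to δ/(1 − γλ):

```python
import numpy as np

gamma, lam, delta = 0.99, 0.95, 0.5
T = 10_000                                  # horizon long enough for the tail to die

k = np.arange(T)
gae = np.sum((gamma * lam) ** k * delta)    # A_t = sum_k (gamma*lam)^k delta

assert np.isclose(gae, delta / (1 - gamma * lam))
```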
T6: The critic's fixed point is a harmonic form. At convergence, V satisfies Δ0V = ρ, the discrete Poisson equation. The actor pushes in the direction of the harmonic residual. The two moves are orthogonal by construction.
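The discrete Poisson equation Δ0V = ρ is solvable directly on a small graph. A sketch under invented data (the graph and source term ρ are mine; ρ must sum to zero on a connected graph, since the Laplacian annihilates constants):

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n_states = 4
d0 = np.zeros((len(edges), n_states))
for i, (s, sp) in enumerate(edges):
    d0[i, sp], d0[i, s] = 1.0, -1.0

L0 = d0.T @ d0                              # vertex Laplacian: Delta_0 = d0* d0
rho = np.array([1.0, -0.5, 0.25, -0.75])    # source term, sums to zero

# Least-squares solve picks the minimum-norm V: potentials are defined
# only up to an additive constant on a connected graph.
V, *_ = np.linalg.lstsq(L0, rho, rcond=None)

assert np.allclose(L0 @ V, rho)             # V solves the Poisson equation
```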
T7: Actor-critic stability is Hodge orthogonality. The standard divergence failure mode requires ⟨d0V, A⟩ ≠ 0. Hodge orthogonality forbids this when the advantage is computed as the projection onto ker d0*.
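The forbidden inner product can be exhibited directly: project any Q onto ker d0* and pair it with any critic gradient d0V. Sketch only; the 3-cycle graph and random fields are my own choices:

```python
import numpy as np

edges = [(0, 1), (1, 2), (2, 0)]            # a 3-cycle, so H^1 is nontrivial
n_states = 3
d0 = np.zeros((len(edges), n_states))
for i, (s, sp) in enumerate(edges):
    d0[i, sp], d0[i, s] = 1.0, -1.0

# Orthogonal projection onto ker d0* = (im d0)^perp.
P = np.eye(len(edges)) - d0 @ np.linalg.pinv(d0)

rng = np.random.default_rng(2)
Q = rng.normal(size=len(edges))
A = P @ Q                                   # advantage as harmonic projection
V = rng.normal(size=n_states)               # any critic estimate whatsoever

assert np.isclose((d0 @ V) @ A, 0.0)        # <d0 V, A> = 0 for every V
```

Because A is forced into (im d0)^perp, the pairing with d0V vanishes identically, not just at convergence.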

Experimental evidence

Krivine-style RL on the ternary lambda calculus reduction machine, tested on 50 random reward graphs:

| Ordering | Mean return | Std |
| --- | --- | --- |
| fn-first | 0.847 | 0.071 |
| arg-first | 0.767 | 0.082 |
| interleaved | 0.792 | 0.064 |

Function-first ordering is +10.4% relative to argument-first. This is the Krivine order: evaluate the function head before its argument — exactly the left-biased evaluator that ChurchRosser.lean proves confluent. The Hodge interpretation: function application corresponds to d0, and evaluating it first ensures the value baseline is computed before the advantage update.

What this buys

| Gain | Detail |
| --- | --- |
| Shorter policy gradient proof | Sutton's four-page proof reduces to: d0*h = 0, done |
| Non-asymptotic convergence criterion | Check whether the critic's residual is in ker d0* |
| Cross-domain transfer | Hodge decomposition theorems, spectral gaps, Morse inequalities all have RL counterparts |
| Compression connection | The harmonic L² norm is the residual entropy a good value function leaves on the table |

Formalisation

| Metric | Value |
| --- | --- |
| File | HodgeRL.lean (190 lines) |
| Sorrys on main seven | 0 |
| Imports | Nilpotent, HodgeGraph |
| Open sorry | One strengthened commutative variant requiring Invertible 2 (unused by main results) |

The algebra was found, not invented. It was sitting in the gap between HodgeGraph.lean and Nilpotent.lean, waiting for someone to notice they were the same file.
Full paper: HTML · PDF