topological-coherence 0.1.2

Toroidal logit bias for LLM hallucination reduction — Tonnetz geometry primitives validated on 4 models (TruthfulQA 817 samples)

Topological Constraints for Coherent Language Models

Why Geometry Prevents Hallucination

Sylvain Cormier | Paraxiom Research | January 2026



Abstract

Residual geometry determines whether reasoning is stable. We show that transformer latent dynamics, operating on unconstrained vector spaces, lack the conserved quantities necessary for bounded inference, and we establish a hierarchy of sufficient conditions for restoring them:

mHC (Birkhoff) ⊂ ERLHS (Hamiltonian) ⊂ Karmonic (Toroidal + Spectral)

The practical consequence—reduced drift, and thereby reduced hallucination—follows from the geometry when these conditions are satisfied.


Key Theoretical Contributions

1. Hallucination as a Geometry Problem

We argue that hallucination is not a training data problem, an alignment failure, or an inherent limitation of autoregressive generation. Hallucination is a geometry problem: unconstrained latent dynamics permit arbitrary drift through latent space.

2. Hierarchy of Constraints

Level                           Adds             Solves
mHC (Birkhoff polytope)         Bounded mixing   Training stability
ERLHS (Hamiltonian)             Conserved flow   Inference coherence
Karmonic (Toroidal + Spectral)  Spectral gap     Noise suppression

3. Spectral Alignment (Resonance)

Modes that align with the manifold's eigenstructure persist under repeated composition. Non-resonant modes decay as e^(-λt).

Epistemic boundary: Spectral alignment filters, stabilizes, and selects. It does not alone guarantee semantic correctness. A resonant mode may be stably wrong.
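
For intuition, the decay law is easy to check numerically. The sketch below is not from the paper: it builds the Laplacian of an N-cycle (any fixed-gap graph would do), applies a heat-flow smoothing step repeatedly, and confirms that the first non-resonant mode shrinks as e^(-λ₁τt) per t compositions while the resonant (constant) mode persists. N and τ are arbitrary illustrative choices.

import numpy as np

N = 24                                   # ring size (arbitrary)
I = np.eye(N)
L = 2 * I - np.roll(I, 1, axis=0) - np.roll(I, -1, axis=0)   # N-cycle Laplacian

eigvals, eigvecs = np.linalg.eigh(L)
lam1 = eigvals[1]                        # spectral gap: 2 - 2cos(2*pi/N)

tau = 0.5                                # smoothing step size (arbitrary)
step = eigvecs @ np.diag(np.exp(-tau * eigvals)) @ eigvecs.T  # exp(-tau * L)

x = eigvecs[:, 0] + eigvecs[:, 1]        # resonant + first non-resonant mode
for t in range(1, 6):
    x = step @ x
    print(f"t={t}: non-resonant amplitude {abs(eigvecs[:, 1] @ x):.4f}, "
          f"predicted e^(-lam1*tau*t) = {np.exp(-lam1 * tau * t):.4f}")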


Empirical Results

TruthfulQA v2 — Multi-Model Benchmark (817 samples, LLM-judged)

Toroidal logit bias produces consistent improvements across all four models tested (T&I: fraction of answers judged both true and informative):

Model        Baseline T&I   Toroidal T&I   Delta
Qwen 0.5B    16.9%          17.1%          +0.2pp
Qwen 1.5B    32.2%          32.8%          +0.6pp
Qwen 7B      75.6%          77.7%          +2.1pp
Mistral 7B   74.4%          77.2%          +2.8pp

Key finding: Improvement scales with model capacity — larger models benefit more from toroidal constraints.
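
The decode-time code is not included in this README; the following is a minimal sketch of one way a toroidal logit bias with the shape of Eq. 17 could be applied during generation. It assumes every vocabulary token has been assigned a coordinate on the unit 2-torus; `vocab_coords` and `context_coord` are hypothetical inputs, not the paper's actual parameterization. Taking the log of the Eq. 17 mask turns the multiplicative window into an additive bias: 0 within radius r, -α·d outside.

import torch

def torus_distance(u, v):
    """Wrap-around L1 distance on the unit 2-torus [0, 1)^2."""
    diff = torch.abs(u - v)
    return torch.minimum(diff, 1.0 - diff).sum(-1)

def toroidal_logit_bias(logits, context_coord, vocab_coords, r=0.15, alpha=2.0):
    """Additive form of the Eq. 17 mask: log(1) = 0 within radius r,
    log(exp(-alpha*d)) = -alpha*d outside. r and alpha are illustrative."""
    d = torus_distance(vocab_coords, context_coord)        # (vocab_size,)
    return logits + torch.where(d <= r, torch.zeros_like(d), -alpha * d)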

Toy Model Validation

Training-time toroidal attention masks on a 2-layer transformer:

Condition      Drift Rate   Interpretation
Baseline       0.0100       Control
Toroidal       0.0060       40% lower drift than baseline
Random sparse  0.1673       28x worse than toroidal: topology matters, not sparsity

Critical Insight: Negative Control

Random graph masking (same sparsity, no topological structure) has a drift rate of 0.167, versus 0.006 for the toroidal mask. This shows that it is specifically the topological structure that matters: sparsity alone is insufficient.
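
For concreteness, this is what "same sparsity, no topological structure" means: a mask built from wrap-around grid distance versus a random mask with exactly as many allowed entries. A sketch (grid side and radius are illustrative; the actual masking code lives in tonnetz_validation.py):

import numpy as np

def torus_mask(side, radius=1):
    """Allow attention between positions within wrap-around L1 `radius`
    when n = side*side positions are laid out on a 2D torus grid."""
    n = side * side
    xs, ys = np.divmod(np.arange(n), side)
    dx = np.abs(xs[:, None] - xs[None, :]); dx = np.minimum(dx, side - dx)
    dy = np.abs(ys[:, None] - ys[None, :]); dy = np.minimum(dy, side - dy)
    return (dx + dy) <= radius

def random_mask_like(mask, seed=0):
    """Negative control: same number of allowed entries, no topology."""
    rng = np.random.default_rng(seed)
    flat = np.zeros(mask.size, dtype=bool)
    flat[rng.choice(mask.size, size=int(mask.sum()), replace=False)] = True
    return flat.reshape(mask.shape)

toro = torus_mask(side=8)
rand = random_mask_like(toro)
assert toro.sum() == rand.sum()   # identical sparsity, different topology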


Repository Structure

topological-coherence/
├── cormier_topological_coherence_2026.pdf   # Paper (15 pages)
├── cormier_topological_coherence_2026.tex   # LaTeX source
├── docs/
│   ├── UNIFIED_THEORY.md                    # Cross-domain unified theory
│   └── diagrams/                            # SVG diagrams
├── experiments/
│   ├── tonnetz_validation.py                # Minimal validation experiment
│   └── venv/                                # Python environment (not tracked)
├── src/topological_coherence/               # PyPI package source
├── README.md                                # This file
└── LICENSE                                  # Apache 2.0

Running the Experiment

Prerequisites

  • Python 3.8+
  • ~500MB disk space for PyTorch

Installation

cd experiments
python3 -m venv venv
source venv/bin/activate
pip install torch numpy

Run

python tonnetz_validation.py

Expected runtime: ~4 minutes on CPU (no GPU required)

Expected Output

The experiment trains 4 models (baseline, mHC, toroidal, random) and reports:

  • Drift rate (lower = better semantic coherence)
  • Coherence variance (hidden state stability)
  • Gradient norm (training stability)

Theoretical Background

Tonnetz Topology

The Tonnetz is a 2D torus where:

  • Horizontal edges connect by perfect fifths
  • Vertical edges connect by major thirds
  • Diagonal edges connect by minor thirds

We use it as a constructive existence proof of a low-genus manifold with constant spectral gap—not as a claim about semantic universals.
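
As a concrete picture of those three edge types, here is the neighborhood of one pitch class on the 12-tone Tonnetz. A sketch only: the paper's mask indexes token pairs (Eq. 17), while this snippet just illustrates the wrap-around adjacency itself.

INTERVALS = (7, 4, 3)   # semitone steps: perfect fifth, major third, minor third

def tonnetz_neighbors(pc):
    """Pitch classes adjacent to `pc` on the Tonnetz; all edges wrap mod 12."""
    return sorted({(pc + s) % 12 for s in INTERVALS} |
                  {(pc - s) % 12 for s in INTERVALS})

print(tonnetz_neighbors(0))   # C: [3, 4, 5, 7, 8, 9] = Eb, E, F, G, Ab, A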

Spectral Gap

For a d-dimensional torus T^d_N:

λ₁ = 2 - 2cos(2π/N) = Θ(1)

for fixed side length N, independent of total nodes N^d.

Important caveat: This holds for fixed torus side length N. Scaling N reintroduces gap decay as O(1/N²).
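
Both the Θ(1) claim and the caveat can be checked numerically: the Laplacian of a Cartesian product of cycles is a Kronecker sum, so its eigenvalues are sums of cycle eigenvalues, and the gap of T^d_N equals the gap of a single N-cycle. A quick check (not from the paper; sizes are illustrative):

import numpy as np

def cycle_laplacian(N):
    I = np.eye(N)
    return 2 * I - np.roll(I, 1, axis=0) - np.roll(I, -1, axis=0)

def torus_gap(N, d):
    """Second-smallest Laplacian eigenvalue of the d-dim torus T^d_N,
    built as a Kronecker sum of d cycle Laplacians."""
    L = cycle_laplacian(N)
    for _ in range(d - 1):
        L = np.kron(L, np.eye(N)) + np.kron(np.eye(L.shape[0]), cycle_laplacian(N))
    return np.sort(np.linalg.eigvalsh(L))[1]

for N, d in [(8, 1), (8, 2), (16, 1), (16, 2)]:
    print(f"N={N:2d}, d={d}: gap = {torus_gap(N, d):.4f}, "
          f"formula = {2 - 2 * np.cos(2 * np.pi / N):.4f}")
# Gap is constant in d for fixed N, but shrinks like (2*pi/N)^2 as N grows.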

Why Not Implicit Smoothing?

Standard transformer components (LayerNorm, softmax temperature, multi-head averaging) provide some implicit spectral filtering. However, none impose topological constraints—they operate pointwise or via soft weighting, not via manifold structure. They smooth without providing a conserved quantity or spectral gap guarantee.

The distinction is between ad-hoc regularization (which helps) and geometric constraint (which bounds).


Citation

@misc{cormier2026topological,
  author = {Cormier, Sylvain},
  title = {Topological Constraints for Coherent Language Models: Why Geometry Prevents Hallucination},
  year = {2026},
  publisher = {Zenodo},
  url = {https://github.com/Paraxiom/topological-coherence}
}

Related Work

Paper                    Topic                                                       Link
Unified Theory           Conservative composition across ML, blockchain, consensus  docs/UNIFIED_THEORY.md
ERLHS                    Hamiltonian framework for coherence-preserving ML          DOI: 10.5281/zenodo.17928909
Karmonic Mesh            Spectral consensus on toroidal manifolds                   DOI: 10.5281/zenodo.17928991
mHC                      Manifold-Constrained Hyper-Connections                     arXiv:2512.24880
Graph Signal Processing  Spectral methods on graphs                                 Shuman et al., 2013

Key Equations

Toroidal Attention Mask (Eq. 17)

M_Tonnetz(i, j) = 1                           if d_Tonnetz(i, j) ≤ r
                = exp(-α · d_Tonnetz(i, j))   otherwise
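
In code, Eq. 17 is a single elementwise window. A minimal PyTorch rendering, assuming the pairwise distance matrix d has already been computed (how token positions are laid out on the torus is the paper's choice and is not shown here; the r and alpha defaults are illustrative):

import torch

def tonnetz_mask(d, r=1.0, alpha=1.0):
    """Eq. 17: pass-through within radius r, exponential falloff outside.
    `d` is a (seq, seq) tensor of precomputed Tonnetz distances."""
    return torch.where(d <= r, torch.ones_like(d), torch.exp(-alpha * d))

# One plausible use: scale pre-softmax attention scores elementwise,
#   scores = scores * tonnetz_mask(d, r, alpha)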

Learned Toroidal Projection (Eq. 20)

φ_θ(e) = ( σ(W₁e) mod 1, σ(W₂e) mod 1 )

Adjacency Loss (Eq. 21)

L_topo = E[(a,b)~co-occur][d_T(φ(a), φ(b))] - λ · E[(a,c)~random][d_T(φ(a), φ(c))]
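
Eqs. 20 and 21 translate almost directly to PyTorch. A sketch, assuming e is a batch of token embeddings and that co-occurring pairs (a, b) and random pairs (a, c) are sampled by the caller; the λ default is illustrative. Since σ maps into (0, 1), the mod 1 is numerically a no-op and only makes the toroidal codomain explicit.

import torch
import torch.nn as nn

class ToroidalProjection(nn.Module):
    """Eq. 20: phi(e) = (sigma(W1 e) mod 1, sigma(W2 e) mod 1)."""
    def __init__(self, dim):
        super().__init__()
        self.W1 = nn.Linear(dim, 1)
        self.W2 = nn.Linear(dim, 1)

    def forward(self, e):
        return torch.cat([torch.sigmoid(self.W1(e)),
                          torch.sigmoid(self.W2(e))], dim=-1) % 1.0

def d_T(u, v):
    """Wrap-around L1 distance on the unit 2-torus."""
    diff = torch.abs(u - v)
    return torch.minimum(diff, 1.0 - diff).sum(-1)

def adjacency_loss(phi, a, b, c, lam=0.1):
    """Eq. 21: pull co-occurring pairs (a, b) together on the torus,
    push random pairs (a, c) apart."""
    return d_T(phi(a), phi(b)).mean() - lam * d_T(phi(a), phi(c)).mean()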

Limitations

  1. Benchmark scope: Tested on factual truthfulness (TruthfulQA); open-ended generation is untested.
  2. Recall-coherence tradeoff: Suppressing long-range attention may hurt tasks requiring non-local retrieval.
  3. Hyperparameter sensitivity: Each model family requires separate tuning.
  4. Judge bias: The LLM-judged evaluation uses Qwen-7B as both subject and judge.

Future Work

  1. Scale to 70B+ models (scaling trend is encouraging)
  2. Compare with other geometric constraints (hyperbolic, spherical)
  3. Cross-model judging to eliminate judge bias
  4. Investigate task-dependent optimal topology

License

Apache 2.0


"Geometric constraints provide one principled path to coherent artificial intelligence—not the only path, but a formally grounded one."