# Topological Constraints for Coherent Language Models
**Why Geometry Prevents Hallucination**
[Theory paper (PDF)](cormier_topological_coherence_2026.pdf)
[License](LICENSE)
---
## Abstract
Residual geometry determines whether reasoning is stable. We show that transformer latent dynamics, operating on unconstrained vector spaces, lack the conserved quantities necessary for bounded inference, and we establish a hierarchy of sufficient conditions for restoring them:
```
mHC (Birkhoff) ⊂ ERLHS (Hamiltonian) ⊂ Karmonic (Toroidal + Spectral)
```
The practical consequence—reduced drift, and thereby reduced hallucination—follows from the geometry when these conditions are satisfied.
---
## Key Theoretical Contributions
### 1. Hallucination as Geometry Problem
We argue that hallucination is not a training data problem, an alignment failure, or an inherent limitation of autoregressive generation. **Hallucination is a geometry problem**: unconstrained latent dynamics permit arbitrary drift through latent space.
### 2. Hierarchy of Constraints
| Constraint | Mechanism | Benefit |
|---|---|---|
| **mHC** (Birkhoff polytope) | Bounded mixing | Training stability |
| **ERLHS** (Hamiltonian) | Conserved flow | Inference coherence |
| **Karmonic** (Toroidal + Spectral) | Spectral gap | Noise suppression |
### 3. Spectral Alignment (Resonance)
Modes that align with the manifold's eigenstructure persist under repeated composition. Non-resonant modes decay as e^(-λt).
**Epistemic boundary**: Spectral alignment *filters*, *stabilizes*, and *selects*. It does not alone guarantee semantic correctness. A resonant mode may be stably wrong.
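The filtering claim is easy to check numerically. Below is a minimal sketch (illustrative, not repo code) that repeatedly applies a diffusion step on a ring graph: the component along the λ=0 eigenvector persists under composition, while a non-resonant mode decays at exactly the rate the Laplacian spectrum predicts.

```python
# Minimal sketch: non-resonant mode decay under repeated composition
# with a diffusion operator on a ring graph (illustration, not repo code).
import numpy as np

N = 64
i = np.arange(N)
L = 2 * np.eye(N)                        # graph Laplacian of the cycle C_N
L[i, (i + 1) % N] = -1
L[i, (i - 1) % N] = -1

tau = 0.1
P = np.eye(N) - tau * L                  # one composition step, ~ exp(-tau * L)

resonant = np.ones(N) / np.sqrt(N)       # lambda = 0 eigenvector: conserved
k = 5
nonres = np.cos(2 * np.pi * k * i / N)   # eigenvector with lambda_k > 0
nonres /= np.linalg.norm(nonres)

x = resonant + nonres
for _ in range(200):
    x = P @ x

lam_k = 2 - 2 * np.cos(2 * np.pi * k / N)
print("resonant amplitude :", x @ resonant)          # stays ~1.0
print("non-resonant       :", x @ nonres)            # decayed toward 0
print("predicted          :", (1 - tau * lam_k) ** 200)
```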
---
## Empirical Results
### Replication Update (March 2026)
A comprehensive independent replication (6 phases, 4 models, 3 benchmarks) found that the inference-time toroidal logit bias **does not produce statistically significant hallucination reduction**. The original v2 results were within LLM judge sampling variance. Full replication data in `experiments/results/`.
**T&I Exact Replication (Qwen 7B, n=200, exact v2 methodology):**
| Metric | Baseline | Toroidal | Δ | p |
|---|---|---|---|---|
| T&I % (truthful & informative) | 76.5% | 74.5% | −2.0pp | 0.22 |
Baseline matches v2 (76.5% vs 75.6%), confirming correct methodology. Toroidal shows opposite direction, not significant.
**Alpha Sweep (Qwen 7B, n=100):** Higher alpha monotonically degrades output. α=0.3 has zero effect; α≥5.0 causes catastrophic degradation (75–96% hallucination).
**Active Ingredient:** The hardening system prompt ("Answer concisely and truthfully") reduces hallucination by 14pp (p=0.05); this prompt engineering, not the toroidal bias, was the effective component in the Coherence Shield pipeline.
### Original v2 Results (NOT REPLICATED)
| Model | Baseline T&I % | Toroidal T&I % | Δ |
|---|---|---|---|
| Qwen 0.5B | 16.9% | 17.1% | +0.2pp |
| Qwen 1.5B | 32.2% | 32.8% | +0.6pp |
| Qwen 7B | 75.6% | 77.7% | +2.1pp |
| Mistral 7B | 74.4% | 77.2% | +2.8pp |
### Toy Model Validation (Still Valid)
Training-time toroidal attention masks on a 2-layer transformer:
| Mask | Drift rate | Notes |
|---|---|---|
| Baseline | 0.0100 | Control |
| Toroidal | **0.0060** | **40% lower drift** |
| Random sparse | 0.1673 | 28x worse; proves topology matters, not sparsity |
### Critical Insight: Negative Control
**Random graph masking (same sparsity, no topological structure) has drift rate 0.167 vs toroidal's 0.006.** This proves it's specifically **topological structure** that matters — sparsity alone is insufficient. However, this training-time result does not transfer to inference-time logit biasing.
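The drift metric itself lives in `src/topological_coherence/drift.py`. As a rough sketch of what such a measurement can look like (the exact definition below is an assumption, not the repo's), one option is the mean normalized per-token displacement of hidden states:

```python
# Hypothetical drift-rate sketch; the repo's actual definition is in
# src/topological_coherence/drift.py and may differ.
import torch

def drift_rate(hidden: torch.Tensor) -> float:
    """hidden: (batch, seq_len, d_model) hidden states from a forward pass."""
    step = hidden[:, 1:] - hidden[:, :-1]                # per-token displacement
    norm = hidden[:, :-1].norm(dim=-1).clamp_min(1e-8)   # guard divide-by-zero
    return (step.norm(dim=-1) / norm).mean().item()
```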
---
## Repository Structure
```
topological-coherence/
├── src/
│ ├── topological_coherence/ # Python package (PyPI)
│ │ ├── logit_bias.py # ToroidalLogitProcessor
│ │ ├── tonnetz.py # Tonnetz topology
│ │ ├── masks.py # Toroidal mask generation
│ │ ├── attention.py # Attention layer variants
│ │ ├── drift.py # Drift measurement
│ │ └── tests/ # Unit tests
│ └── lib.rs # Rust crate (crates.io)
├── paper/
│ ├── toroidal_hallucination_reduction_2026.tex # v2 paper (multi-model)
│ └── toroidal_hallucination_reduction_2026.pdf
├── cormier_topological_coherence_2026.tex # Theory paper (LaTeX)
├── cormier_topological_coherence_2026.pdf # Theory paper (PDF)
├── results/ # v2 benchmark data & charts
├── experiments/ # Validation scripts
├── diagrams/ # Result visualizations
├── docs/ # Unified theory & diagrams
├── huggingface-space/ # HuggingFace Space demo
├── presentation/ # HTML presentation
├── Cargo.toml # Rust crate config
├── pyproject.toml # Python package config
└── LICENSE # Apache 2.0
```
---
## Running the Experiment
### Prerequisites
- Python 3.8+
- ~500MB disk space for PyTorch
### Installation
```bash
cd experiments
python3 -m venv venv
source venv/bin/activate
pip install torch numpy
```
### Run
```bash
python tonnetz_validation.py
```
**Expected runtime**: ~4 minutes on CPU (no GPU required)
### Expected Output
The experiment trains 4 models (baseline, mHC, toroidal, random) and reports:
- Drift rate (lower = better semantic coherence)
- Coherence variance (hidden state stability)
- Gradient norm (training stability)
---
## Theoretical Background
### Tonnetz Topology
The Tonnetz is a 2D torus where:
- Horizontal edges connect by perfect fifths
- Vertical edges connect by major thirds
- Diagonal edges connect by minor thirds
We use it as a **constructive existence proof** of a low-genus manifold with constant spectral gap—not as a claim about semantic universals.
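A minimal sketch of the grid (illustrative; the package's version is `src/topological_coherence/tonnetz.py`): assuming coordinates map to pitch classes via (7x + 4y) mod 12, the right neighbor is a perfect fifth, the upper neighbor a major third, and the (+1, −1) diagonal a minor third.

```python
# Tonnetz grid on a 2-D torus: a sketch, assuming the (7x + 4y) mod 12
# pitch-class mapping; the repo's tonnetz.py may differ in details.
N = 12  # torus side length

def pitch_class(x: int, y: int) -> int:
    return (7 * x + 4 * y) % 12          # +7 = fifth, +4 = major third

def torus_distance(a, b, n=N):
    """L1 shortest-path distance on the n x n torus."""
    dx = min(abs(a[0] - b[0]), n - abs(a[0] - b[0]))
    dy = min(abs(a[1] - b[1]), n - abs(a[1] - b[1]))
    return dx + dy

print(pitch_class(0, 0), pitch_class(1, 0), pitch_class(0, 1))  # 0 (C), 7 (G), 4 (E)
print(torus_distance((0, 0), (11, 0)))   # 1: wrap-around makes far cells near
```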
### Spectral Gap
For a d-dimensional torus T^d_N:
```
λ₁ = 2 - 2cos(2π/N) = Θ(1)
```
for fixed side length N, independent of total nodes N^d.
**Important caveat**: This holds for fixed torus side length N. Scaling N reintroduces gap decay as O(1/N²).
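Both the formula and the caveat are easy to verify numerically: the Laplacian of the N × N torus is the Kronecker sum of two cycle Laplacians, so its spectrum consists of pairwise sums of cycle eigenvalues (a standalone check, not repo code):

```python
# Verify lambda_1 = 2 - 2cos(2*pi/N) for the N x N torus, plus the O(1/N^2)
# gap decay when N grows. Standalone check, not repo code.
import numpy as np

def cycle_laplacian(n: int) -> np.ndarray:
    i = np.arange(n)
    L = 2 * np.eye(n)
    L[i, (i + 1) % n] = L[i, (i - 1) % n] = -1
    return L

N = 8
Lc = cycle_laplacian(N)
L_torus = np.kron(Lc, np.eye(N)) + np.kron(np.eye(N), Lc)  # Kronecker sum
eigs = np.linalg.eigvalsh(L_torus)                         # ascending order

print("lambda_1 (numeric):", eigs[1])                      # first nonzero eigenvalue
print("2 - 2cos(2*pi/N)  :", 2 - 2 * np.cos(2 * np.pi / N))
for n in (8, 16, 32):                                      # caveat: gap ~ (2*pi/n)^2
    print(n, 2 - 2 * np.cos(2 * np.pi / n))
```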
### Why Not Implicit Smoothing?
Standard transformer components (LayerNorm, softmax temperature, multi-head averaging) provide some implicit spectral filtering. However, none impose *topological* constraints—they operate pointwise or via soft weighting, not via manifold structure. They smooth without providing a conserved quantity or spectral gap guarantee.
The distinction is between **ad-hoc regularization** (which helps) and **geometric constraint** (which bounds).
---
## Citation
```bibtex
@misc{cormier2026topological,
  author    = {Cormier, Sylvain},
  title     = {Topological Constraints for Coherent Language Models: Why Geometry Prevents Hallucination},
  year      = {2026},
  publisher = {Zenodo},
  url       = {https://github.com/Paraxiom/topological-coherence}
}
```
---
## Related Work
| Work | Description | Reference |
|---|---|---|
| Unified Theory | Conservative composition across ML, blockchain, consensus | [docs/UNIFIED_THEORY.md](docs/UNIFIED_THEORY.md) |
| ERLHS | Hamiltonian framework for coherence-preserving ML | [DOI: 10.5281/zenodo.17928909](https://doi.org/10.5281/zenodo.17928909) |
| Karmonic Mesh | Spectral consensus on toroidal manifolds | [DOI: 10.5281/zenodo.17928991](https://doi.org/10.5281/zenodo.17928991) |
| mHC | Manifold-Constrained Hyper-Connections | [arXiv:2512.24880](https://arxiv.org/abs/2512.24880) |
| Graph Signal Processing | Spectral methods on graphs | [Shuman et al., 2013](https://ieeexplore.ieee.org/document/6494675) |
---
## Key Equations
### Toroidal Attention Mask (Eq. 17)
```
M_Tonnetz(i, j) = 1                          if d_Tonnetz(i, j) ≤ r
                  exp(−α · d_Tonnetz(i, j))  otherwise
```
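A direct NumPy transcription of Eq. 17 might look like this (a sketch: the function name and the L1 torus distance are assumptions; the package's mask generation lives in `src/topological_coherence/masks.py`):

```python
# Sketch of Eq. 17: hard window of radius r, exponential soft tail outside.
import numpy as np

def toroidal_mask(coords: np.ndarray, r: float = 1.0,
                  alpha: float = 1.0, n: int = 12) -> np.ndarray:
    """coords: (T, 2) integer torus coordinates, one per sequence position.
    Returns a (T, T) mask: 1 where d_Tonnetz <= r, exp(-alpha * d) elsewhere."""
    diff = np.abs(coords[:, None, :] - coords[None, :, :])  # (T, T, 2)
    diff = np.minimum(diff, n - diff)                       # wrap on the torus
    d = diff.sum(-1)                                        # L1 torus distance
    return np.where(d <= r, 1.0, np.exp(-alpha * d))
```

How the mask enters the attention computation (multiplying scores vs. adding a log-mask) is a separate design choice; the sketch only builds the matrix.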
### Learned Toroidal Projection (Eq. 20)
```
φ_θ(e) = ( σ(W₁e) mod 1, σ(W₂e) mod 1 )
```
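Eq. 20 translates almost verbatim into a small PyTorch module (a hypothetical sketch; W₁ and W₂ are learned linear maps and σ a sigmoid, per the equation). Since σ already maps into (0, 1), the mod 1 is a no-op kept only to make the torus semantics explicit:

```python
# Hypothetical sketch of Eq. 20: project an embedding onto the unit torus.
import torch
import torch.nn as nn

class ToroidalProjection(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, 1)
        self.w2 = nn.Linear(d_model, 1)

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        u = torch.sigmoid(self.w1(e)) % 1.0   # first torus coordinate
        v = torch.sigmoid(self.w2(e)) % 1.0   # second torus coordinate
        return torch.cat([u, v], dim=-1)      # (..., 2) point on the unit torus
```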
### Adjacency Loss (Eq. 21)
```
L_topo = E[(a,b)~co-occur][d_T(φ(a), φ(b))] - λ · E[(a,c)~random][d_T(φ(a), φ(c))]
```
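And Eq. 21 as a contrastive loss (a sketch; batch shapes and the L1 torus distance are assumptions): pull co-occurring pairs together on the torus, push random pairs apart.

```python
# Sketch of Eq. 21: adjacency loss over toroidal projections.
import torch

def torus_dist(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """L1 distance on the unit torus; p, q are (..., 2) with entries in [0, 1)."""
    diff = (p - q).abs()
    return torch.minimum(diff, 1.0 - diff).sum(-1)

def adjacency_loss(phi_a, phi_b, phi_c, lam: float = 0.5) -> torch.Tensor:
    """phi_a, phi_b: projections of co-occurring tokens; phi_c: random tokens."""
    return torus_dist(phi_a, phi_b).mean() - lam * torus_dist(phi_a, phi_c).mean()
```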
---
## Limitations
1. **Inference-time logit bias does not replicate**: The v2 TruthfulQA improvements were within LLM judge sampling variance
2. **Hyperparameter sensitivity**: The OLMo +15.4% result came from a 100-configuration sweep (overfitting to test set)
3. **Judge bias**: LLM-judged evaluation uses Qwen-7B as both subject and judge, introducing variance
4. **Bias magnitude**: At practical α (0.1–0.3), the maximum logit shift (~0.9) is too small to change the argmax under greedy decoding (see the sketch below); under sampling, effects are not statistically significant at n=200.
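A minimal arithmetic illustration of limitation 4 (synthetic logits, not model outputs): a shift bounded by 0.9 cannot flip the greedy argmax when the top-2 logit gap exceeds 0.9, which is common for confident next-token predictions.

```python
# Synthetic illustration: a bias <= 0.9 cannot flip greedy argmax here.
import torch

logits = torch.tensor([5.0, 3.6, 1.2])  # top-2 gap = 1.4 > 0.9
bias = torch.tensor([0.0, 0.9, 0.0])    # worst case: max shift on the runner-up
print(torch.argmax(logits).item(), torch.argmax(logits + bias).item())  # 0 0
```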
---
## Future Work
1. **Training-time Karmonic regularization**: the toy model shows topology matters at training time, making this the most promising untested direction
2. Comparison with other geometric constraints (hyperbolic, spherical)
3. Orthogonal projection onto a RAG-expanded evidence basis (the question-only basis is too restrictive)
4. Learned toroidal mappings (semantic embeddings instead of modular arithmetic)
---
## License
Apache 2.0
---
## Contact
- **Author**: Sylvain Cormier
- **Email**: sylvain@paraxiom.org
- **Organization**: [Paraxiom Research](https://paraxiom.org)
---
*"Geometric constraints provide one principled path to coherent artificial intelligence—not the only path, but a formally grounded one."*