topological-coherence 0.1.2

Toroidal logit bias for LLM hallucination reduction — Tonnetz geometry primitives validated on 4 models (TruthfulQA 817 samples)
I ran a minimal validation experiment on a question that’s been bothering me for a while:

Is enforcing global constraints on attention (e.g. doubly-stochastic / Birkhoff projections) sufficient to reduce semantic drift — or does geometry matter?
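To make the "doubly-stochastic" constraint concrete, here is a minimal sketch in plain Python. It is not the code from the experiment: the function names, the Sinkhorn-Knopp iteration count, and the toy score matrix are all assumptions chosen for illustration. Standard attention normalizes rows only, while the doubly-stochastic variant pushes both rows and columns to sum to 1.

```python
import math

def softmax_rows(scores):
    """Row-wise softmax: each row sums to 1 (standard attention weights)."""
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def sinkhorn(scores, n_iters=100):
    """Approximate doubly-stochastic weights via Sinkhorn-Knopp iteration.

    Alternately normalizes the rows and columns of exp(scores).
    Assumes a square matrix; n_iters is an illustrative choice.
    """
    n = len(scores)
    m = max(max(row) for row in scores)
    mat = [[math.exp(x - m) for x in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization
        row_sums = [sum(row) for row in mat]
        mat = [[mat[i][j] / row_sums[i] for j in range(n)] for i in range(n)]
        # Column normalization
        col_sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
        mat = [[mat[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return mat

# Toy attention scores (hypothetical values)
scores = [[2.0, 0.5, -1.0],
          [0.1, 1.5, 0.3],
          [-0.5, 0.2, 1.0]]

standard = softmax_rows(scores)  # rows sum to 1, columns generally do not
balanced = sinkhorn(scores)      # rows and columns both sum to ~1
```

Note that this constraint is global: it says nothing about which positions should attend to which, only that total attention mass is balanced.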

I compared three 2-layer transformer variants on a synthetic task with controlled semantic drift:
	•	Standard attention
	•	mHC-style doubly-stochastic mixing
	•	Toroidal (topology-constrained) attention
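As a rough illustration of what a topology-constrained variant could look like, the sketch below places token positions on a small 2D torus and penalizes attention logits by wrap-around distance. This is an assumption about the general idea, not the experiment's implementation or the package's API: `torus_distance`, `toroidal_bias`, the grid size, and the `decay` parameter are all hypothetical.

```python
import math

def torus_distance(i, j, width, height):
    """Wrap-around Manhattan distance between cells i and j on a width x height torus."""
    xi, yi = i % width, i // width
    xj, yj = j % width, j // width
    dx = abs(xi - xj)
    dy = abs(yi - yj)
    # On a torus, going "the other way around" may be shorter
    return min(dx, width - dx) + min(dy, height - dy)

def toroidal_bias(n, width, height, decay=1.0):
    """Additive logit bias: 0 on the diagonal, more negative with torus distance.

    Assumes n <= width * height so every position maps to a distinct cell.
    """
    return [[-decay * torus_distance(i, j, width, height) for j in range(n)]
            for i in range(n)]

def biased_attention(scores, bias):
    """Row-wise softmax over scores + bias."""
    out = []
    for s_row, b_row in zip(scores, bias):
        logits = [s + b for s, b in zip(s_row, b_row)]
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

# 16 positions on a 4 x 4 torus, with uniform raw scores (hypothetical setup)
n, width, height = 16, 4, 4
bias = toroidal_bias(n, width, height, decay=1.0)
uniform_scores = [[0.0] * n for _ in range(n)]
weights = biased_attention(uniform_scores, bias)
```

With uniform scores, each position attends most strongly to itself and its torus neighbors, including those reached by wrapping around an edge. Unlike the doubly-stochastic constraint, this bias encodes locality directly.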

Key observations (single run, small model):
	•	Toroidal attention reduced drift by ~40% vs baseline
	•	Gradients were more stable under local topological constraints
	•	Doubly-stochastic mixing alone led to a blow-up in coherence variance

This is not a claim about better models, only that imposing a constraint is not the same as imposing structure (constraint ≠ structure), and that locality matters.

Code + experiment setup here: [GitHub link]
Curious if others have seen similar failure modes with global mixing constraints.