Expand description
The checked-in restraint eval set (pure). Makes “the formatter never changes
meaning” a FALSIFIABLE gate, not a vibe: each fixture lists impermissible
substrings (meaning-changing edits) that must never appear in a formatter’s
output. score is the fraction of fixtures with zero impermissible edits. A
deliberately over-editing mock MUST score red (see tests).
Structs§
- Fixture
- One eval case: a phrase + substrings a faithful cleanup must never introduce. Fixtures are written already pre-processed (no spoken commands) so they isolate the formatter’s restraint, not the deterministic pre-layer.
Constants§
- FIXTURES
- The vendored fixtures: a phrase paired with the meaning-flips a careless rewrite tends to introduce (sentiment swaps, dropped negations).
Functions§
- score
- Fraction of fixtures the formatter cleans without an impermissible edit (1.0 = perfect restraint). Scores the formatter’s DIRECT output (unguarded), so the metric measures the model, not the moat.