Skip to main content

Module eval

Module eval

Expand description

The checked-in restraint eval set (pure). Makes “the formatter never changes meaning” a FALSIFIABLE gate, not a vibe: each fixture lists impermissible substrings (meaning-changing edits) that must never appear in a formatter’s output. score is the fraction of fixtures with zero impermissible edits. A deliberately over-editing mock MUST score red (see tests).

Structs§

Fixture: One eval case: a phrase + substrings a faithful cleanup must never introduce. Fixtures are written already pre-processed (no spoken commands) so they isolate the formatter’s restraint, not the deterministic pre-layer.

Constants§

FIXTURES: The vendored fixtures: a phrase paired with the meaning-flips a careless rewrite tends to introduce (sentiment swaps, dropped negations).

Functions§

score: Fraction of fixtures the formatter cleans without an impermissible edit (1.0 = perfect restraint). Scores the formatter’s DIRECT output (unguarded), so the metric measures the model, not the moat.