Logit / output verification (plan #61).
Borrowed from MAX’s
tests/integration/accuracy/verify pattern: every model gets a
parity test that diffs RLX’s output against a reference (HuggingFace
transformers, ONNX Runtime, hand-fused, …) using cosine
similarity, KL divergence, and absolute tolerance.
This module is a pure data layer: it contains no HuggingFace or ONNX
Runtime integration. Test code calls compare(out, reference, tolerance)
and gets back a structured report it can assert! against. Hooking this
up to a specific reference implementation is per-bench wiring (see burnembed).
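
To make the shape of such a comparison concrete, here is a minimal sketch of what a compare-style function could look like. It is not this module's actual implementation: the Tolerance and Report types, their field names, the slice-of-f32 inputs, and the choice to compute KL(reference || out) over softmaxed logits are all assumptions made for illustration.

```rust
/// Hypothetical thresholds; the real tolerance type may differ.
#[derive(Debug, Clone, Copy)]
pub struct Tolerance {
    pub min_cosine: f64, // lowest acceptable cosine similarity
    pub max_kl: f64,     // highest acceptable KL divergence (nats)
    pub max_abs: f64,    // highest acceptable element-wise |out - reference|
}

/// Hypothetical structured report a test can assert against.
#[derive(Debug, Clone, Copy)]
pub struct Report {
    pub cosine: f64,
    pub kl: f64,
    pub max_abs_diff: f64,
    pub passed: bool,
}

/// Numerically stable log-softmax, computed in f64.
fn log_softmax(logits: &[f32]) -> Vec<f64> {
    let max = logits.iter().fold(f64::NEG_INFINITY, |m, &x| m.max(x as f64));
    let log_sum = logits.iter().map(|&x| (x as f64 - max).exp()).sum::<f64>().ln();
    logits.iter().map(|&x| x as f64 - max - log_sum).collect()
}

/// Compares `out` with `reference`, treating both as logits (an assumption).
pub fn compare(out: &[f32], reference: &[f32], tol: Tolerance) -> Report {
    assert_eq!(out.len(), reference.len(), "length mismatch");
    assert!(!out.is_empty(), "nothing to compare");

    // One pass for the dot product, squared norms, and largest absolute gap.
    let (mut dot, mut na, mut nb, mut max_abs_diff) = (0.0f64, 0.0f64, 0.0f64, 0.0f64);
    for (&a, &b) in out.iter().zip(reference) {
        let (a, b) = (a as f64, b as f64);
        dot += a * b;
        na += a * a;
        nb += b * b;
        max_abs_diff = max_abs_diff.max((a - b).abs());
    }
    let denom = na.sqrt() * nb.sqrt();
    // Convention chosen here: two all-zero vectors count as identical.
    let cosine = if denom > 0.0 { dot / denom } else if na == nb { 1.0 } else { 0.0 };

    // KL(reference || out) over the softmax distributions, clamped at zero
    // to absorb tiny negative values caused by rounding.
    let (log_p, log_q) = (log_softmax(reference), log_softmax(out));
    let kl = log_p
        .iter()
        .zip(&log_q)
        .map(|(lp, lq)| lp.exp() * (lp - lq))
        .sum::<f64>()
        .max(0.0);

    let passed = cosine >= tol.min_cosine && kl <= tol.max_kl && max_abs_diff <= tol.max_abs;
    Report { cosine, kl, max_abs_diff, passed }
}
```

A parity test under these assumptions would build a Tolerance, call compare on the model output and the reference output, and then assert!(report.passed), printing the report with {:?} on failure so the offending metric is visible. Returning all three metrics rather than a bare bool keeps the layer independent of any particular reference backend while still telling the caller which threshold was missed.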