{
"_comment": "WER regression baselines per benchmark set. The benchmark gate fails (non-zero exit) when a set's measured WER exceeds its baseline by more than tolerance_pp percentage points. To populate or refresh the baselines, run the benchmark with the model present and GIGASTT_BENCHMARK_UPDATE_BASELINE=1, then commit this file. A 'wer' of null means the set is unpopulated; until a real run fills it in, only the absolute MAX_WER ceiling applies to that set. NOTE: the bundled set is only ~75 words, so one differing word is ~1.3pp; tolerance_pp sits above that to absorb cross-environment floating-point jitter while still catching multi-word regressions. Tighten it once more fixtures are committed (or set a baseline for the larger 'external' set).",
"sets": {
"bundled": {
"samples": 15,
"wer": 1.3
},
"external": {
"samples": 9994,
"wer": 2.6
}
},
"tolerance_pp": 1.5
}