pub fn evaluate_batch( baseline_outputs: &[Vec<f32>], gated_outputs: &[Vec<f32>], threshold: f64, ) -> BatchResult
Evaluates a batch of output pairs, producing mean/std/CI for coherence delta and pass rate.