brainwires-eval
Evaluation harness for the Brainwires Agent Framework.
Overview
A self-contained framework for writing and running evaluation cases against agents (or anything else). Cases are deterministic where possible; when they're not, ranking metrics measure quality.
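To make "ranking metrics measure quality" concrete, here is a minimal sketch of the three metrics named under ranking_metrics. The function names mirror that list, but the signatures, the `&[f64]` graded-relevance representation, and the exponential gain `2^grade - 1` are assumptions made for illustration, not the crate's confirmed API.

```rust
// Illustrative sketch only: signatures are assumed, not taken from brainwires-eval.
// Each slice holds graded relevance in ranked order (grade > 0.0 means relevant).

/// Precision@k: fraction of the top-k positions that hold a relevant item.
pub fn precision_at_k(relevance: &[f64], k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    let hits = relevance.iter().take(k).filter(|&&g| g > 0.0).count();
    hits as f64 / k as f64
}

/// Mean reciprocal rank over several ranked lists (one list per query).
/// A list with no relevant item contributes 0.
pub fn mrr(rankings: &[Vec<f64>]) -> f64 {
    if rankings.is_empty() {
        return 0.0;
    }
    let total: f64 = rankings
        .iter()
        .map(|r| {
            r.iter()
                .position(|&g| g > 0.0)
                .map_or(0.0, |i| 1.0 / (i as f64 + 1.0))
        })
        .sum();
    total / rankings.len() as f64
}

/// Discounted cumulative gain with exponential gain and log2 position discount.
fn dcg_at_k(relevance: &[f64], k: usize) -> f64 {
    relevance
        .iter()
        .take(k)
        .enumerate()
        .map(|(i, &g)| (2f64.powf(g) - 1.0) / (i as f64 + 2.0).log2())
        .sum()
}

/// nDCG@k: DCG normalised by the DCG of the ideal (descending) ordering.
/// Assumes `relevance` contains every judged item for the query.
pub fn ndcg_at_k(relevance: &[f64], k: usize) -> f64 {
    let mut ideal = relevance.to_vec();
    ideal.sort_by(|a, b| b.total_cmp(a));
    let idcg = dcg_at_k(&ideal, k);
    if idcg == 0.0 {
        0.0
    } else {
        dcg_at_k(relevance, k) / idcg
    }
}
```

Under these assumptions a perfectly ordered list scores an nDCG of 1.0, and a list whose only relevant item sits in second place scores below 1.0, which is what lets a non-deterministic case be scored rather than simply passed or failed.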
Originally an internal-only brainwires-eval module, it was later folded into brainwires-agent and then re-extracted in 0.11 (Phase 11e), so that the framework's evaluation surface is its own dependency-free crate with zero internal brainwires-* dependencies.
Modules
- case: EvaluationCase trait
- trial: TrialResult + EvaluationStats
- suite: EvaluationSuite + SuiteResult (Monte Carlo runner with Wilson confidence intervals)
- fixtures: YAML-based fixture cases (tests/fixtures/*.yaml)
- regression: RegressionSuite for change-detection runs
- stability_tests: flakiness detection across repeated runs
- adversarial: adversarial case generation
- recorder: recording trial results to disk
- fault_report: structured fault reports
- ranking_metrics: ndcg_at_k, mrr, precision_at_k (with graded relevance support)
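The suite module is described as a Monte Carlo runner with Wilson confidence intervals. As a rough sketch of what that interval computes, the function below derives a Wilson score interval for a pass rate from repeated trials. The name `wilson_interval`, its parameters, and the choice to return the full `(0.0, 1.0)` range for zero trials are hypothetical and not taken from the crate.

```rust
// Illustrative sketch only: `wilson_interval` is a hypothetical helper,
// not the brainwires-eval API.

/// Wilson score interval for a binomial proportion.
/// `z` is the normal quantile, e.g. 1.96 for roughly 95% confidence.
pub fn wilson_interval(successes: u64, trials: u64, z: f64) -> (f64, f64) {
    if trials == 0 {
        // No data: the pass rate could be anything (an assumption of this sketch).
        return (0.0, 1.0);
    }
    let n = trials as f64;
    let p = successes as f64 / n;
    let z2 = z * z;
    let denom = 1.0 + z2 / n;
    // Centre of the interval is pulled towards 0.5 for small n.
    let center = (p + z2 / (2.0 * n)) / denom;
    let half_width = z * (p * (1.0 - p) / n + z2 / (4.0 * n * n)).sqrt() / denom;
    // Clamp to guard against floating-point drift just outside [0, 1].
    ((center - half_width).max(0.0), (center + half_width).min(1.0))
}
```

Unlike the naive normal approximation, the Wilson interval stays within [0, 1] and remains informative when every trial passes or fails, which is common for small Monte Carlo runs of an evaluation case.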
Migration from brainwires-agent::eval
# Before
brainwires-agent = { features = ["eval"] }
# After
brainwires-eval = "0.11"
// Before
use ;
// After
use ;
License
MIT OR Apache-2.0