brainwires-eval 0.11.0

Evaluation harness for the Brainwires Agent Framework — fixtures, regression suites, stability tests, adversarial cases, ranking metrics (NDCG / MRR / precision@k), recorder.

Overview

A self-contained framework for writing and running evaluation cases against agents (or anything else). Cases are deterministic where possible; when they're not, ranking metrics measure quality.
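
To make the ranking metrics concrete, here is a minimal standalone sketch of NDCG@k, MRR, and precision@k. The function names mirror the ones listed under Modules, but the signatures, argument types, and the choice of linear gain (rather than the exponential 2^rel - 1 form) are assumptions for illustration and may not match the crate's actual API.

```rust
/// Discounted cumulative gain over the first `k` graded relevance scores.
/// Uses linear gain: rel_i / log2(i + 2) for a 0-based position i.
fn dcg_at_k(relevances: &[f64], k: usize) -> f64 {
    relevances
        .iter()
        .take(k)
        .enumerate()
        .map(|(i, rel)| rel / ((i as f64) + 2.0).log2())
        .sum()
}

/// NDCG@k: DCG of the given ranking divided by the DCG of the ideal
/// ranking (the same scores sorted in descending order).
/// Hypothetical signature; the crate's real one may differ.
fn ndcg_at_k(relevances: &[f64], k: usize) -> f64 {
    let mut ideal = relevances.to_vec();
    ideal.sort_by(|a, b| b.total_cmp(a));
    let ideal_dcg = dcg_at_k(&ideal, k);
    if ideal_dcg == 0.0 {
        0.0 // no relevant items at all
    } else {
        dcg_at_k(relevances, k) / ideal_dcg
    }
}

/// Mean reciprocal rank: for each query, 1 / (1-based rank of the first
/// relevant result), or 0 if none is relevant, averaged over all queries.
fn mrr(queries: &[Vec<bool>]) -> f64 {
    if queries.is_empty() {
        return 0.0;
    }
    let total: f64 = queries
        .iter()
        .map(|hits| {
            hits.iter()
                .position(|&hit| hit)
                .map_or(0.0, |pos| 1.0 / (pos as f64 + 1.0))
        })
        .sum();
    total / queries.len() as f64
}

/// Precision@k: fraction of the top `k` results that are relevant.
fn precision_at_k(hits: &[bool], k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    let relevant = hits.iter().take(k).filter(|&&hit| hit).count();
    relevant as f64 / k as f64
}
```

In this sketch a perfectly ordered result list scores an NDCG of 1.0, and moving relevant items lower in the list decreases the score, which is what makes these metrics usable when a case has no single deterministic answer.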

Originally an internal-only brainwires-eval module, this code was later folded into brainwires-agent and then re-extracted in 0.11 (Phase 11e) so that the framework's evaluation surface is its own standalone crate. It has zero brainwires-* dependencies.

Modules

  • case: EvaluationCase trait
  • trial: TrialResult + EvaluationStats
  • suite: EvaluationSuite + SuiteResult (Monte Carlo runner with Wilson confidence intervals)
  • fixtures: YAML-based fixture cases (tests/fixtures/*.yaml)
  • regression: RegressionSuite for change-detection runs
  • stability_tests: flakiness detection across repeated runs
  • adversarial: adversarial case generation
  • recorder: recording trial results to disk
  • fault_report: structured fault reports
  • ranking_metrics: ndcg_at_k, mrr, precision_at_k (with graded relevance support)
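
The suite module is described as a Monte Carlo runner with Wilson confidence intervals. The sketch below shows that idea in isolation: run a pass/fail case repeatedly, then compute a Wilson score interval on the observed success rate. The names `run_trials` and `wilson_interval` are hypothetical and are not taken from the crate; only the statistical formula is standard.

```rust
/// Runs `case` once per trial index and counts how many trials passed.
/// Hypothetical helper; returns (successes, trials).
fn run_trials<F: FnMut(u64) -> bool>(trials: u64, mut case: F) -> (u64, u64) {
    let successes = (0..trials).filter(|&i| case(i)).count() as u64;
    (successes, trials)
}

/// Wilson score interval for a binomial proportion.
/// `z` is the normal quantile for the desired confidence level
/// (about 1.96 for 95%). Returns (lower, upper), clamped to [0, 1].
fn wilson_interval(successes: u64, trials: u64, z: f64) -> (f64, f64) {
    if trials == 0 {
        return (0.0, 1.0); // no data: the rate could be anything
    }
    let n = trials as f64;
    let p = successes as f64 / n;
    let z2 = z * z;
    let denom = 1.0 + z2 / n;
    let center = (p + z2 / (2.0 * n)) / denom;
    let half_width = z * (p * (1.0 - p) / n + z2 / (4.0 * n * n)).sqrt() / denom;
    ((center - half_width).max(0.0), (center + half_width).min(1.0))
}
```

With 8 passes out of 10 trials at z = 1.96, this gives an interval of roughly 0.49 to 0.94. Unlike the naive normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly for small trial counts and for success rates near 0 or 1, which is the usual reason to prefer it for repeated agent runs.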

Migration from brainwires-agent::eval

# Before
brainwires-agent = { features = ["eval"] }

# After
brainwires-eval = "0.11"

// Before
use brainwires_agent::eval::{EvaluationCase, TrialResult, ndcg_at_k};

// After
use brainwires_eval::{EvaluationCase, TrialResult, ndcg_at_k};

License

MIT OR Apache-2.0