Benchmark runner: drives Agent<BenchmarkChannel> over a dataset and collects results.
BenchRunner is the execution engine for zeph bench run. It is intentionally
minimal — baseline mode only (no tools, no memory, no MCP). Each scenario is run in
isolation through a fresh BenchmarkChannel and the agent’s raw text response is
scored by the supplied Evaluator.
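The per-scenario isolation described above can be pictured with a small, self-contained sketch. It does not use the real zeph_bench API: `Scenario`, `Channel`, `Evaluator`, `ExactMatch`, `ScenarioResult`, `run_all`, and `mean_score` are all hypothetical stand-ins, and the agent is reduced to a plain closure. The sketch only assumes what the description states: one fresh channel per scenario, with the raw text response handed to an evaluator.

```rust
// Illustrative sketch only. Every type and function here is a hypothetical
// stand-in, not part of zeph_bench.

/// One benchmark item: an id, the prompt sent to the agent, and the expected answer.
pub struct Scenario {
    pub id: String,
    pub prompt: String,
    pub expected: String,
}

/// Stand-in for a per-scenario channel: carries one prompt and captures one response.
struct Channel {
    prompt: String,
    response: Option<String>,
}

impl Channel {
    fn new(prompt: &str) -> Self {
        Channel { prompt: prompt.to_string(), response: None }
    }
}

/// Stand-in for an evaluator: scores a raw response against the expected answer.
pub trait Evaluator {
    fn score(&self, response: &str, expected: &str) -> f64;
}

/// Scores 1.0 on an exact match after trimming whitespace, otherwise 0.0.
pub struct ExactMatch;

impl Evaluator for ExactMatch {
    fn score(&self, response: &str, expected: &str) -> f64 {
        if response.trim() == expected.trim() { 1.0 } else { 0.0 }
    }
}

/// The score recorded for a single scenario.
pub struct ScenarioResult {
    pub id: String,
    pub score: f64,
}

/// Runs each scenario through its own fresh channel, so no state leaks
/// from one scenario to the next, and scores the raw response.
pub fn run_all<A, E>(scenarios: &[Scenario], mut agent: A, evaluator: &E) -> Vec<ScenarioResult>
where
    A: FnMut(&str) -> String,
    E: Evaluator,
{
    let mut results = Vec::with_capacity(scenarios.len());
    for scenario in scenarios {
        // A new channel is created for every scenario.
        let mut channel = Channel::new(&scenario.prompt);
        channel.response = Some(agent(&channel.prompt));
        let raw = channel.response.as_deref().unwrap_or("");
        results.push(ScenarioResult {
            id: scenario.id.clone(),
            score: evaluator.score(raw, &scenario.expected),
        });
    }
    results
}

/// Arithmetic mean of the per-scenario scores; 0.0 for an empty run.
pub fn mean_score(results: &[ScenarioResult]) -> f64 {
    if results.is_empty() {
        return 0.0;
    }
    results.iter().map(|r| r.score).sum::<f64>() / results.len() as f64
}
```

Calling `run_all` with a closure that always answers "1945" plays the same role as the mock provider in the usage example: only scenarios whose expected answer is "1945" receive a non-zero score.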
§Usage
use std::path::Path;
use zeph_bench::runner::{BenchRunner, RunOptions};
use zeph_bench::loaders::{GaiaLoader, GaiaEvaluator};
use zeph_llm::{any::AnyProvider, mock::MockProvider};
let provider = AnyProvider::Mock(MockProvider::with_responses(vec!["1945".into()]));
let runner = BenchRunner::new(provider);
let opts = RunOptions::default();
let run = runner
    .run_dataset(
        &GaiaLoader::all_levels(),
        &GaiaEvaluator,
        Path::new("/data/gaia.jsonl"),
        opts,
    )
    .await?;
println!("mean score: {:.4}", run.aggregate.mean_score);
Structs§
- BenchMemoryParams - Parameters required to construct a per-scenario SQLite-backed SemanticMemory.
- BenchRunner - Drives Agent<BenchmarkChannel> over a dataset and collects scored results.
- RunOptions - Options that control which scenarios are executed and whether to resume a prior run.
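As a rough illustration of what "which scenarios are executed and whether to resume a prior run" could mean in practice, the sketch below filters a list of scenario ids. The field names `only_id` and `resume`, the `SketchOptions` type, and the `select_scenarios` function are assumptions made for this example; the actual fields of RunOptions are not listed on this page.

```rust
use std::collections::HashSet;

/// Hypothetical options. The field names are illustrative and are NOT
/// taken from the real `RunOptions`.
#[derive(Default)]
pub struct SketchOptions {
    /// If set, only the scenario with this id is selected.
    pub only_id: Option<String>,
    /// If true, scenarios already recorded in a prior run are skipped.
    pub resume: bool,
}

/// Returns the ids that still need to run, preserving dataset order.
pub fn select_scenarios<'a>(
    all: &'a [String],
    done: &HashSet<String>,
    opts: &SketchOptions,
) -> Vec<&'a str> {
    all.iter()
        .map(String::as_str)
        // Keep everything when no id filter is set, else only the matching id.
        .filter(|id| opts.only_id.as_deref().map_or(true, |only| only == *id))
        // When resuming, drop ids that the prior run already completed.
        .filter(|id| !(opts.resume && done.contains(*id)))
        .collect()
}
```

With default options every id is selected; turning on the hypothetical `resume` flag removes the ids already present in `done`, so an interrupted run would only repeat unfinished work.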
Enums§
- MemoryMode - Controls whether SemanticMemory is wired into the agent during a benchmark run.
- ResponseMode - Controls how the runner processes the agent’s raw text response.