Module runner

Benchmark runner: drives Agent<BenchmarkChannel> over a dataset and collects results.

BenchRunner is the execution engine for zeph bench run. It is intentionally minimal: it supports baseline mode only (no tools, no memory, no MCP). Each scenario runs in isolation through a fresh BenchmarkChannel, and the agent’s raw text response is scored by the supplied Evaluator.
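The isolation-and-scoring loop described above can be illustrated with a small, synchronous sketch. Every name in it (Scenario, Channel, Evaluator, ExactMatch, mock_agent, run_all) is a hypothetical stand-in invented for illustration and is not the zeph_bench API. The real runner is async and drives Agent<BenchmarkChannel>. The sketch only assumes that per-scenario state is created fresh and that scores are aggregated as a mean.

```rust
// Hypothetical stand-ins for illustration only; not the zeph_bench types.

/// One dataset entry: a prompt plus the expected answer.
struct Scenario {
    id: String,
    prompt: String,
    expected: String,
}

/// Per-scenario conversation state. A new value is created for every
/// scenario so that nothing leaks from one run into the next.
#[derive(Default)]
struct Channel {
    transcript: Vec<String>,
}

/// Scores a raw text response against the expected answer.
trait Evaluator {
    fn score(&self, response: &str, expected: &str) -> f64;
}

/// Exact match after trimming whitespace and folding ASCII case.
struct ExactMatch;

impl Evaluator for ExactMatch {
    fn score(&self, response: &str, expected: &str) -> f64 {
        if response.trim().eq_ignore_ascii_case(expected.trim()) {
            1.0
        } else {
            0.0
        }
    }
}

/// Mock agent: records the prompt, then answers "1945" only when the
/// channel held no earlier messages (that is, when it was fresh).
fn mock_agent(channel: &mut Channel, prompt: &str) -> String {
    channel.transcript.push(prompt.to_string());
    if channel.transcript.len() == 1 {
        "1945".to_string()
    } else {
        "state leaked".to_string()
    }
}

/// Runs every scenario through `agent` with a fresh channel and returns
/// the per-scenario scores together with their mean (0.0 for no scenarios).
fn run_all<A, E>(scenarios: &[Scenario], mut agent: A, evaluator: &E) -> (Vec<(String, f64)>, f64)
where
    A: FnMut(&mut Channel, &str) -> String,
    E: Evaluator,
{
    let mut results = Vec::with_capacity(scenarios.len());
    for scenario in scenarios {
        let mut channel = Channel::default(); // fresh state per scenario
        let response = agent(&mut channel, &scenario.prompt);
        let score = evaluator.score(&response, &scenario.expected);
        results.push((scenario.id.clone(), score));
    }
    let mean = if results.is_empty() {
        0.0
    } else {
        results.iter().map(|(_, score)| score).sum::<f64>() / results.len() as f64
    };
    (results, mean)
}
```

Because run_all builds a new Channel inside the loop, mock_agent always sees a transcript of length one. Reusing a single channel across scenarios would make it answer "state leaked" from the second scenario onward, which is the kind of cross-scenario contamination that per-scenario isolation avoids.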

§Usage

use std::path::Path;
use zeph_bench::runner::{BenchRunner, RunOptions};
use zeph_bench::loaders::{GaiaLoader, GaiaEvaluator};
use zeph_llm::{any::AnyProvider, mock::MockProvider};

// Build a runner on top of a mock provider seeded with one response.
let provider = AnyProvider::Mock(MockProvider::with_responses(vec!["1945".into()]));
let runner = BenchRunner::new(provider);
let opts = RunOptions::default();
// `.await?` must run inside an async context that can propagate the error.
let run = runner
    .run_dataset(
        &GaiaLoader::all_levels(),
        &GaiaEvaluator,
        Path::new("/data/gaia.jsonl"),
        opts,
    )
    .await?;
println!("mean score: {:.4}", run.aggregate.mean_score);

Structs§

BenchMemoryParams: Parameters required to construct a per-scenario SQLite-backed SemanticMemory.
BenchRunner: Drives Agent<BenchmarkChannel> over a dataset and collects scored results.
RunOptions: Options that control which scenarios are executed and whether to resume a prior run.

Enums§

MemoryMode: Controls whether SemanticMemory is wired into the agent during a benchmark run.
ResponseMode: Controls how the runner processes the agent’s raw text response.
