Module runner

Benchmark runner: drives Agent<BenchmarkChannel> over a dataset and collects results.

BenchRunner is the execution engine for zeph bench run. It is intentionally minimal: it supports baseline mode only (no tools, no memory, no MCP). Each scenario runs in isolation through a fresh BenchmarkChannel, and the supplied Evaluator scores the agent’s raw text response.
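
The per-scenario flow described above can be sketched as follows. This is only an illustration under stated assumptions, not the zeph_bench implementation: `Scenario`, `Channel`, `Evaluator`, `ExactMatch`, and `run_all` are hypothetical stand-ins defined here, and the agent is modelled as a plain closure from prompt to response. The sketch also shows one plausible way to compute a mean score like the one printed in the usage example.

```rust
// Hypothetical stand-ins for illustration; none of these are the real zeph_bench types.
struct Scenario {
    id: String,
    prompt: String,
    expected: String,
}

/// Assumed scoring interface: maps a raw text response to a score.
trait Evaluator {
    fn score(&self, scenario: &Scenario, response: &str) -> f64;
}

/// Scores 1.0 on an exact (whitespace-trimmed) match, 0.0 otherwise.
struct ExactMatch;

impl Evaluator for ExactMatch {
    fn score(&self, scenario: &Scenario, response: &str) -> f64 {
        if response.trim() == scenario.expected {
            1.0
        } else {
            0.0
        }
    }
}

/// Stand-in for a per-scenario channel: carries one prompt, captures one response.
struct Channel {
    prompt: String,
    response: Option<String>,
}

impl Channel {
    fn new(prompt: &str) -> Self {
        Channel {
            prompt: prompt.to_string(),
            response: None,
        }
    }
}

/// Runs every scenario through a fresh channel and returns (per-scenario scores, mean score).
fn run_all<A, E>(scenarios: &[Scenario], mut agent: A, evaluator: &E) -> (Vec<(String, f64)>, f64)
where
    A: FnMut(&str) -> String,
    E: Evaluator,
{
    let mut results = Vec::with_capacity(scenarios.len());
    for scenario in scenarios {
        // A new channel per scenario, so no state leaks between scenarios.
        let mut channel = Channel::new(&scenario.prompt);
        channel.response = Some(agent(channel.prompt.as_str()));
        let raw = channel.response.as_deref().unwrap_or("");
        results.push((scenario.id.clone(), evaluator.score(scenario, raw)));
    }
    // Arithmetic mean of the scores; defined as 0.0 for an empty dataset.
    let mean = if results.is_empty() {
        0.0
    } else {
        results.iter().map(|(_, s)| *s).sum::<f64>() / results.len() as f64
    };
    (results, mean)
}
```

Calling `run_all` with two scenarios and an agent closure that always answers "1945" scores 1.0 for a scenario expecting "1945" and 0.0 for any other, giving a mean of 0.5.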

Usage

use std::path::Path;
use zeph_bench::runner::{BenchRunner, RunOptions};
use zeph_bench::loaders::{GaiaLoader, GaiaEvaluator};
use zeph_llm::{any::AnyProvider, mock::MockProvider};

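// Mock provider preloaded with a single canned response.
let provider = AnyProvider::Mock(MockProvider::with_responses(vec!["1945".into()]));
let runner = BenchRunner::new(provider);
let opts = RunOptions::default();
// Run the dataset at the given path and score each response with GaiaEvaluator.
// `.await?` requires an enclosing async function that returns a Result.
let run = runner
    .run_dataset(
        &GaiaLoader::all_levels(),
        &GaiaEvaluator,
        Path::new("/data/gaia.jsonl"),
        opts,
    )
    .await?;
println!("mean score: {:.4}", run.aggregate.mean_score);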

Structs

BenchMemoryParams
Parameters required to construct a per-scenario SQLite-backed SemanticMemory.
BenchRunner
Drives Agent<BenchmarkChannel> over a dataset and collects scored results.
RunOptions
Options that control which scenarios are executed and whether to resume a prior run.
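
To illustrate what "which scenarios are executed" and "resume a prior run" could mean in practice, here is a minimal sketch. The fields of RunOptions are not listed on this page, so `SketchOptions`, its `only` and `resume` fields, and the `select` helper are all hypothetical names introduced for this example. The sketch assumes that resuming means skipping scenario ids already recorded as completed.

```rust
use std::collections::HashSet;

/// Hypothetical options type; NOT the real RunOptions, whose fields are not shown here.
#[derive(Default)]
struct SketchOptions {
    /// When set, run only the scenario with this id (assumed behaviour).
    only: Option<String>,
    /// When true, skip scenarios that a prior run already completed (assumed behaviour).
    resume: bool,
}

/// Returns the scenario ids that would still be executed, in their original order.
fn select<'a>(
    ids: &[&'a str],
    completed: &HashSet<String>,
    opts: &SketchOptions,
) -> Vec<&'a str> {
    ids.iter()
        .copied()
        // Keep everything when no filter is set, otherwise only the requested id.
        .filter(|id| opts.only.as_deref().map_or(true, |want| want == *id))
        // On resume, drop ids that are already in the completed set.
        .filter(|id| !(opts.resume && completed.contains(*id)))
        .collect()
}
```

With the default options every id is selected. With `resume` enabled, ids present in `completed` are dropped, so an interrupted run would pick up only the remaining scenarios.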

Enums

MemoryMode
Controls whether SemanticMemory is wired into the agent during a benchmark run.
ResponseMode
Controls how the runner processes the agent’s raw text response.