Struct BenchRunner

pub struct BenchRunner { /* private fields */ }

Drives Agent<BenchmarkChannel> over a dataset and collects scored results.

Each call to run_dataset builds a fresh agent for every scenario. Agents run in baseline mode (no tools, no MCP). Memory is optional: supply its parameters with BenchRunner::with_memory_params and enable it through RunOptions::memory_mode.

§Examples

use zeph_bench::runner::BenchRunner;
use zeph_llm::{any::AnyProvider, mock::MockProvider};

let provider = AnyProvider::Mock(MockProvider::with_responses(vec!["Paris".into()]));
let runner = BenchRunner::new(provider, false);

Implementations§

impl BenchRunner

pub fn new(provider: AnyProvider, _no_deterministic: bool) -> Self

Create a new runner with the given provider.

The _no_deterministic argument is unused at runtime. It stays in the public API so the bench command can pass it through for future use (e.g., logging or config). If deterministic overrides are needed, apply them to provider before calling new.

§Examples

use zeph_bench::runner::BenchRunner;
use zeph_llm::{any::AnyProvider, mock::MockProvider};

let provider = AnyProvider::Mock(MockProvider::with_responses(vec![]));
let runner = BenchRunner::new(provider, false);

pub fn with_memory_params(self, params: BenchMemoryParams) -> Self

Attach SemanticMemory parameters for memory-on benchmark runs.

When set, a per-scenario SQLite-backed SemanticMemory is constructed inside run_one whenever opts.memory_mode == MemoryMode::On.

§Examples

use std::path::PathBuf;
use zeph_bench::runner::{BenchRunner, BenchMemoryParams};
use zeph_llm::{any::AnyProvider, mock::MockProvider};

let provider = AnyProvider::Mock(MockProvider::with_responses(vec![]));
let params = BenchMemoryParams {
    data_dir: PathBuf::from("/tmp/bench-data"),
    embedding_model: "nomic-embed-text".into(),
    run_id: "bench-abc".into(),
    dataset: "locomo".into(),
};
let runner = BenchRunner::new(provider, false).with_memory_params(params);
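
The gating rule described above (memory is built per scenario, and only when parameters are attached and the mode is On) can be modelled with a small standalone sketch. Everything in it is a hypothetical stand-in: Runner, MemoryParams, ScenarioMemory, memory_for, and the database file naming are illustrative and are not the zeph_bench API. Returning None when the mode is On but no parameters were attached is also an assumption; the real runner may handle that case differently.

```rust
use std::path::PathBuf;

// Hypothetical stand-ins; these are NOT the zeph_bench types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryMode {
    Off,
    On,
}

#[derive(Debug, Clone)]
pub struct MemoryParams {
    pub data_dir: PathBuf,
    pub run_id: String,
}

/// Placeholder for a per-scenario memory store.
#[derive(Debug, PartialEq, Eq)]
pub struct ScenarioMemory {
    pub db_path: PathBuf,
}

pub struct Runner {
    memory_params: Option<MemoryParams>,
}

impl Runner {
    pub fn new() -> Self {
        Self { memory_params: None }
    }

    /// Builder-style setter: consumes the runner and returns it with params attached.
    pub fn with_memory_params(mut self, params: MemoryParams) -> Self {
        self.memory_params = Some(params);
        self
    }

    /// Memory is created only when the mode is On AND params were attached.
    /// The file naming scheme is an assumption made for this sketch.
    pub fn memory_for(&self, mode: MemoryMode, scenario_id: &str) -> Option<ScenarioMemory> {
        match (mode, &self.memory_params) {
            (MemoryMode::On, Some(p)) => Some(ScenarioMemory {
                db_path: p.data_dir.join(format!("{}-{}.db", p.run_id, scenario_id)),
            }),
            _ => None,
        }
    }
}
```

In this model each scenario gets its own database path, so state written during one scenario cannot leak into the next.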

pub async fn run_dataset<L, E>(
    &self,
    loader: &L,
    evaluator: &E,
    path: &Path,
    opts: RunOptions,
) -> Result<BenchRun, BenchError>
where
    L: DatasetLoader,
    E: Evaluator,

Run all matching scenarios from path through the agent and return a BenchRun.

For each scenario:

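1. Builds a fresh Agent<BenchmarkChannel> with no tools. Memory is attached only when parameters were supplied via with_memory_params and opts.memory_mode == MemoryMode::On.
2. Feeds the scenario prompt and collects the agent's response.
3. Scores the response with evaluator.
4. Appends a ScenarioResult and recomputes aggregate statistics.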

The returned BenchRun has status = Running until the caller sets it to Completed or Interrupted.
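
The per-scenario loop and the status convention above can be sketched in standalone form. This is a synchronous model under stated assumptions, not the crate's implementation: CannedAgent, run_all, exact_match, mean_score, and the field layouts of Scenario, ScenarioResult, and BenchRun are all hypothetical, and exact-match scoring merely stands in for an Evaluator.

```rust
// Hypothetical stand-ins; none of these are the real zeph_bench types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunStatus {
    Running,
    Completed,
    Interrupted,
}

pub struct Scenario {
    pub id: String,
    pub prompt: String,
    pub expected: String,
}

pub struct ScenarioResult {
    pub scenario_id: String,
    pub response: String,
    pub score: f64,
}

pub struct BenchRun {
    pub status: RunStatus,
    pub results: Vec<ScenarioResult>,
    pub mean_score: f64,
}

impl BenchRun {
    /// Recompute the aggregate from scratch so it always matches `results`.
    fn recompute(&mut self) {
        let n = self.results.len();
        self.mean_score = if n == 0 {
            0.0
        } else {
            self.results.iter().map(|r| r.score).sum::<f64>() / n as f64
        };
    }
}

/// Toy agent that always answers with the same canned reply.
pub struct CannedAgent {
    reply: String,
}

impl CannedAgent {
    pub fn new(reply: &str) -> Self {
        Self { reply: reply.to_string() }
    }

    pub fn respond(&mut self, _prompt: &str) -> String {
        self.reply.clone()
    }
}

/// Exact-match scoring, standing in for an evaluator.
fn exact_match(response: &str, expected: &str) -> f64 {
    if response.trim() == expected.trim() { 1.0 } else { 0.0 }
}

/// Runs every scenario through a freshly built agent.
pub fn run_all<F>(scenarios: &[Scenario], make_agent: F) -> BenchRun
where
    F: Fn() -> CannedAgent,
{
    let mut run = BenchRun {
        status: RunStatus::Running,
        results: Vec::new(),
        mean_score: 0.0,
    };
    for s in scenarios {
        let mut agent = make_agent(); // 1. fresh agent per scenario
        let response = agent.respond(&s.prompt); // 2. feed the prompt
        let score = exact_match(&response, &s.expected); // 3. score
        run.results.push(ScenarioResult {
            scenario_id: s.id.clone(),
            response,
            score,
        }); // 4. append the result...
        run.recompute(); // ...and refresh the aggregate
    }
    run // status is still Running; the caller finalizes it
}
```

Leaving status at Running on return mirrors the convention described above: only the caller knows whether the run finished normally or was interrupted, so the caller sets the final value.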

§Errors

Returns BenchError if the dataset cannot be loaded or a scenario run fails.

pub async fn run_dataset_with_env_factory<L, F, X>(
    &self,
    loader: &L,
    env_factory: F,
    path: &Path,
    opts: RunOptions,
) -> Result<BenchRun, BenchError>
where
    L: DatasetLoader,
    F: Fn(&Scenario) -> Result<(X, ActionTrace), BenchError>,
    X: ToolExecutor + Send + Sync + 'static,

Run all scenarios from path, each through its own environment executor built by env_factory, and return a BenchRun.

This is the execution path for tool-driven datasets (tau2-bench). For each scenario:

1. Calls env_factory(scenario) to build a fresh (ToolExecutor, ActionTrace).
2. Builds a fresh TauBenchEvaluator from the scenario metadata and the trace.
3. Runs the agent with the env executor and the tool-use system prompt.
4. Scores the response via the evaluator (reads the populated trace).
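
The factory pattern in the steps above can be illustrated with a standalone sketch in which the executor and the scorer share one trace handle. All names and shapes here are assumptions made for illustration: this ToolExecutor trait, the ActionTrace alias, RecordingEnv, run_scenario, and the exact-sequence scoring rule are hypothetical and do not reproduce the zeph_bench or tau2-bench definitions.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins; these are NOT the zeph_bench types.
pub struct Scenario {
    pub id: String,
    pub expected_actions: Vec<String>,
}

/// Shared, append-only log of tool calls.
pub type ActionTrace = Arc<Mutex<Vec<String>>>;

pub trait ToolExecutor {
    fn call(&self, tool: &str) -> Result<String, String>;
}

/// Executor that records every tool call into the shared trace.
pub struct RecordingEnv {
    trace: ActionTrace,
}

impl ToolExecutor for RecordingEnv {
    fn call(&self, tool: &str) -> Result<String, String> {
        self.trace
            .lock()
            .map_err(|e| e.to_string())?
            .push(tool.to_string());
        Ok(format!("ok:{tool}"))
    }
}

/// Builds a fresh executor plus a second handle to the same trace.
pub fn env_factory(_scenario: &Scenario) -> Result<(RecordingEnv, ActionTrace), String> {
    let trace: ActionTrace = Arc::new(Mutex::new(Vec::new()));
    Ok((RecordingEnv { trace: Arc::clone(&trace) }, trace))
}

/// Runs one scenario; `agent_plan` stands in for the agent's tool calls.
pub fn run_scenario<F, X>(
    scenario: &Scenario,
    factory: F,
    agent_plan: &[&str],
) -> Result<f64, String>
where
    F: Fn(&Scenario) -> Result<(X, ActionTrace), String>,
    X: ToolExecutor,
{
    // 1. Fresh executor and trace for this scenario.
    let (env, trace) = factory(scenario)?;
    // 2. "Evaluator" state: the expected actions from scenario metadata.
    let expected = scenario.expected_actions.clone();
    // 3. The agent issues tool calls through the executor.
    for tool in agent_plan {
        env.call(tool)?;
    }
    // 4. Score by reading the trace the executor populated.
    let recorded = trace.lock().map_err(|e| e.to_string())?;
    Ok(if *recorded == expected { 1.0 } else { 0.0 })
}
```

Returning the trace alongside the executor lets the scorer observe what the agent actually did without the executor knowing anything about scoring, and building both per scenario keeps runs isolated from one another.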

§Errors

Returns BenchError if the dataset cannot be loaded, the env factory fails, or TauBenchEvaluator::from_scenario fails (malformed metadata).

Auto Trait Implementations§

impl !UnwindSafe for BenchRunner
