
BenchRunner

Struct BenchRunner 

pub struct BenchRunner { /* private fields */ }

Drives Agent<BenchmarkChannel> over a dataset and collects scored results.

Each call to run_dataset creates a fresh agent per scenario (baseline mode: no tools, no MCP). Memory is optionally wired via BenchRunner::with_memory_params and RunOptions::memory_mode.

§Examples

use zeph_bench::runner::BenchRunner;
use zeph_llm::{any::AnyProvider, mock::MockProvider};

let provider = AnyProvider::Mock(MockProvider::with_responses(vec!["Paris".into()]));
let runner = BenchRunner::new(provider, false);

Implementations§

Source§

impl BenchRunner


pub fn new(provider: AnyProvider, _no_deterministic: bool) -> Self

Create a new runner with the given provider.

The _no_deterministic argument is unused at runtime (hence the leading underscore) but is kept in the public API so the bench command can pass it through for future use (e.g., logging or config). If deterministic overrides are needed, apply them to provider before calling this constructor.

§Examples
use zeph_bench::runner::BenchRunner;
use zeph_llm::{any::AnyProvider, mock::MockProvider};

let provider = AnyProvider::Mock(MockProvider::with_responses(vec![]));
let runner = BenchRunner::new(provider, false);

pub fn with_memory_params(self, params: BenchMemoryParams) -> Self

Attach SemanticMemory parameters for memory-on benchmark runs.

When set, a per-scenario SQLite-backed SemanticMemory is constructed inside run_one whenever opts.memory_mode == MemoryMode::On.

§Examples
use std::path::PathBuf;
use zeph_bench::runner::{BenchRunner, BenchMemoryParams};
use zeph_llm::{any::AnyProvider, mock::MockProvider};

let provider = AnyProvider::Mock(MockProvider::with_responses(vec![]));
let params = BenchMemoryParams {
    data_dir: PathBuf::from("/tmp/bench-data"),
    embedding_model: "nomic-embed-text".into(),
    run_id: "bench-abc".into(),
    dataset: "locomo".into(),
};
let runner = BenchRunner::new(provider, false).with_memory_params(params);
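
The following is a minimal, self-contained sketch of the consuming-builder pattern that with_memory_params appears to follow, combined with the "params set AND memory_mode is On" gate described above. Runner, MemoryParams, MemoryMode, memory_db_path, and the database file-name scheme are hypothetical stand-ins chosen for illustration; they are not the zeph_bench types or its actual storage layout.

```rust
use std::path::PathBuf;

// Hypothetical stand-in for BenchMemoryParams (assumption, reduced to two fields).
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryParams {
    pub data_dir: PathBuf,
    pub run_id: String,
}

// Hypothetical stand-in for the memory mode carried by the run options.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MemoryMode {
    Off,
    On,
}

// Hypothetical stand-in for the runner; the real struct has private fields.
#[derive(Debug, Default)]
pub struct Runner {
    memory_params: Option<MemoryParams>,
}

impl Runner {
    pub fn new() -> Self {
        Self { memory_params: None }
    }

    /// Consuming builder: takes `self` by value and returns the updated runner,
    /// so calls can be chained directly after `new()`.
    pub fn with_memory_params(mut self, params: MemoryParams) -> Self {
        self.memory_params = Some(params);
        self
    }

    /// Returns a per-scenario database path only when params were attached
    /// AND the run asks for memory. The file-name scheme is an assumption.
    pub fn memory_db_path(&self, mode: MemoryMode, scenario_id: &str) -> Option<PathBuf> {
        match (mode, &self.memory_params) {
            (MemoryMode::On, Some(p)) => {
                Some(p.data_dir.join(format!("{}-{}.db", p.run_id, scenario_id)))
            }
            _ => None,
        }
    }
}
```

In this sketch, attaching parameters alone does not enable memory; the mode passed per run still decides, which mirrors the two-part condition in the description above.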

pub async fn run_dataset<L, E>( &self, loader: &L, evaluator: &E, path: &Path, opts: RunOptions, ) -> Result<BenchRun, BenchError>
where L: DatasetLoader, E: Evaluator,

Run all matching scenarios from path through the agent and return a BenchRun.

For each scenario:

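  1. Builds a fresh Agent<BenchmarkChannel> with no tools; memory is attached only when opts.memory_mode == MemoryMode::On and memory parameters were set via with_memory_params.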
  2. Feeds the scenario prompt and collects the agent’s response.
  3. Scores the response with evaluator.
  4. Appends a ScenarioResult and recomputes aggregate statistics.

The returned BenchRun has status = Running until the caller sets it to Completed or Interrupted.
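
The per-scenario loop and the status contract can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: Scenario, ScenarioResult, Run, RunStatus, exact_match, and run_all are hypothetical names, the "agent" is reduced to a closure from prompt to response, and the aggregate is a plain mean. The real run_dataset is async and uses the crate's own types.

```rust
// All names below are hypothetical stand-ins, not the zeph_bench types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RunStatus {
    Running,
    Completed,
    Interrupted,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ScenarioResult {
    pub id: String,
    pub score: f64,
}

#[derive(Debug)]
pub struct Run {
    pub status: RunStatus,
    pub results: Vec<ScenarioResult>,
    pub mean_score: f64,
}

pub struct Scenario {
    pub id: String,
    pub prompt: String,
    pub expected: String,
}

/// Exact-match scorer standing in for an evaluator (assumption).
pub fn exact_match(response: &str, expected: &str) -> f64 {
    if response.trim() == expected.trim() { 1.0 } else { 0.0 }
}

/// Runs every scenario with a freshly built agent and keeps the aggregate current.
pub fn run_all<A, G>(scenarios: &[Scenario], mut make_agent: A) -> Run
where
    A: FnMut() -> G,
    G: FnMut(&str) -> String,
{
    let mut run = Run {
        status: RunStatus::Running,
        results: Vec::new(),
        mean_score: 0.0,
    };
    for s in scenarios {
        let mut agent = make_agent(); // 1. fresh agent, no state shared across scenarios
        let response = agent(&s.prompt); // 2. feed the prompt, collect the response
        let score = exact_match(&response, &s.expected); // 3. score the response
        run.results.push(ScenarioResult { id: s.id.clone(), score }); // 4. append
        // 4 (cont.). recompute the aggregate after every scenario
        let total: f64 = run.results.iter().map(|r| r.score).sum();
        run.mean_score = total / run.results.len() as f64;
    }
    run // status is still Running; the caller decides the final status
}
```

Recomputing the aggregate inside the loop keeps the partial run consistent at every step, which is useful if the caller later marks it Interrupted instead of Completed.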

§Errors

Returns BenchError if the dataset cannot be loaded or a scenario run fails.


pub async fn run_dataset_with_env_factory<L, F, X>( &self, loader: &L, env_factory: F, path: &Path, opts: RunOptions, ) -> Result<BenchRun, BenchError>
where L: DatasetLoader, F: Fn(&Scenario) -> Result<(X, ActionTrace), BenchError>, X: ToolExecutor + Send + Sync + 'static,

Run all scenarios from path through a per-scenario env executor and return a BenchRun.

This is the execution path for tool-driven datasets (tau2-bench). For each scenario:

  1. Calls env_factory(scenario) to build a fresh (ToolExecutor, ActionTrace).
  2. Builds a fresh TauBenchEvaluator from the scenario metadata and the trace.
  3. Runs the agent with the env executor and the tool-use system prompt.
  4. Scores the response via the evaluator (reads the populated trace).

§Errors

Returns BenchError if the dataset cannot be loaded, the env factory fails, or TauBenchEvaluator::from_scenario fails (malformed metadata).
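
As an illustration of the env-factory contract, the sketch below builds a fresh executor and trace per scenario, lets the executor record tool calls, and scores by reading the trace afterwards. It assumes the trace is a shared handle (here Arc<Mutex<Vec<String>>>) held by both the executor and the scorer. RecordingEnv, Trace, env_factory, score, and the String error type are hypothetical; the real ToolExecutor, ActionTrace, and TauBenchEvaluator APIs are not shown in this page and may differ.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical shared trace handle (assumption: the real ActionTrace is shareable).
pub type Trace = Arc<Mutex<Vec<String>>>;

// Hypothetical scenario carrying the expected tool-call sequence.
pub struct Scenario {
    pub id: String,
    pub expected_actions: Vec<String>,
}

// Hypothetical executor that records every tool call into the shared trace.
pub struct RecordingEnv {
    trace: Trace,
}

impl RecordingEnv {
    pub fn call_tool(&self, name: &str) -> String {
        self.trace.lock().unwrap().push(name.to_string());
        format!("ok:{name}")
    }
}

/// Builds a fresh (executor, trace) pair for one scenario.
/// Fails on malformed scenarios, mirroring the error path described above.
pub fn env_factory(scenario: &Scenario) -> Result<(RecordingEnv, Trace), String> {
    if scenario.expected_actions.is_empty() {
        return Err(format!("scenario {} has no expected actions", scenario.id));
    }
    let trace: Trace = Arc::new(Mutex::new(Vec::new()));
    Ok((RecordingEnv { trace: Arc::clone(&trace) }, trace))
}

/// Scores by comparing the recorded calls with the expected sequence.
pub fn score(scenario: &Scenario, trace: &Trace) -> f64 {
    let recorded = trace.lock().unwrap();
    if *recorded == scenario.expected_actions { 1.0 } else { 0.0 }
}
```

Because the factory runs once per scenario, no recorded actions leak from one scenario into the next.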


Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

impl<T> IntoRequest<T> for T

Source§

fn into_request(self) -> Request<T>

Wrap the input message T in a tonic::Request
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
Source§

impl<T> PolicyExt for T
where T: ?Sized,

Source§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.
Source§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.