pub struct BenchRunner { /* private fields */ }
Drives Agent<BenchmarkChannel> over a dataset and collects scored results.
Each call to run_dataset creates a fresh agent per
scenario (baseline mode: no tools, no MCP). Memory is optionally wired via
BenchRunner::with_memory_params and RunOptions::memory_mode.
§Examples
use zeph_bench::runner::BenchRunner;
use zeph_llm::{any::AnyProvider, mock::MockProvider};
let provider = AnyProvider::Mock(MockProvider::with_responses(vec!["Paris".into()]));
let runner = BenchRunner::new(provider, false);
Implementations§
impl BenchRunner
pub fn new(provider: AnyProvider, _no_deterministic: bool) -> Self
Create a new runner with the given provider.
The _no_deterministic argument is currently unused at runtime. It stays in the public API
so the bench command can pass it through for future use (e.g., logging or config).
If deterministic behavior is needed, apply the overrides to provider before calling this.
§Examples
use zeph_bench::runner::BenchRunner;
use zeph_llm::{any::AnyProvider, mock::MockProvider};
let provider = AnyProvider::Mock(MockProvider::with_responses(vec![]));
let runner = BenchRunner::new(provider, false);
pub fn with_memory_params(self, params: BenchMemoryParams) -> Self
Attach SemanticMemory parameters for memory-on benchmark runs.
When set, a per-scenario SQLite-backed SemanticMemory is constructed inside
run_one whenever opts.memory_mode == MemoryMode::On.
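The gating described above can be pictured with a small self-contained model. Everything in it (Runner, MemoryParams, MemoryMode, memory_for) is a hypothetical stand-in and not the zeph_bench API. In particular, returning None when the mode is On but no parameters were attached is an assumption, since the documentation does not say what happens in that case.

```rust
use std::path::PathBuf;

// Hypothetical stand-in for the memory mode carried by the run options.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MemoryMode {
    Off,
    On,
}

// Hypothetical stand-in for the memory parameters (fields reduced for brevity).
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct MemoryParams {
    data_dir: PathBuf,
    run_id: String,
}

// Hypothetical stand-in for the runner: memory parameters are optional.
#[derive(Debug, Default)]
struct Runner {
    memory_params: Option<MemoryParams>,
}

impl Runner {
    fn new() -> Self {
        Self::default()
    }

    // Consuming builder: takes `self` by value and returns the updated runner,
    // mirroring the shape of `with_memory_params(self, params) -> Self`.
    fn with_memory_params(mut self, params: MemoryParams) -> Self {
        self.memory_params = Some(params);
        self
    }

    // Memory is only considered when the mode is On; with the mode Off the
    // scenario runs as a baseline even if parameters were attached.
    // Assumption: mode On without attached parameters also yields None.
    fn memory_for(&self, mode: MemoryMode) -> Option<&MemoryParams> {
        match mode {
            MemoryMode::On => self.memory_params.as_ref(),
            MemoryMode::Off => None,
        }
    }
}
```

In this model, attaching parameters alone changes nothing; the per-run mode decides whether they are used.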
§Examples
use std::path::PathBuf;
use zeph_bench::runner::{BenchRunner, BenchMemoryParams};
use zeph_llm::{any::AnyProvider, mock::MockProvider};
let provider = AnyProvider::Mock(MockProvider::with_responses(vec![]));
let params = BenchMemoryParams {
data_dir: PathBuf::from("/tmp/bench-data"),
embedding_model: "nomic-embed-text".into(),
run_id: "bench-abc".into(),
dataset: "locomo".into(),
};
let runner = BenchRunner::new(provider, false).with_memory_params(params);
pub async fn run_dataset<L, E>(
&self,
loader: &L,
evaluator: &E,
path: &Path,
opts: RunOptions,
) -> Result<BenchRun, BenchError>
where
L: DatasetLoader,
E: Evaluator,
Run all matching scenarios from path through the agent and return a BenchRun.
For each scenario:
- Builds a fresh Agent<BenchmarkChannel> with no tools or memory.
- Feeds the scenario prompt and collects the agent’s response.
- Scores the response with evaluator.
- Appends a ScenarioResult and recomputes aggregate statistics.
The returned BenchRun has status = Running until the caller sets it to
Completed or Interrupted.
§Errors
Returns BenchError if the dataset cannot be loaded or a scenario run fails.
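The per-scenario loop and the status handling described above can be sketched with a self-contained model. All types and functions here (Scenario, EchoAgent, Run, RunStatus, exact_match, run_all) are hypothetical stand-ins, as is the exact-match scoring rule; the real DatasetLoader, Evaluator, and BenchRun definitions are not shown in this page and are not reproduced.

```rust
// Hypothetical stand-in for the run status.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum RunStatus {
    Running,
    Completed,
    Interrupted,
}

// Hypothetical scenario: a prompt plus the expected answer.
struct Scenario {
    id: String,
    prompt: String,
    expected: String,
}

#[allow(dead_code)]
struct ScenarioResult {
    scenario_id: String,
    response: String,
    score: f64,
}

// Hypothetical stand-in for the run record returned to the caller.
struct Run {
    status: RunStatus,
    results: Vec<ScenarioResult>,
    mean_score: f64,
}

impl Run {
    // Append one result and recompute the aggregate from scratch.
    fn push(&mut self, result: ScenarioResult) {
        self.results.push(result);
        let total: f64 = self.results.iter().map(|r| r.score).sum();
        self.mean_score = total / self.results.len() as f64;
    }
}

// Stand-in agent that returns a canned reply, similar in spirit to a mock provider.
struct EchoAgent {
    canned: String,
}

impl EchoAgent {
    fn respond(&self, _prompt: &str) -> String {
        self.canned.clone()
    }
}

// Assumed scoring rule for this sketch: case-insensitive exact match.
fn exact_match(response: &str, expected: &str) -> f64 {
    if response.trim().eq_ignore_ascii_case(expected.trim()) {
        1.0
    } else {
        0.0
    }
}

fn run_all(scenarios: &[Scenario], canned: &[&str]) -> Run {
    let mut run = Run {
        status: RunStatus::Running,
        results: Vec::new(),
        mean_score: 0.0,
    };
    for (scenario, reply) in scenarios.iter().zip(canned) {
        // A fresh agent per scenario, so no state carries over.
        let agent = EchoAgent { canned: reply.to_string() };
        let response = agent.respond(&scenario.prompt);
        let score = exact_match(&response, &scenario.expected);
        run.push(ScenarioResult {
            scenario_id: scenario.id.clone(),
            response,
            score,
        });
    }
    // The status is left as Running; the caller decides the final status.
    run
}
```

A caller in this model would set run.status to Completed after a clean return, or to Interrupted if it stopped the run early.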
pub async fn run_dataset_with_env_factory<L, F, X>(
&self,
loader: &L,
env_factory: F,
path: &Path,
opts: RunOptions,
) -> Result<BenchRun, BenchError>
where
L: DatasetLoader,
F: Fn(&Scenario) -> Result<(X, ActionTrace), BenchError>,
X: ToolExecutor + Send + Sync + 'static,
Run all scenarios from path through a per-scenario env executor and return a BenchRun.
This is the execution path for tool-driven datasets (tau2-bench). For each scenario:
- Calls env_factory(scenario) to build a fresh (ToolExecutor, ActionTrace).
- Builds a fresh TauBenchEvaluator from the scenario metadata and the trace.
- Runs the agent with the env executor and the tool-use system prompt.
- Scores the response via the evaluator (reads the populated trace).
§Errors
Returns BenchError if the dataset cannot be loaded, the env factory fails, or
TauBenchEvaluator::from_scenario fails (malformed metadata).
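The factory-plus-trace pattern in the steps above can be sketched as follows. This is a minimal model under stated assumptions: Trace, RecordingEnv, make_env, and run_with_env_factory are hypothetical names, the trace is modeled as a shared Arc<Mutex<Vec<String>>>, and scoring is an exact comparison of recorded tool names against the expected ones. The real ToolExecutor, ActionTrace, and TauBenchEvaluator may work differently.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical trace: a shared, append-only list of tool names.
type Trace = Arc<Mutex<Vec<String>>>;

// Hypothetical scenario carrying the actions the agent is expected to take.
#[allow(dead_code)]
struct Scenario {
    id: String,
    expected_actions: Vec<String>,
}

// Hypothetical environment executor that records every tool call in the trace.
struct RecordingEnv {
    trace: Trace,
}

impl RecordingEnv {
    fn call(&self, tool: &str) -> String {
        self.trace.lock().unwrap().push(tool.to_string());
        format!("ok:{}", tool)
    }
}

// Factory: builds a fresh executor and a handle to the same fresh trace.
fn make_env(_scenario: &Scenario) -> Result<(RecordingEnv, Trace), String> {
    let trace: Trace = Arc::new(Mutex::new(Vec::new()));
    let env = RecordingEnv { trace: Arc::clone(&trace) };
    Ok((env, trace))
}

fn run_with_env_factory<F>(
    scenarios: &[Scenario],
    env_factory: F,
    plan: &[&str],
) -> Result<Vec<f64>, String>
where
    F: Fn(&Scenario) -> Result<(RecordingEnv, Trace), String>,
{
    let mut scores = Vec::new();
    for scenario in scenarios {
        // A factory failure aborts the whole run via `?`.
        let (env, trace) = env_factory(scenario)?;
        // Stand-in for the agent's tool-use loop: replay a fixed plan.
        for tool in plan {
            env.call(tool);
        }
        // The evaluator side reads the trace the executor populated.
        let recorded = trace.lock().unwrap();
        let score = if *recorded == scenario.expected_actions { 1.0 } else { 0.0 };
        scores.push(score);
    }
    Ok(scores)
}
```

Because the factory runs once per scenario, each trace starts empty, and actions recorded for one scenario cannot affect the score of the next.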
Auto Trait Implementations§
impl !Freeze for BenchRunner
impl !RefUnwindSafe for BenchRunner
impl Send for BenchRunner
impl Sync for BenchRunner
impl Unpin for BenchRunner
impl UnsafeUnpin for BenchRunner
impl !UnwindSafe for BenchRunner
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> IntoRequest<T> for T
fn into_request(self) -> Request<T>
Wrap the input message T in a tonic::Request