Struct EvalRunner 

Source
pub struct EvalRunner { /* private fields */ }
Runs evaluation cases against an agent and collects scored results.

The runner provides helpers for wiring an OnEvent callback that captures the tool call trajectory, and scores each case using the configured scorers.

Implementations§

Source§

impl EvalRunner

Source

pub fn new() -> Self

Create a new eval runner with no scorers.

Add scorers with scorer; each runs against every case’s actual output to produce a ScorerResult.

§Example
use heartbit_core::eval::{EvalCase, EvalRunner, KeywordScorer};

let runner = EvalRunner::new().scorer(KeywordScorer);
let case = EvalCase::new("capital", "What is the capital of France?")
    .expect_output_contains("Paris");
// No real LLM call here — score the "actual output" directly.
let result = runner.score_result(&case, "The capital of France is Paris.", &[], None);
assert!(result.passed);
Source

pub fn scorer(self, scorer: impl EvalScorer + 'static) -> Self

Add a scorer to the runner.
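
The consuming signature (self in, Self out) is what allows calls such as EvalRunner::new().scorer(...) to be chained. The following is a minimal sketch of that builder pattern using only the standard library. The Scorer trait, the two scorer structs, and the Runner type are hypothetical stand-ins, not the heartbit_core types, and the crate's internal storage may differ.

```rust
// Hypothetical stand-ins for illustration; NOT the heartbit_core types.
trait Scorer {
    fn name(&self) -> &str;
    /// Returns true when `output` satisfies this scorer.
    fn passes(&self, expected: &str, output: &str) -> bool;
}

struct ContainsScorer;

impl Scorer for ContainsScorer {
    fn name(&self) -> &str {
        "contains"
    }
    fn passes(&self, expected: &str, output: &str) -> bool {
        output.contains(expected)
    }
}

struct NonEmptyScorer;

impl Scorer for NonEmptyScorer {
    fn name(&self) -> &str {
        "non_empty"
    }
    fn passes(&self, _expected: &str, output: &str) -> bool {
        !output.trim().is_empty()
    }
}

#[derive(Default)]
struct Runner {
    // Boxed trait objects let one runner hold scorers of different types.
    scorers: Vec<Box<dyn Scorer>>,
}

impl Runner {
    fn new() -> Self {
        Self::default()
    }

    /// Consuming builder method: takes `self`, stores the scorer, returns `self`.
    fn scorer(mut self, scorer: impl Scorer + 'static) -> Self {
        self.scorers.push(Box::new(scorer));
        self
    }

    /// Runs every scorer and returns (scorer name, passed) pairs in insertion order.
    fn score(&self, expected: &str, output: &str) -> Vec<(String, bool)> {
        self.scorers
            .iter()
            .map(|s| (s.name().to_string(), s.passes(expected, output)))
            .collect()
    }
}
```

In this sketch a runner with no scorers returns an empty result list, and scorers run in the order they were added.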

Source

pub fn with_event_collector(self, collector: EventCollector) -> Self

Attach an event collector that EvalRunner::run will clear before each case. This is required when running two or more cases with event-aware scorers (CostScorer, LatencyScorer, SafetyScorer) against the same collector. Without it, events accumulate across cases, so per-case budgets are incorrect from the second case onward.

Pass the same collector you wired into the agent via EvalRunner::event_callback / build_eval_agent.
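
The accumulation problem can be illustrated with a small standard-library model. This is a sketch under stated assumptions: the Event struct, the Collector alias (assumed here to be a shared, mutex-guarded Vec), and both functions are hypothetical and only mimic the behaviour described above. They are not the crate's definitions.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for illustration; NOT the heartbit_core types.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    cost_cents: u32,
}

// Assumption: the collector is a shared, mutex-guarded list of events.
type Collector = Arc<Mutex<Vec<Event>>>;

/// Pretends to run one case: the "agent" pushes `events_per_case` events.
fn run_case(collector: &Collector, events_per_case: usize) {
    let mut events = collector.lock().unwrap();
    for _ in 0..events_per_case {
        events.push(Event { cost_cents: 10 });
    }
}

/// Returns the total cost a per-case budget scorer would observe after each case.
fn per_case_costs(cases: usize, clear_between_cases: bool) -> Vec<u32> {
    let collector: Collector = Arc::new(Mutex::new(Vec::new()));
    let mut costs = Vec::new();
    for _ in 0..cases {
        if clear_between_cases {
            // Reset the shared collector so only this case's events are visible.
            collector.lock().unwrap().clear();
        }
        run_case(&collector, 2);
        let total: u32 = collector.lock().unwrap().iter().map(|e| e.cost_cents).sum();
        costs.push(total);
    }
    costs
}
```

With three cases of two events each, per_case_costs(3, false) yields [20, 40, 60], while per_case_costs(3, true) yields [20, 20, 20]. The first case is correct either way, which matches the note that budgets go wrong from the second case onward.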

Source

pub async fn run<P: LlmProvider>( &self, agent: &AgentRunner<P>, cases: &[EvalCase], ) -> Vec<EvalResult>

Run all eval cases against an agent, returning results.

Each case runs the agent independently (fresh execution per case). When an event collector is attached via EvalRunner::with_event_collector, it is cleared before each case so event-aware scorers see only the events generated by that case.

Limitation: This method cannot capture tool call trajectory data because the agent’s OnEvent callback is set at build time. For trajectory scoring, build the agent with build_eval_agent and use score_result with the collected events.

Source

pub fn score_result( &self, case: &EvalCase, output: &str, tool_calls: &[String], error: Option<String>, ) -> EvalResult

Score a case result with pre-collected tool calls.

Use this when you have tool call data from an external source (e.g., OnEvent callback, audit trail, or manual testing).

Source

pub fn event_collector() -> EventCollector

Create an event collector for capturing the tool call trajectory.

Build a callback from it with EvalRunner::event_callback and wire that into AgentRunnerBuilder::on_event() before building the agent. After execution, pass the collector to EvalRunner::collected_tool_calls to extract the tool call names.

Source

pub fn event_callback( collector: &EventCollector, ) -> Arc<dyn Fn(AgentEvent) + Send + Sync>

Build an OnEvent callback that pushes events into the collector.
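
A callback of this shape can be sketched with the standard library alone. The Event enum, the Collector alias, and the function names below are hypothetical. The real AgentEvent and EventCollector are defined by heartbit_core and may be structured differently.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for illustration; NOT the heartbit_core types.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    ToolCall { name: String },
    Text(String),
}

// Assumption: the collector is a shared, mutex-guarded list of events.
type Collector = Arc<Mutex<Vec<Event>>>;

fn new_collector() -> Collector {
    Arc::new(Mutex::new(Vec::new()))
}

/// Builds a thread-safe callback that appends each event to the shared collector.
fn make_callback(collector: &Collector) -> Arc<dyn Fn(Event) + Send + Sync> {
    // The closure owns its own handle, so the caller keeps the original for reading.
    let sink = Arc::clone(collector);
    Arc::new(move |event: Event| {
        sink.lock().unwrap().push(event);
    })
}
```

Because the closure holds a clone of the Arc, the caller can still inspect the collector after the callback has been handed to the agent.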

Source

pub fn collected_tool_calls(collector: &EventCollector) -> Vec<String>

Extract tool call names from a collected event vec.
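
Conceptually, this is a filter over the collected events that keeps only tool calls and returns their names in order. The sketch below shows that idea on a hypothetical Event enum. The variants and the tool_call_names function are illustrative assumptions and do not reflect the actual AgentEvent layout.

```rust
// Hypothetical event type for illustration; NOT heartbit_core's AgentEvent.
#[derive(Debug, Clone)]
enum Event {
    ToolCall { name: String },
    Text(String),
}

/// Keeps only tool call events and returns their names in the order they occurred.
fn tool_call_names(events: &[Event]) -> Vec<String> {
    events
        .iter()
        .filter_map(|event| match event {
            Event::ToolCall { name } => Some(name.clone()),
            _ => None,
        })
        .collect()
}
```

The resulting Vec<String> has the same shape as the tool_calls argument that score_result accepts as a &[String] slice.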

Trait Implementations§

Source§

impl Debug for EvalRunner

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
Source§

impl Default for EvalRunner

Source§

fn default() -> Self

Returns the “default value” for a type.

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> PolicyExt for T
where T: ?Sized,

Source§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.
Source§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.