heartbit_core::eval

Struct EvalRunner

pub struct EvalRunner { /* private fields */ }

Runs evaluation cases against an agent and collects scored results.

The runner scores each case using the configured scorers. To capture the tool call trajectory, wire an OnEvent callback into the agent at build time (see event_callback) and pass the collected tool calls to score_result.

Implementations

impl EvalRunner

pub fn new() -> Self

Create a new eval runner with no scorers.

Add scorers with EvalRunner::scorer; each one runs against every case’s actual output and produces a ScorerResult.

Example

use heartbit_core::eval::{EvalCase, EvalRunner, KeywordScorer};

let runner = EvalRunner::new().scorer(KeywordScorer);
let case = EvalCase::new("capital", "What is the capital of France?")
    .expect_output_contains("Paris");
// No real LLM call here — score the "actual output" directly.
let result = runner.score_result(&case, "The capital of France is Paris.", &[], None);
assert!(result.passed);

pub fn scorer(self, scorer: impl EvalScorer + 'static) -> Self

Add a scorer to the runner.
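
The chained builder style can be pictured with a small, self-contained model. This is only a sketch: the Scorer trait, Contains, MaxLen, and Runner below are hypothetical stand-ins, not heartbit_core types, and the real EvalScorer signature is not shown on this page.

```rust
// Hypothetical trait standing in for EvalScorer; its real methods are not documented here.
trait Scorer {
    fn name(&self) -> &str;
    fn passes(&self, output: &str) -> bool;
}

// Passes when the output contains the given substring.
struct Contains(&'static str);

impl Scorer for Contains {
    fn name(&self) -> &str {
        "contains"
    }
    fn passes(&self, output: &str) -> bool {
        output.contains(self.0)
    }
}

// Passes when the output is at most the given number of bytes long.
struct MaxLen(usize);

impl Scorer for MaxLen {
    fn name(&self) -> &str {
        "max_len"
    }
    fn passes(&self, output: &str) -> bool {
        output.len() <= self.0
    }
}

// Hypothetical runner: stores scorers as boxed trait objects.
#[derive(Default)]
struct Runner {
    scorers: Vec<Box<dyn Scorer>>,
}

impl Runner {
    fn new() -> Self {
        Self::default()
    }

    // Consumes and returns `self` so calls can be chained.
    fn scorer(mut self, scorer: impl Scorer + 'static) -> Self {
        self.scorers.push(Box::new(scorer));
        self
    }

    // Runs every scorer against the output, in insertion order.
    fn score(&self, output: &str) -> Vec<(String, bool)> {
        self.scorers
            .iter()
            .map(|s| (s.name().to_string(), s.passes(output)))
            .collect()
    }
}

fn main() {
    let runner = Runner::new().scorer(Contains("Paris")).scorer(MaxLen(40));
    for (name, passed) in runner.score("The capital of France is Paris.") {
        println!("{name}: {passed}");
    }
}
```

The 'static bound in this model exists because the scorer is moved into a boxed trait object that the runner owns; the same bound appears in the signature above, presumably for a similar reason.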

pub fn with_event_collector(self, collector: EventCollector) -> Self

Attach an event collector that EvalRunner::run clears before each case. This is required when running two or more cases with event-aware scorers (CostScorer, LatencyScorer, SafetyScorer) against the same collector. Without it, events accumulate across cases, so per-case budgets are wrong from the second case onward.

Pass the same collector you wired into the agent via EvalRunner::event_callback / build_eval_agent.
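
To see why clearing matters, consider a minimal model of a shared collector. Everything here is an illustrative assumption: CostEvent, Collector, and the helper functions are hypothetical and do not reflect how heartbit_core represents events or budgets.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for an agent event carrying token usage.
#[derive(Debug, Clone)]
struct CostEvent {
    tokens: u64,
}

// Hypothetical shared collector: a mutex-guarded vector of events.
type Collector = Arc<Mutex<Vec<CostEvent>>>;

// Simulates one case: the "agent" emits a single event into the collector.
fn run_case(collector: &Collector, tokens: u64) {
    collector.lock().unwrap().push(CostEvent { tokens });
}

// Sums the token usage currently visible in the collector.
fn observed_tokens(collector: &Collector) -> u64 {
    collector.lock().unwrap().iter().map(|e| e.tokens).sum()
}

// Runs the cases in order and returns the total a scorer would see after each one.
fn per_case_totals(case_tokens: &[u64], clear_between_cases: bool) -> Vec<u64> {
    let collector: Collector = Arc::new(Mutex::new(Vec::new()));
    let mut totals = Vec::new();
    for &tokens in case_tokens {
        if clear_between_cases {
            collector.lock().unwrap().clear();
        }
        run_case(&collector, tokens);
        totals.push(observed_tokens(&collector));
    }
    totals
}

fn main() {
    let cases: [u64; 3] = [100, 250, 40];
    // Without clearing, later cases inherit earlier events: [100, 350, 390]
    println!("without clearing: {:?}", per_case_totals(&cases, false));
    // With clearing, each case sees only its own events: [100, 250, 40]
    println!("with clearing:    {:?}", per_case_totals(&cases, true));
}
```

In this model the first case is scored correctly either way, which matches the note above that the problem only shows up from the second case onward.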

pub async fn run<P: LlmProvider>( &self, agent: &AgentRunner<P>, cases: &[EvalCase], ) -> Vec<EvalResult>

Run all eval cases against an agent, returning results.

Each case runs the agent independently (fresh execution per case). When an event collector is attached via EvalRunner::with_event_collector, it is cleared before each case so event-aware scorers see only the events generated by that case.

Limitation: This method cannot capture tool call trajectory data because the agent’s OnEvent callback is set at build time. For trajectory scoring, build the agent with build_eval_agent and use score_result with the collected events.

pub fn score_result( &self, case: &EvalCase, output: &str, tool_calls: &[String], error: Option<String>, ) -> EvalResult

Score a case result with pre-collected tool calls.

Use this when you have tool call data from an external source (e.g., OnEvent callback, audit trail, or manual testing).
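
As an illustration of what can be done with a pre-collected list of tool call names, the sketch below checks that an expected sequence of calls appears in order within the actual calls. This is one possible comparison, offered as an assumption; the page does not say how heartbit_core scorers evaluate trajectories, and is_subsequence is a hypothetical helper.

```rust
// Returns true if every name in `expected` appears in `actual`
// in the same relative order (other calls may occur in between).
fn is_subsequence(expected: &[&str], actual: &[String]) -> bool {
    let mut remaining = actual.iter();
    // For each expected name, advance through the remaining calls until it is found.
    expected
        .iter()
        .all(|want| remaining.any(|got| got.as_str() == *want))
}

fn main() {
    let actual: Vec<String> = ["search", "summarize", "read_file"]
        .iter()
        .map(|s| s.to_string())
        .collect();

    println!("{}", is_subsequence(&["search", "read_file"], &actual));
    println!("{}", is_subsequence(&["read_file", "search"], &actual));
}
```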

pub fn event_collector() -> EventCollector

Create an event collector for capturing the tool call trajectory.

Wire it into AgentRunnerBuilder::on_event() through event_callback before building the agent. After execution, pass the collector to collected_tool_calls() to get the tool call names.

pub fn event_callback( collector: &EventCollector, ) -> Arc<dyn Fn(AgentEvent) + Send + Sync>

Build an OnEvent callback that pushes events into the collector.

pub fn collected_tool_calls(collector: &EventCollector) -> Vec<String>

Extract tool call names from a collected event vec.
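
The three helpers above follow a collector, callback, extract pattern. The sketch below models that pattern with standard library types only. The Event enum, the Collector alias, and the function names are hypothetical; the real AgentEvent variants and the concrete EventCollector type are not shown on this page.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical event type standing in for AgentEvent.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Event {
    ToolCall { name: String },
    Text(String),
}

// Hypothetical collector and callback shapes.
type Collector = Arc<Mutex<Vec<Event>>>;
type Callback = Arc<dyn Fn(Event) + Send + Sync>;

// Creates an empty shared collector.
fn new_collector() -> Collector {
    Arc::new(Mutex::new(Vec::new()))
}

// Builds a callback that pushes every event it receives into the collector.
fn callback_for(collector: &Collector) -> Callback {
    let sink = Arc::clone(collector);
    Arc::new(move |event: Event| sink.lock().unwrap().push(event))
}

// Extracts tool call names, in the order they were recorded.
fn tool_call_names(collector: &Collector) -> Vec<String> {
    collector
        .lock()
        .unwrap()
        .iter()
        .filter_map(|event| match event {
            Event::ToolCall { name } => Some(name.clone()),
            _ => None,
        })
        .collect()
}

fn main() {
    let collector = new_collector();
    let on_event = callback_for(&collector);

    // In a real run the agent would invoke the callback; here it is called by hand.
    on_event(Event::Text("thinking".to_string()));
    on_event(Event::ToolCall { name: "search".to_string() });
    on_event(Event::ToolCall { name: "read_file".to_string() });

    println!("{:?}", tool_call_names(&collector));
}
```

Because the callback holds its own clone of the shared handle, the caller keeps access to the collector after handing the callback to the agent builder, which is what makes the later extraction step possible.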

Trait Implementations

impl Debug for EvalRunner

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for EvalRunner

fn default() -> Self

Returns the “default value” for a type.

Auto Trait Implementations

impl Freeze for EvalRunner

impl !RefUnwindSafe for EvalRunner

impl Send for EvalRunner

impl Sync for EvalRunner

impl Unpin for EvalRunner

impl UnsafeUnpin for EvalRunner

impl !UnwindSafe for EvalRunner

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> Same for T

type Output = T

Should always be Self.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.