Module eval

Agent evaluation framework.

Provides tools for measuring agent quality through repeatable test cases. Inspired by Google ADK’s built-in eval, it supports tool trajectory comparison, response quality scoring, and composable scorers.

§Quick start

use heartbit::eval::{EvalCase, EvalRunner, EvalSummary, KeywordScorer, TrajectoryScorer};

let cases = vec![
    EvalCase::new("greeting", "Say hello")
        .expect_output_contains("hello")
        .expect_no_tools(),
    EvalCase::new("file-read", "Read /tmp/test.txt")
        .expect_tool("read_file")
        .expect_output_contains("content"),
];

let runner = EvalRunner::new()
    .scorer(TrajectoryScorer)
    .scorer(KeywordScorer);

// `agent` is assumed to have been constructed earlier; this runs inside an async context.
let results = runner.run(&agent, &cases).await;
let summary = EvalSummary::from_results(&results);
println!("{summary}");

Structs§

CaseComparison
Comparison of a single case between baseline and candidate runs.
CostScorer
Scores agent execution against a cost budget.
EvalCase
A single evaluation test case.
EvalComparison
Comparison of two eval runs for A/B testing and regression detection.
EvalResult
Result of evaluating a single test case.
EvalRunner
Runs evaluation cases against an agent and collects scored results.
EvalSummary
Aggregate summary of multiple eval results.
ExpectedToolCall
An expected tool call in a trajectory.
KeywordScorer
Scores output against expected keyword presence/absence.
LatencyScorer
Scores agent execution against a latency budget.
SafetyScorer
Scores agent execution for guardrail safety.
ScorerResult
Result from a single scorer.
SimilarityScorer
Scores output similarity to a reference using unigram overlap (ROUGE-1 F1).
ToolCallCountScorer
Scores agent execution against a tool call count budget.
TrajectoryScorer
Scores tool call trajectory against expected tool calls.
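
The SimilarityScorer entry above names unigram overlap (ROUGE-1 F1) as its metric. As a rough illustration of that metric only, here is a minimal standalone sketch. It assumes lowercase whitespace tokenization and clipped token counts; the crate's own tokenization and edge-case handling may differ, and `unigram_counts` and `rouge1_f1` are hypothetical helper names, not part of heartbit.

```rust
use std::collections::HashMap;

/// Count lowercased, whitespace-separated tokens.
/// (Assumption: a deliberately simple tokenizer, for illustration only.)
fn unigram_counts(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for token in text.split_whitespace() {
        *counts.entry(token.to_lowercase()).or_insert(0) += 1;
    }
    counts
}

/// ROUGE-1 F1 between a candidate output and a reference, in [0.0, 1.0].
fn rouge1_f1(candidate: &str, reference: &str) -> f64 {
    let cand = unigram_counts(candidate);
    let refc = unigram_counts(reference);
    let cand_total: usize = cand.values().sum();
    let ref_total: usize = refc.values().sum();
    if cand_total == 0 || ref_total == 0 {
        return 0.0;
    }
    // Clipped overlap: a token counts at most as often as it occurs in both texts.
    let overlap: usize = cand
        .iter()
        .map(|(tok, &n)| n.min(*refc.get(tok).unwrap_or(&0)))
        .sum();
    if overlap == 0 {
        return 0.0;
    }
    let precision = overlap as f64 / cand_total as f64;
    let recall = overlap as f64 / ref_total as f64;
    // F1 is the harmonic mean of precision and recall.
    2.0 * precision * recall / (precision + recall)
}
```

For example, with candidate "the cat" and reference "the cat sat down", two tokens overlap, so precision is 2/2, recall is 2/4, and the F1 score is 2/3.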

Constants§

KNOWN_SCORERS
All known scorer names supported by the eval framework.

Traits§

EvalScorer
Pluggable scoring function for evaluation.
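
This page does not show the trait's actual signature, so the following is only a sketch of what pluggable, composable scoring can look like in general. Every name in it (`RunRecord`, `Scorer`, `ContainsAll`, `ExactTrajectory`, `score_all`) is hypothetical and is not part of the heartbit API; consult the EvalScorer page for the real method set.

```rust
/// Hypothetical record of one agent run; not a heartbit type.
struct RunRecord {
    output: String,
    tool_calls: Vec<String>,
}

/// Hypothetical scorer interface: a name plus a score in [0.0, 1.0].
trait Scorer {
    fn name(&self) -> &str;
    fn score(&self, run: &RunRecord) -> f64;
}

/// Scores 1.0 only if every keyword appears in the output (case-insensitive).
struct ContainsAll(Vec<String>);

impl Scorer for ContainsAll {
    fn name(&self) -> &str {
        "contains_all"
    }
    fn score(&self, run: &RunRecord) -> f64 {
        let out = run.output.to_lowercase();
        if self.0.iter().all(|k| out.contains(&k.to_lowercase())) {
            1.0
        } else {
            0.0
        }
    }
}

/// Scores 1.0 only if the tool names match the expected sequence exactly.
struct ExactTrajectory(Vec<String>);

impl Scorer for ExactTrajectory {
    fn name(&self) -> &str {
        "exact_trajectory"
    }
    fn score(&self, run: &RunRecord) -> f64 {
        if run.tool_calls == self.0 {
            1.0
        } else {
            0.0
        }
    }
}

/// Apply every scorer to one run and collect (name, score) pairs.
fn score_all(scorers: &[Box<dyn Scorer>], run: &RunRecord) -> Vec<(String, f64)> {
    scorers
        .iter()
        .map(|s| (s.name().to_string(), s.score(run)))
        .collect()
}
```

Keeping each scorer behind a trait object lets a runner hold a heterogeneous list of scorers and report one result per scorer, which is the composition pattern the quick start suggests with its chained `.scorer(...)` calls.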

Functions§

build_eval_agent
Build an eval-ready agent with event collection.
clear_events
Clear all events from a collector.

Type Aliases§

EventCollector
Shared event collector for eval tool call trajectory capture.
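
The concrete type behind EventCollector is not shown on this page. As a general illustration of a shared collector that is filled during a run, read afterwards, and cleared between cases, here is a minimal sketch. It assumes an `Arc<Mutex<Vec<String>>>` of tool names; `SharedEvents`, `record`, `snapshot`, and `clear` are hypothetical names, and the real alias and `clear_events` signature may differ.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a collector: a shared, lockable list of tool names.
type SharedEvents = Arc<Mutex<Vec<String>>>;

/// Record one tool call (the side an agent callback would use).
fn record(events: &SharedEvents, tool: &str) {
    events.lock().unwrap().push(tool.to_string());
}

/// Copy out the tool names recorded so far, in call order.
fn snapshot(events: &SharedEvents) -> Vec<String> {
    events.lock().unwrap().clone()
}

/// Empty the collector so the next case starts from a clean trajectory.
fn clear(events: &SharedEvents) {
    events.lock().unwrap().clear();
}
```

Because the handle is reference-counted, the agent and the eval harness can each hold a clone and observe the same list; clearing between cases keeps one case's tool calls from leaking into the next case's trajectory.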