Module eval

Available on crate feature eval only.

Agent evaluation framework.

Test and validate agent behavior.

Modules

criteria
Evaluation criteria definitions
error
Error types for the evaluation framework
evaluator
Core evaluator implementation
llm_judge
LLM-based evaluation scoring
optimizer
Prompt optimization engine
prelude
Prelude for convenient imports
report
Evaluation result reporting
schema
Test file schema definitions
scoring
Scoring implementations for evaluation criteria

Structs

EvalCase
A single evaluation case (test case)
EvalSet
An eval set references multiple test files
EvaluationConfig
Configuration for the evaluator
EvaluationCriteria
Collection of evaluation criteria
EvaluationReport
Complete evaluation report for a test file or eval set
EvaluationResult
Result for a single test case
Evaluator
The main evaluator struct
Failure
A single failure in evaluation
IntermediateData
Intermediate data during a turn (tool calls, etc.)
LlmJudge
LLM-based judge for semantic evaluation
LlmJudgeConfig
Configuration for the LLM judge
OptimizationResult
Result of a prompt optimization run.
OptimizerConfig
Configuration for the prompt optimization loop.
PromptOptimizer
Iteratively improves an agent’s system instructions using an optimizer LLM and an evaluation set.
ResponseMatchConfig
Configuration for response matching
ResponseScorer
Scorer for response text similarity
Rubric
A single rubric for quality assessment
RubricConfig
Configuration for rubric-based evaluation
RubricEvaluationResult
Result of rubric-based evaluation
RubricScore
Score for a single rubric
SemanticMatchResult
Result of semantic similarity evaluation
SessionInput
Session input configuration
TestFile
A complete test file containing multiple evaluation cases
ToolTrajectoryConfig
Configuration for tool trajectory matching
ToolTrajectoryScorer
Scorer for tool trajectory matching
ToolUse
A tool use (function call)
Turn
A single turn in a conversation

Enums

EvalError
Errors that can occur during evaluation

Type Aliases

Result
Result type alias for evaluation operations
TestCaseResult
Result for a single test case (alias for backward compatibility)