Available on crate feature eval only.
Agent evaluation framework.
Test and validate agent behavior:
- Evaluator - Run evaluation suites
- EvaluationConfig - Configure evaluation parameters
Available with feature: eval
Modules
- criteria
- Evaluation criteria definitions
- error
- Error types for the evaluation framework
- evaluator
- Core evaluator implementation
- llm_judge
- LLM-based evaluation scoring
- prelude
- Prelude for convenient imports
- report
- Evaluation result reporting
- schema
- Test file schema definitions
- scoring
- Scoring implementations for evaluation criteria
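
This page does not say which metric the scoring module uses for response text similarity. As a rough illustration of the general idea only, the sketch below scores two responses by Jaccard overlap of their lowercase word tokens and compares the score with a threshold. The names tokens, response_similarity, and passes are hypothetical and are not part of this crate's API.

```rust
use std::collections::HashSet;

/// Hypothetical helper: lowercase the text and collect its word tokens,
/// stripping leading and trailing punctuation from each word.
fn tokens(text: &str) -> HashSet<String> {
    text.split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect()
}

/// Jaccard similarity between the token sets of two responses, in [0.0, 1.0].
/// Two empty responses are treated as identical (assumption of this sketch).
fn response_similarity(expected: &str, actual: &str) -> f64 {
    let a = tokens(expected);
    let b = tokens(actual);
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    let intersection = a.intersection(&b).count() as f64;
    let union = a.union(&b).count() as f64;
    intersection / union
}

/// A case passes when its similarity reaches the given threshold.
fn passes(expected: &str, actual: &str, threshold: f64) -> bool {
    response_similarity(expected, actual) >= threshold
}
```

With this sketch, passes("a b c d", "a b c", 0.7) holds because three of the four distinct tokens are shared. A real scorer may use a different metric entirely, so treat the threshold semantics here as an assumption.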
Structs
- EvalCase
- A single evaluation case (test case)
- EvalSet
- An eval set references multiple test files
- EvaluationConfig
- Configuration for the evaluator
- EvaluationCriteria
- Collection of evaluation criteria
- EvaluationReport
- Complete evaluation report for a test file or eval set
- EvaluationResult
- Result for a single test case
- Evaluator
- The main evaluator struct
- Failure
- A single failure in evaluation
- IntermediateData
- Intermediate data during a turn (tool calls, etc.)
- LlmJudge
- LLM-based judge for semantic evaluation
- LlmJudgeConfig
- Configuration for the LLM judge
- ResponseMatchConfig
- Configuration for response matching
- ResponseScorer
- Scorer for response text similarity
- Rubric
- A single rubric for quality assessment
- RubricConfig
- Configuration for rubric-based evaluation
- RubricEvaluationResult
- Result of rubric-based evaluation
- RubricScore
- Score for a single rubric
- SemanticMatchResult
- Result of semantic similarity evaluation
- SessionInput
- Session input configuration
- TestFile
- A complete test file containing multiple evaluation cases
- ToolTrajectoryConfig
- Configuration for tool trajectory matching
- ToolTrajectoryScorer
- Scorer for tool trajectory matching
- ToolUse
- A tool use (function call)
- Turn
- A single turn in a conversation
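
To make the idea of tool trajectory matching concrete, here is a minimal self-contained sketch that compares an expected sequence of tool calls with the sequence an agent actually produced. It does not use this crate's types: ToolCall and trajectory_score are hypothetical names, and the position-by-position comparison is only one plausible matching rule, not necessarily the one the crate's trajectory scorer applies.

```rust
/// Hypothetical stand-in for a recorded tool call:
/// a tool name plus its arguments as a raw string.
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    name: String,
    args: String,
}

impl ToolCall {
    fn new(name: &str, args: &str) -> Self {
        Self {
            name: name.to_string(),
            args: args.to_string(),
        }
    }
}

/// Fraction of positions where the expected and actual calls are identical.
/// The denominator is the longer trajectory, so both missing and extra
/// calls lower the score. Two empty trajectories score 1.0.
fn trajectory_score(expected: &[ToolCall], actual: &[ToolCall]) -> f64 {
    let longest = expected.len().max(actual.len());
    if longest == 0 {
        return 1.0; // no tools expected and none used
    }
    let matched = expected
        .iter()
        .zip(actual.iter())
        .filter(|(e, a)| e == a)
        .count();
    matched as f64 / longest as f64
}
```

Under this rule, an agent that makes the first of two expected calls and then stops scores 0.5. Order-insensitive or name-only matching would be reasonable alternatives, depending on how strict the evaluation should be.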
Enums
- EvalError
- Errors that can occur during evaluation
Type Aliases
- Result
- Result type alias for evaluation operations
- TestCaseResult
- Result for a single test case (alias for backward compatibility)
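
A module-level Result alias paired with a dedicated error enum is a common Rust idiom, and the sketch below shows how such a pairing typically looks. It is not this crate's definition: the module name eval_sketch, the enum SketchEvalError, its variants, and the helper functions are all hypothetical and exist only to illustrate the pattern.

```rust
mod eval_sketch {
    use std::fmt;

    /// Hypothetical error enum; the variants are illustrative only.
    #[derive(Debug, PartialEq)]
    pub enum SketchEvalError {
        EmptyTestFile,
        InvalidThreshold(f64),
    }

    impl fmt::Display for SketchEvalError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Self::EmptyTestFile => write!(f, "test file contains no cases"),
                Self::InvalidThreshold(t) => write!(f, "threshold {t} is outside 0.0..=1.0"),
            }
        }
    }

    impl std::error::Error for SketchEvalError {}

    /// Alias that fixes the error type, so signatures can write `Result<T>`.
    pub type Result<T> = std::result::Result<T, SketchEvalError>;

    /// Accept only thresholds in the closed range 0.0..=1.0.
    pub fn validate_threshold(threshold: f64) -> Result<f64> {
        if (0.0..=1.0).contains(&threshold) {
            Ok(threshold)
        } else {
            Err(SketchEvalError::InvalidThreshold(threshold))
        }
    }

    /// Share of passed cases; an empty test file is reported as an error.
    pub fn pass_rate(passed: usize, total: usize) -> Result<f64> {
        if total == 0 {
            return Err(SketchEvalError::EmptyTestFile);
        }
        Ok(passed as f64 / total as f64)
    }
}
```

Callers can then propagate failures with the ? operator and write return types such as eval_sketch::Result<f64> without repeating the error type.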