§adk-eval
Agent evaluation framework for ADK-Rust.
This crate provides tools for testing and validating agent behavior, so developers can check that their agents behave correctly and consistently.
§Features
- Test Definitions: Structured format for defining test cases (.test.json)
- Trajectory Evaluation: Validate tool call sequences
- Response Quality: Assess final output quality with multiple metrics
- Multiple Criteria: Ground truth, rubric-based, and LLM-judged evaluation
- Automation: Run evaluations programmatically or via CLI
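To make the trajectory and response-quality features above concrete, here is a minimal, self-contained sketch of how such scores could be computed and compared against a threshold. It does not use the adk-eval API: `ToolCall`, `trajectory_score`, `response_similarity`, and `passes` are hypothetical names introduced only for illustration, and the crate's own scorers may use different matching rules and metrics.

```rust
use std::collections::HashSet;

// Illustrative only: these types and functions are hypothetical and are
// NOT part of the adk-eval API.

/// A single tool invocation: the tool name plus its arguments as a raw string.
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    name: String,
    args: String,
}

impl ToolCall {
    fn new(name: &str, args: &str) -> Self {
        Self { name: name.to_string(), args: args.to_string() }
    }
}

/// Fraction of positions at which the expected and actual calls are equal,
/// relative to the longer sequence. Returns 1.0 only for identical sequences.
fn trajectory_score(expected: &[ToolCall], actual: &[ToolCall]) -> f64 {
    let longest = expected.len().max(actual.len());
    if longest == 0 {
        return 1.0; // no calls expected and none made
    }
    let matched = expected
        .iter()
        .zip(actual.iter())
        .filter(|(e, a)| e == a)
        .count();
    matched as f64 / longest as f64
}

/// Jaccard similarity over lowercased, whitespace-separated tokens.
fn response_similarity(expected: &str, actual: &str) -> f64 {
    let expected_lower = expected.to_lowercase();
    let actual_lower = actual.to_lowercase();
    let e: HashSet<&str> = expected_lower.split_whitespace().collect();
    let a: HashSet<&str> = actual_lower.split_whitespace().collect();
    if e.is_empty() && a.is_empty() {
        return 1.0; // two empty responses are treated as identical
    }
    let shared = e.intersection(&a).count();
    let total = e.union(&a).count();
    shared as f64 / total as f64
}

/// A criterion passes when the score reaches its threshold.
fn passes(score: f64, threshold: f64) -> bool {
    score >= threshold
}

fn main() {
    let expected = vec![ToolCall::new("search", "rust"), ToolCall::new("summarize", "")];
    let actual = vec![ToolCall::new("search", "rust"), ToolCall::new("translate", "")];

    let trajectory = trajectory_score(&expected, &actual);
    let similarity = response_similarity("the sky is blue", "The sky is blue today");

    println!("trajectory = {trajectory:.2}, passes at 1.0: {}", passes(trajectory, 1.0));
    println!("similarity = {similarity:.2}, passes at 0.8: {}", passes(similarity, 0.8));
}
```

In this sketch a trajectory threshold of 1.0 demands an exact, in-order match of every tool call, while a lower response threshold tolerates wording differences in the final answer.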
§Quick Start
use adk_eval::{Evaluator, EvaluationConfig, EvaluationCriteria};
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create your agent
    let agent = create_my_agent()?;

    // Configure evaluator
    let config = EvaluationConfig {
        criteria: EvaluationCriteria {
            tool_trajectory_score: Some(1.0), // Exact tool match
            response_similarity: Some(0.8),   // 80% text similarity
            ..Default::default()
        },
        ..Default::default()
    };
    let evaluator = Evaluator::new(config);

    // Run evaluation
    let result = evaluator
        .evaluate_file(agent, "tests/my_agent.test.json")
        .await?;
    assert!(result.passed, "Evaluation failed: {:?}", result.failures);
    Ok(())
}
Re-exports§
pub use criteria::EvaluationCriteria;
pub use criteria::ResponseMatchConfig;
pub use criteria::Rubric;
pub use criteria::RubricConfig;
pub use criteria::ToolTrajectoryConfig;
pub use error::EvalError;
pub use error::Result;
pub use evaluator::EvaluationConfig;
pub use evaluator::Evaluator;
pub use llm_judge::LlmJudge;
pub use llm_judge::LlmJudgeConfig;
pub use llm_judge::RubricEvaluationResult;
pub use llm_judge::RubricScore;
pub use llm_judge::SemanticMatchResult;
pub use report::EvaluationReport;
pub use report::EvaluationResult;
pub use report::Failure;
pub use report::TestCaseResult;
pub use schema::EvalCase;
pub use schema::EvalSet;
pub use schema::IntermediateData;
pub use schema::SessionInput;
pub use schema::TestFile;
pub use schema::ToolUse;
pub use schema::Turn;
pub use scoring::ResponseScorer;
pub use scoring::ToolTrajectoryScorer;
Modules§
- criteria
- Evaluation criteria definitions
- error
- Error types for the evaluation framework
- evaluator
- Core evaluator implementation
- llm_judge
- LLM-based evaluation scoring
- prelude
- Prelude for convenient imports
- report
- Evaluation result reporting
- schema
- Test file schema definitions
- scoring
- Scoring implementations for evaluation criteria