Crate adk_eval

§adk-eval

Agent evaluation framework for ADK-Rust.

This crate provides tools for testing and validating agent behavior, so developers can check that their agents call the expected tools and produce correct, consistent responses.

§Features

  • Test Definitions: Structured format for defining test cases (.test.json)
  • Trajectory Evaluation: Validate tool call sequences
  • Response Quality: Assess final output quality with multiple metrics
  • Multiple Criteria: Ground truth, rubric-based, and LLM-judged evaluation
  • Automation: Run evaluations programmatically or via CLI
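As a rough illustration of what trajectory evaluation involves, the sketch below compares an expected tool call sequence against the one an agent actually produced. The `ToolCall` struct and `trajectory_score` function are hypothetical stand-ins written for this example; they are not the crate's `ToolUse` or `ToolTrajectoryScorer`, whose matching rules may differ.

```rust
/// Hypothetical stand-in for a recorded tool call (not the crate's `ToolUse`).
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    name: String,
    /// Arguments kept as a raw JSON string to keep the sketch dependency-free.
    args: String,
}

/// Position-by-position match ratio between two tool call sequences.
/// Returns 1.0 only when both sequences have the same calls in the same order.
fn trajectory_score(expected: &[ToolCall], actual: &[ToolCall]) -> f64 {
    if expected.is_empty() && actual.is_empty() {
        return 1.0;
    }
    // Count positions where the expected and actual calls are identical.
    let matched = expected
        .iter()
        .zip(actual.iter())
        .filter(|(e, a)| e == a)
        .count();
    // Dividing by the longer length penalizes both missing and extra calls.
    let denom = expected.len().max(actual.len());
    matched as f64 / denom as f64
}
```

Under this scheme an agent that makes only the first of two expected calls scores 0.5, so it would fail a criterion that demands an exact match.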

§Quick Start

use adk_eval::{Evaluator, EvaluationConfig, EvaluationCriteria};
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create your agent (`create_my_agent` is a placeholder for your own constructor)
    let agent = create_my_agent()?;

    // Configure evaluator
    let config = EvaluationConfig {
        criteria: EvaluationCriteria {
            tool_trajectory_score: Some(1.0),  // Exact tool match
            response_similarity: Some(0.8),    // 80% text similarity
            ..Default::default()
        },
        ..Default::default()
    };

    let evaluator = Evaluator::new(config);

    // Run evaluation
    let result = evaluator
        .evaluate_file(agent, "tests/my_agent.test.json")
        .await?;

    assert!(result.passed, "Evaluation failed: {:?}", result.failures);
    Ok(())
}
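The criteria values in the Quick Start read naturally as minimum passing scores. The sketch below shows one way such a check could work, assuming a criterion left as `None` is skipped and that similarity is measured as token overlap (Jaccard). Both `token_similarity` and `meets_threshold` are hypothetical helpers for illustration; the metric that `ResponseScorer` actually uses is not stated here and may be different.

```rust
use std::collections::HashSet;

/// Jaccard similarity over lowercase, whitespace-separated tokens, in [0.0, 1.0].
/// This is an assumed metric for the sketch, not the crate's own.
fn token_similarity(expected: &str, actual: &str) -> f64 {
    let tokens = |s: &str| -> HashSet<String> {
        s.split_whitespace().map(|w| w.to_lowercase()).collect()
    };
    let (a, b) = (tokens(expected), tokens(actual));
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    let shared = a.intersection(&b).count();
    let total = a.union(&b).count();
    shared as f64 / total as f64
}

/// A criterion passes when the score reaches its threshold.
/// `None` is treated as "criterion not configured", so it always passes.
fn meets_threshold(score: f64, threshold: Option<f64>) -> bool {
    threshold.map_or(true, |t| score >= t)
}
```

For instance, "the sky is blue" versus "the sky is grey" shares three of five distinct tokens, which falls below a `Some(0.8)` threshold in this sketch.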

§Re-exports

pub use criteria::EvaluationCriteria;
pub use criteria::ResponseMatchConfig;
pub use criteria::Rubric;
pub use criteria::RubricConfig;
pub use criteria::ToolTrajectoryConfig;
pub use error::EvalError;
pub use error::Result;
pub use evaluator::EvaluationConfig;
pub use evaluator::Evaluator;
pub use llm_judge::LlmJudge;
pub use llm_judge::LlmJudgeConfig;
pub use llm_judge::RubricEvaluationResult;
pub use llm_judge::RubricScore;
pub use llm_judge::SemanticMatchResult;
pub use report::EvaluationReport;
pub use report::EvaluationResult;
pub use report::Failure;
pub use report::TestCaseResult;
pub use schema::EvalCase;
pub use schema::EvalSet;
pub use schema::IntermediateData;
pub use schema::SessionInput;
pub use schema::TestFile;
pub use schema::ToolUse;
pub use schema::Turn;
pub use scoring::ResponseScorer;
pub use scoring::ToolTrajectoryScorer;

§Modules

  • criteria: Evaluation criteria definitions
  • error: Error types for the evaluation framework
  • evaluator: Core evaluator implementation
  • llm_judge: LLM-based evaluation scoring
  • prelude: Prelude for convenient imports
  • report: Evaluation result reporting
  • schema: Test file schema definitions
  • scoring: Scoring implementations for evaluation criteria