# adk-eval

Agent evaluation framework for the Rust Agent Development Kit (ADK-Rust).
## Overview
adk-eval provides comprehensive tools for testing and validating agent behavior, enabling developers to ensure their agents perform correctly and consistently. Unlike traditional software testing, agent evaluation must account for the probabilistic nature of LLMs while still providing meaningful quality signals.
## Features

- Test Definitions: Structured JSON format for defining test cases (`.test.json`)
- Trajectory Evaluation: Validate tool call sequences with exact or partial matching
- Response Quality: Assess final output quality using multiple algorithms
- LLM-Judged Evaluation: Semantic matching, rubric-based scoring, and safety checks
- Multiple Criteria: Ground truth, similarity-based, and configurable thresholds
- Detailed Reporting: Comprehensive results with failure analysis
## Quick Start

```rust
use adk_eval::{AgentEvaluator, EvaluationCriteria}; // import paths assumed
use std::sync::Arc;

// Wrap an agent in an evaluator, then run a .test.json file against it.
let evaluator = AgentEvaluator::new(Arc::new(agent), EvaluationCriteria::default()); // constructor assumed
let report = evaluator.evaluate_file("tests/evals/basic.test.json").await?;
```
## Test File Format
Test files use JSON format with the following structure:
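The exact schema is defined by the crate; as a rough illustration only, a single case in a `.test.json` file might look like this (all field names here are hypothetical):

```json
{
  "name": "tokyo_weather",
  "input": "What is the weather in Tokyo?",
  "expected_tool_calls": [
    { "tool": "get_weather", "args": { "city": "Tokyo" } }
  ],
  "expected_response": "It is currently sunny in Tokyo."
}
```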
## Evaluation Criteria

### Tool Trajectory Matching

Validates that the agent calls the expected tools:

```rust
// Field name and threshold are illustrative.
let criteria = EvaluationCriteria { tool_trajectory_avg_score: 1.0, ..Default::default() };
```
### Response Similarity

Compare response text using various algorithms:

```rust
// Field name and threshold are illustrative.
let criteria = EvaluationCriteria { response_match_score: 0.8, ..Default::default() };
```
Available similarity algorithms:

- `Exact` - Exact string match
- `Contains` - Substring check
- `Levenshtein` - Edit distance
- `Jaccard` - Word overlap (default; see the sketch below)
- `Rouge1` - Unigram overlap
- `Rouge2` - Bigram overlap
- `RougeL` - Longest common subsequence
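For intuition, the default `Jaccard` score treats each response as a set of words and divides the size of the intersection by the size of the union. A standalone sketch of that formula (not the crate's implementation):

```rust
use std::collections::HashSet;

/// Jaccard word-overlap similarity: |A ∩ B| / |A ∪ B| over the word sets.
fn jaccard(expected: &str, actual: &str) -> f64 {
    let a: HashSet<&str> = expected.split_whitespace().collect();
    let b: HashSet<&str> = actual.split_whitespace().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0; // two empty responses are identical
    }
    a.intersection(&b).count() as f64 / a.union(&b).count() as f64
}
```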
### LLM-Judged Semantic Matching

Use an LLM to judge semantic equivalence between expected and actual responses:

```rust
// Import paths, constructors, and argument lists below are assumed.
use adk_eval::{AgentEvaluator, EvalConfig};
use adk::models::GeminiModel; // import path assumed
use std::sync::Arc;

// Create evaluator with LLM judge
let judge_model = GeminiModel::new(/* model id / credentials */);
let config = EvalConfig::with_criteria(criteria);
let evaluator = AgentEvaluator::with_llm_judge(agent, config, Arc::new(judge_model));
```
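Compared with the string-similarity algorithms above, an LLM judge tolerates paraphrasing and can penalize answers that are lexically close to the reference but semantically wrong, at the cost of one extra model call per test case.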
### Rubric-Based Evaluation

Evaluate responses against custom rubrics:

```rust
use adk_eval::EvaluationCriteria; // import path assumed

// Rubric texts and the argument type are illustrative.
let criteria = EvaluationCriteria::default()
    .with_rubrics(vec!["The answer is concise", "The answer cites the provided context"]);
```
### Safety and Hallucination Detection

Check responses for safety issues and hallucinations:

```rust
// Flag names are illustrative.
let criteria = EvaluationCriteria { safety_check: true, hallucination_check: true, ..Default::default() };
```
## Result Reporting

```rust
// `report` / `result` field names and the file path below are assumed.
let report = evaluator.evaluate_file("tests/evals/basic.test.json").await?;

// Summary
println!("Total:     {}", report.total);
println!("Passed:    {}", report.passed);
println!("Failed:    {}", report.failed);
println!("Pass rate: {:.1}%", report.pass_rate * 100.0);

// Detailed failures
for result in &report.failures {
    println!("{}: {}", result.case_name, result.failure_reason);
}

// Export to JSON
let json = report.to_json()?;
```
## Batch Evaluation

Evaluate multiple test cases in parallel:

```rust
// Argument names are illustrative.
let results = evaluator
    .evaluate_cases_parallel(cases, 4) // 4 concurrent
    .await;
```
Evaluate all test files in a directory:
let reports = evaluator
.evaluate_directory
.await?;
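Assuming each report exposes the `passed` and `total` counters shown in the Result Reporting section, the per-file reports can be rolled up into one overall pass rate:

```rust
// `passed` / `total` field names are assumed (see Result Reporting above).
let (passed, total) = reports
    .iter()
    .fold((0usize, 0usize), |(p, t), r| (p + r.passed, t + r.total));
if total > 0 {
    println!("Overall pass rate: {:.1}%", 100.0 * passed as f64 / total as f64);
}
```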
## Integration with cargo test
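Evaluations can run as ordinary async tests; a minimal sketch, assuming the `AgentEvaluator` API and report fields used above, with `build_agent()` standing in for your own agent constructor:

```rust
use adk_eval::AgentEvaluator; // import path assumed
use std::sync::Arc;

#[tokio::test]
async fn eval_basic_conversation() {
    let evaluator = AgentEvaluator::new(Arc::new(build_agent()), Default::default()); // constructor assumed
    let report = evaluator
        .evaluate_file("tests/evals/basic.test.json")
        .await
        .expect("evaluation run failed");
    assert_eq!(report.failed, 0, "some evaluation cases failed");
}
```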
## License
Apache-2.0
## Part of ADK-Rust
This crate is part of the ADK-Rust framework for building AI agents in Rust.