Crate adk_bench


§adk-bench

A comprehensive benchmarking framework for ADK-Rust that measures framework-level runtime performance using real LLM APIs.

adk-bench isolates framework overhead from LLM latency through per-call instrumentation. It also supports concurrent agent throughput testing, memory profiling, and comparison against external frameworks, which run as subprocesses and report through a standardized JSON protocol (the External Benchmark Protocol).
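The idea of separating framework overhead from LLM latency can be pictured with a small standalone sketch. It assumes each LLM call's latency is recorded individually and that overhead is the wall-clock time of a turn minus the time spent waiting on the LLM. `CallRecord` and `framework_overhead` are hypothetical names for illustration only; they are not the crate's `LlmCallRecord` or any adk-bench API.

```rust
use std::time::Duration;

/// One recorded LLM call. Hypothetical type for this sketch,
/// not the crate's `LlmCallRecord`.
struct CallRecord {
    llm_latency: Duration,
}

/// Overhead for one turn: wall-clock turn time minus the total time
/// spent waiting on LLM calls. Saturating subtraction returns zero
/// instead of panicking if the recorded LLM time exceeds the turn time.
fn framework_overhead(turn_wall_time: Duration, calls: &[CallRecord]) -> Duration {
    let llm_total: Duration = calls.iter().map(|c| c.llm_latency).sum();
    turn_wall_time.saturating_sub(llm_total)
}
```

For a turn that took 700 ms with two LLM calls of 400 ms and 250 ms, this sketch attributes the remaining 50 ms to the framework.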

§Features

  • Cold start measurement: Time from binary launch to the first LLM call
  • Agent loop overhead: Per-turn framework processing latency (excluding LLM time)
  • Concurrent throughput: Agents/second under Tokio async load
  • Memory footprint: Platform-specific RSS sampling (Linux/macOS)
  • Tool invocation latency: Deserialization, validation, and dispatch timing
  • Token overhead: Framework-injected token cost analysis
  • External comparison: Subprocess-based competitor framework benchmarking
  • Regression detection: Baseline save/compare with configurable tolerance
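As an illustration of regression detection against a saved baseline, the sketch below flags a metric when it is worse than the baseline by more than a configurable percentage. It assumes a lower-is-better metric such as latency in milliseconds and a tolerance expressed in percent. The function name and signature are hypothetical; how adk-bench actually stores baselines and applies tolerance is not specified here.

```rust
/// Returns true when `current` is worse than `baseline` by more than
/// `tolerance_pct` percent. Assumes a lower-is-better metric
/// (for example, latency in milliseconds).
fn is_regression(baseline: f64, current: f64, tolerance_pct: f64) -> bool {
    // Without a positive baseline there is nothing meaningful to compare.
    if baseline <= 0.0 {
        return false;
    }
    let change_pct = (current - baseline) / baseline * 100.0;
    change_pct > tolerance_pct
}
```

With a 5% tolerance, a move from 100 ms to 104 ms passes, while a move from 100 ms to 110 ms is reported as a regression. Improvements (negative change) never trigger the check.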

§Feature Flags

  • tau2 — Enables the τ²-bench task quality adapter
  • bfcl — Enables the BFCL (Berkeley Function Calling Leaderboard) adapter

§Quick Start

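use adk_bench::{BenchConfig, BenchRunner};

// `.await` and `?` need an async fn returning a Result; a Tokio entry point is assumed.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = BenchConfig::default();
    let runner = BenchRunner::new(config);
    let results = runner.run().await?;
    Ok(())
}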

§Re-exports

pub use adapters::CaseResult;
pub use adapters::TaskQualityAdapter;
pub use adapters::TaskQualityResult;
pub use config::BenchConfig;
pub use config::ExternalFrameworkConfig;
pub use config::OutputFormat;
pub use config::TaskSuite;
pub use error::BenchError;
pub use error::Result;
pub use external::ExternalConfigFile;
pub use external::ExternalDurationStats;
pub use external::ExternalMetricsOutput;
pub use external::ExternalRunner;
pub use external::ExternalTokenOverhead;
pub use external::load_external_configs;
pub use formatter::ComparisonResult;
pub use formatter::format_comparison;
pub use formatter::format_result;
pub use instrumented_llm::DeterministicConfig;
pub use instrumented_llm::InstrumentedLlm;
pub use instrumented_llm::LlmCallRecord;
pub use metrics::BenchmarkResult;
pub use metrics::ConcurrencyLevel;
pub use metrics::DurationStats;
pub use metrics::MemoryMetrics;
pub use metrics::MetricCollector;
pub use metrics::RunMetadata;
pub use metrics::ThroughputMetrics;
pub use metrics::TokenBreakdown;
pub use metrics::TokenOverheadMetrics;
pub use metrics::ToolInvocationMetrics;
pub use metrics::compute_stats;
pub use runner::BenchRunner;
pub use runner::RegressionReport;
pub use workload::AgentConfig;
pub use workload::ToolDefinition;
pub use workload::Workload;
pub use workload::builtin_workloads;
pub use workload::load_workload;
pub use workload::multi_agent_delegation_workload;

§Modules

adapters
Task quality adapters for established benchmark suites.
config
Benchmark configuration types mapped from CLI flags.
error
Error types for the adk-bench benchmarking framework.
external
External framework comparison via subprocess execution.
formatter
Result output formatting (JSON, table, markdown).
instrumented_llm
Instrumented LLM wrapper for capturing per-call timing metrics.
memory
Platform-specific memory (RSS) sampling.
metrics
Metric collection and statistical computation.
runner
Benchmark runner orchestrator.
workload
Workload schema, loading, and validation.
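
The statistical computation mentioned for the metrics module can be pictured with a standalone sketch that summarizes a set of duration samples. It assumes nearest-rank percentiles and a simple arithmetic mean. `Stats` and `summarize` are hypothetical names; the crate's `DurationStats` and `compute_stats` may have different fields, signatures, and percentile definitions.

```rust
use std::time::Duration;

/// Summary of a sample set. Hypothetical shape for this sketch;
/// the crate's `DurationStats` may differ.
struct Stats {
    min: Duration,
    max: Duration,
    mean: Duration,
    p50: Duration,
    p95: Duration,
}

/// Summarizes the samples, or returns None when there are none.
fn summarize(samples: &[Duration]) -> Option<Stats> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let n = sorted.len();
    // Nearest-rank percentile: rank = ceil(p / 100 * n), 1-based.
    let pct = |p: usize| sorted[((p * n + 99) / 100).max(1) - 1];
    let total: Duration = sorted.iter().sum();
    Some(Stats {
        min: sorted[0],
        max: sorted[n - 1],
        // Duration division takes a u32; fine for benchmark-sized sample sets.
        mean: total / n as u32,
        p50: pct(50),
        p95: pct(95),
    })
}
```

Nearest-rank is only one of several percentile conventions; interpolating variants give slightly different values on small sample sets, so results should be compared only between runs that use the same method.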