pub struct BenchRunner { /* private fields */ }Expand description
Top-level orchestrator for benchmark execution.
Manages the full lifecycle of a benchmark run:
- Loading workloads (built-in or from file)
- Cost estimation and budget enforcement
- Warm-up phase (iterations discarded)
- Measurement phase (iterations recorded)
- Concurrent execution and sweep modes
- Metric aggregation and CV warnings
- Baseline save and regression detection
Implementations§
Source§impl BenchRunner
impl BenchRunner
Sourcepub fn new(config: BenchConfig) -> Self
pub fn new(config: BenchConfig) -> Self
Creates a new BenchRunner with the given configuration.
Initializes BaselineStore at the configured baseline path
and a default CostTracker with standard model pricing.
Sourcepub async fn run(&self) -> Result<Vec<BenchmarkResult>>
pub async fn run(&self) -> Result<Vec<BenchmarkResult>>
Runs the full benchmark suite and returns results.
§Execution Flow
- Resolves workloads (specific workload via
--workloador all built-in) - Estimates cost and enforces budget (dry-run, max-cost-usd, confirm-cost)
- For each workload:
a. Warm-up phase: run
config.warmupiterations, discard results b. Measurement phase: runconfig.runsiterations, collect metrics c. If sweep mode: iterate through concurrency levels d. If concurrency > 1: spawn concurrent tasks - Aggregates metrics and emits CV warnings
- Returns collected
BenchmarkResultvalues
§Errors
BenchError::WorkloadNotFoundif a specified workload file doesn’t existBenchError::Baselineif cost exceeds--max-cost-usd
Sourcepub fn save_baseline(&self, results: &[BenchmarkResult]) -> Result<()>
pub fn save_baseline(&self, results: &[BenchmarkResult]) -> Result<()>
Saves current results as the regression baseline.
Persists metrics via BaselineStore for later regression detection.
Sourcepub fn check_regression(
&self,
results: &[BenchmarkResult],
) -> Result<Vec<RegressionReport>>
pub fn check_regression( &self, results: &[BenchmarkResult], ) -> Result<Vec<RegressionReport>>
Checks results against saved baseline using configured tolerance.
For benchmark timing metrics, a regression means the current value is higher than the baseline (worse performance). The formula is:
regression detected when (current - baseline) / baseline > toleranceReturns a list of RegressionReport entries for any metrics that
exceed the tolerance threshold. An empty list means no regressions.
§Exit Code Contract
The CLI layer (Task 8.1) should exit with code 2 when this method returns a non-empty list, and exit with code 0 otherwise.