Module benchmark

codetether_agent

Benchmark module - evaluates CodeTether across models using Ralph PRDs

Runs standardized PRD-based benchmarks against multiple LLM models, capturing pass rates, timing, token usage, and cost metrics.

Structs§

AggregateMetrics: Aggregate metrics across multiple PRDs
BenchmarkConfig: Configuration for a benchmark run
BenchmarkRunner: Main benchmark runner
BenchmarkSubmission: Submission payload for the benchmark API
BenchmarkSuiteResult: Complete results from a benchmark suite run
BenchmarkSummary: Summary across all models
ModelBenchmarkResult: Results for a single model across all PRDs
ModelRanking: Ranking for a single model
PrdBenchmarkResult: Results for a single PRD run
QualityCheckResult: Quality check result
StoryBenchmarkResult: Result for a single story within a PRD benchmark

Functions§

detect_tier: Detect tier from PRD filename (e.g., “t1-rest-api.json” -> 1)
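
The filename convention in the `detect_tier` example can be illustrated with a minimal sketch. This is not the crate's implementation: the name `detect_tier_sketch`, its `Option<u8>` return type, and the choice to return `None` for names that do not match are all assumptions made for illustration. The real `detect_tier` signature and its fallback behavior are not shown on this page.

```rust
use std::path::Path;

/// Hypothetical helper, NOT the crate's actual `detect_tier`.
/// Assumes tier-tagged PRD files are named `t<digits>-<rest>`,
/// as in the documented example "t1-rest-api.json".
fn detect_tier_sketch(filename: &str) -> Option<u8> {
    // Look only at the final path component, so a leading directory is ignored.
    let name = Path::new(filename).file_name()?.to_str()?;
    // The name must begin with a literal 't'.
    let rest = name.strip_prefix('t')?;
    // Everything before the first '-' should be the tier digits.
    let (digits, _) = rest.split_once('-')?;
    // Reject empty or non-numeric prefixes such as "test-plan.json".
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Values that do not fit in a u8 also yield None.
    digits.parse().ok()
}
```

Under these assumptions, `detect_tier_sketch("t1-rest-api.json")` yields `Some(1)`, while a name without the `t<digits>-` prefix yields `None`. Returning an `Option` leaves the choice of a default tier to the caller; the actual function may handle unmatched names differently.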