Benchmark module - evaluates CodeTether across models using Ralph PRDs
Runs standardized PRD-based benchmarks against multiple LLM models, capturing pass rates, timing, token usage, and cost metrics.
Structs§
- AggregateMetrics - Aggregate metrics across multiple PRDs
- BenchmarkConfig - Configuration for a benchmark run
- BenchmarkRunner - Main benchmark runner
- BenchmarkSubmission - Submission payload for the benchmark API
- BenchmarkSuiteResult - Complete results from a benchmark suite run
- BenchmarkSummary - Summary across all models
- ModelBenchmarkResult - Results for a single model across all PRDs
- ModelRanking - Ranking for a single model
- PrdBenchmarkResult - Results for a single PRD run
- QualityCheckResult - Quality check result
- StoryBenchmarkResult - Result for a single story within a PRD benchmark
Functions§
- detect_tier - Detect tier from PRD filename (e.g., “t1-rest-api.json” -> 1)
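
The filename-to-tier mapping described for detect_tier can be illustrated with a minimal sketch. The function name `detect_tier_sketch`, its `Option<u8>` return type, and the handling of paths and non-matching names are all assumptions made for illustration. The actual signature and fallback behavior of `detect_tier` are not shown on this page and may differ.

```rust
/// Hypothetical stand-in for `detect_tier`; not the crate's implementation.
/// Returns the tier number encoded as a leading `t<digits>` in the file name,
/// or `None` when the name does not follow that pattern.
fn detect_tier_sketch(filename: &str) -> Option<u8> {
    // Look only at the final path component, so "prds/t2-cli.json" also works.
    let name = filename.rsplit(|c| c == '/' || c == '\\').next()?;
    // The name must start with a lowercase 't'.
    let rest = name.strip_prefix('t')?;
    // Collect the run of ASCII digits that immediately follows the 't'.
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    // An empty run (e.g. "test.json") or an out-of-range number yields None.
    digits.parse().ok()
}
```

Under these assumptions, `detect_tier_sketch("t1-rest-api.json")` yields `Some(1)`, matching the example in the description, while a name without a leading `t<digits>` prefix yields `None`.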