Module benchmark

Benchmark module: evaluates CodeTether across models using Ralph PRDs.

Runs standardized PRD-based benchmarks against multiple LLM models, capturing pass rates, timing, token usage, and cost metrics.

Structs§

AggregateMetrics
Aggregate metrics across multiple PRDs
BenchmarkConfig
Configuration for a benchmark run
BenchmarkRunner
Main benchmark runner
BenchmarkSubmission
Submission payload for the benchmark API
BenchmarkSuiteResult
Complete results from a benchmark suite run
BenchmarkSummary
Summary across all models
ModelBenchmarkResult
Results for a single model across all PRDs
ModelRanking
Ranking for a single model
PrdBenchmarkResult
Results for a single PRD run
QualityCheckResult
Quality check result
StoryBenchmarkResult
Result for a single story within a PRD benchmark

Functions§

detect_tier
Detect tier from PRD filename (e.g., “t1-rest-api.json” -> 1)
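
The filename convention in the example above can be illustrated with a minimal sketch. This is not the crate's implementation: the name `detect_tier_sketch`, the `&str` parameter, and the `Option<u8>` return type are assumptions made for illustration, and the real `detect_tier` may use a different signature or fall back to a default tier when no prefix is found.

```rust
use std::path::Path;

/// Hypothetical sketch (not the crate's actual code): extract the tier
/// number from a PRD filename that follows the `t<digits>-<name>` pattern.
/// Returns `None` when the filename does not match that pattern.
fn detect_tier_sketch(filename: &str) -> Option<u8> {
    // Use only the final path component, so "prds/t2-cli.json" also works.
    let name = Path::new(filename).file_name()?.to_str()?;
    // The name must begin with the literal tier marker 't'.
    let rest = name.strip_prefix('t')?;
    // Everything up to the first '-' is the candidate tier number.
    let (digits, _) = rest.split_once('-')?;
    // Reject empty or non-digit prefixes such as "test-plan.json".
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Values too large for u8 also yield None.
    digits.parse::<u8>().ok()
}
```

Under these assumptions, `detect_tier_sketch("t1-rest-api.json")` yields `Some(1)`, while a name without the `t<digits>-` prefix yields `None`. Returning an `Option` keeps the "no tier found" case explicit for the caller; whether the actual function does the same is not stated in this page.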