Structs
- BenchmarkBaseline - Published baseline accuracy for reference comparison.
- BenchmarkResult - Result of running an agent on a benchmark task.
- BenchmarkRun - Aggregated benchmark run for an agent against one suite.
- BenchmarkTask - A single task loaded from a benchmark dataset.
Enums
- BenchmarkSuite - Supported public benchmark suites.
Functions
- published_baselines - Hardcoded published baselines for percentile calculation.
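
This listing does not show the signatures or fields of these items, so the following is only a rough sketch of how a percentile could be computed against a hardcoded baseline table. Everything in it is an assumption made for illustration: the `Baseline` struct, its `system` and `accuracy` fields, the `sample_baselines` table (with made-up numbers), and the "share of baselines at or below the score" definition of percentile. None of it is taken from the crate's actual API.

```rust
/// Hypothetical stand-in for a published baseline entry. The real
/// `BenchmarkBaseline` fields are not shown in this listing.
#[derive(Debug, Clone, PartialEq)]
pub struct Baseline {
    /// Name of the system that reported the score (assumed field).
    pub system: &'static str,
    /// Accuracy as a fraction in [0.0, 1.0] (assumed field).
    pub accuracy: f64,
}

/// Hypothetical hardcoded table. The numbers are invented for this example
/// and are not real published results.
pub fn sample_baselines() -> Vec<Baseline> {
    vec![
        Baseline { system: "system-a", accuracy: 0.20 },
        Baseline { system: "system-b", accuracy: 0.40 },
        Baseline { system: "system-c", accuracy: 0.60 },
        Baseline { system: "system-d", accuracy: 0.80 },
    ]
}

/// Percentile rank of `accuracy`: the percentage of baselines whose
/// accuracy is less than or equal to it. Returns `None` for an empty table.
pub fn percentile_rank(accuracy: f64, baselines: &[Baseline]) -> Option<f64> {
    if baselines.is_empty() {
        return None;
    }
    let at_or_below = baselines
        .iter()
        .filter(|b| b.accuracy <= accuracy)
        .count();
    Some(100.0 * at_or_below as f64 / baselines.len() as f64)
}
```

Under these assumptions, `percentile_rank(0.5, &sample_baselines())` counts the two sample entries at or below 0.5 out of four. Returning `Option` keeps the empty-table case explicit instead of dividing by zero.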