Module benchmark

Structs

BenchmarkBaseline
Published baseline accuracy for reference comparison.
BenchmarkResult
Result of running an agent on a benchmark task.
BenchmarkRun
Aggregated benchmark run for an agent against one suite.
BenchmarkTask
A single task loaded from a benchmark dataset.

Enums

BenchmarkSuite
Supported public benchmark suites.

Functions

published_baselines
Hardcoded published baselines for percentile calculation.
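
This index does not show the signature of published_baselines or the fields of BenchmarkBaseline, so the following is only a minimal sketch of how a percentile could be computed against a list of baselines. The `Baseline` struct, its `name` and `accuracy` fields, the `example_baselines` helper, and the `percentile_rank` function are all hypothetical stand-ins, and the accuracy values are placeholders rather than real published numbers.

```rust
/// Hypothetical stand-in for `BenchmarkBaseline`; the real fields are not
/// shown in this module index.
#[derive(Debug, Clone)]
struct Baseline {
    name: &'static str,
    /// Accuracy as a fraction in [0, 1].
    accuracy: f64,
}

/// Placeholder data for illustration only. These are NOT real published
/// baselines and this is NOT the module's `published_baselines` function.
fn example_baselines() -> Vec<Baseline> {
    vec![
        Baseline { name: "baseline-a", accuracy: 0.20 },
        Baseline { name: "baseline-b", accuracy: 0.40 },
        Baseline { name: "baseline-c", accuracy: 0.60 },
        Baseline { name: "baseline-d", accuracy: 0.80 },
    ]
}

/// Percentage of baselines whose accuracy is at or below `accuracy`.
/// Returns 0.0 when there are no baselines to compare against.
fn percentile_rank(accuracy: f64, baselines: &[Baseline]) -> f64 {
    if baselines.is_empty() {
        return 0.0;
    }
    let at_or_below = baselines
        .iter()
        .filter(|b| b.accuracy <= accuracy)
        .count();
    100.0 * at_or_below as f64 / baselines.len() as f64
}

fn main() {
    let baselines = example_baselines();
    for b in &baselines {
        println!("{}: {:.2}", b.name, b.accuracy);
    }
    // An agent accuracy of 0.5 is compared against the placeholder baselines.
    let rank = percentile_rank(0.5, &baselines);
    println!("percentile rank: {rank:.1}");
}
```

This sketch counts baselines at or below the agent's accuracy, which is one common definition of a percentile rank; the module itself may use a different convention.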