pondus
Opinionated AI model benchmark aggregator.
What it does
Aggregates AI model benchmark data from 8 trusted sources into a unified JSON schema. Designed for AI agents (Claude Code, etc.) to consume programmatically. Caches results for 24h to avoid rate limiting.
Sources
| Source | Type | Data |
|---|---|---|
| Artificial Analysis | REST API | Speed, quality, pricing metrics |
| LM Arena (LMSYS) | Community JSON | ELO ratings from human preferences |
| SWE-bench | GitHub JSON | Code generation resolve rates |
| SWE-rebench | agent-browser scrape | Code generation resolve rates (rebench variant) |
| Aider | GitHub YAML | Polyglot coding benchmark pass rates |
| LiveBench | HuggingFace | Multi-domain benchmark scores |
| Terminal-Bench | HuggingFace | Terminal/CLI task completion |
| SEAL | agent-browser scrape | Scale AI evaluation scores |
Note: Sources marked "agent-browser scrape" require the `agent-browser` CLI, installed separately. All other sources work out of the box.
Installation
cargo install pondus
Usage
Global Flags
| Flag | Description |
|---|---|
| `--format json\|table` | Output format (JSON by default) |
| `--refresh` | Bypass cache for this run |
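To illustrate how the 24h cache and `--refresh` could interact, here is a minimal sketch of a freshness check. It is not taken from the pondus source: `is_fresh`, `DEFAULT_TTL`, and the parameter names are hypothetical, and the real implementation may store and compare timestamps differently.

```rust
use std::time::{Duration, SystemTime};

/// Assumed default time-to-live, matching the 24h cache described above.
pub const DEFAULT_TTL: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical helper: returns true when a cached entry can be reused.
/// `refresh` models the `--refresh` flag and always forces a refetch.
pub fn is_fresh(fetched_at: SystemTime, now: SystemTime, ttl: Duration, refresh: bool) -> bool {
    if refresh {
        return false;
    }
    match now.duration_since(fetched_at) {
        // Entry is reusable only while its age is strictly below the TTL.
        Ok(age) => age < ttl,
        // A fetch time in the future means the clock moved; treat as stale.
        Err(_) => false,
    }
}
```

Passing `now` explicitly rather than calling `SystemTime::now()` inside the function keeps the check deterministic and easy to test.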
Configuration
Config location: ~/.config/pondus/config.toml
[]
= 24
[]
= "models.toml" # relative to config dir, or absolute path
[]
= "your-key" # optional, for AA source
[]
= "agent-browser" # path to agent-browser CLI
Model Aliases
Different benchmarks use different naming conventions. models.toml maps canonical model names to source-specific variants:
[]
= "claude-opus-4.6"
= [
"Claude Opus 4.6",
"claude-opus-4-6",
"anthropic/claude-opus-4.6",
"Opus 4.6",
]
When you run pondus check opus-4.6, pondus resolves the alias to the canonical name and matches it across all sources regardless of how each source names the model. PRs welcome to add new models.
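The alias lookup described above can be sketched as a map from normalized names to canonical names. This is an illustration under stated assumptions, not the pondus implementation: `AliasTable`, `normalize`, and the normalization rule (lowercase, alphanumerics only) are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical alias table: every normalized alias points at a canonical name.
pub struct AliasTable {
    by_alias: HashMap<String, String>,
}

/// Assumed normalization: lowercase and drop everything except letters and
/// digits, so "Opus 4.6" and "opus-4.6" compare equal.
fn normalize(name: &str) -> String {
    name.chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

impl AliasTable {
    pub fn new() -> Self {
        AliasTable { by_alias: HashMap::new() }
    }

    /// Register a canonical name together with its source-specific variants.
    pub fn add(&mut self, canonical: &str, aliases: &[&str]) {
        self.by_alias.insert(normalize(canonical), canonical.to_string());
        for alias in aliases {
            self.by_alias.insert(normalize(alias), canonical.to_string());
        }
    }

    /// Resolve any known spelling to the canonical name, if one is registered.
    pub fn resolve(&self, name: &str) -> Option<&str> {
        self.by_alias.get(&normalize(name)).map(String::as_str)
    }
}
```

With the aliases from the example entry registered, `resolve("opus-4.6")` would return the canonical `claude-opus-4.6`, because it normalizes to the same key as the `"Opus 4.6"` alias. An aggressive normalization like this one can also merge names that should stay distinct, so a real tool might prefer exact alias matching.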
Output Format
Default JSON output:
Contributing
- Add a model: Add an entry to `models.toml` with its canonical name and known aliases.
- Add a source: Implement the `Source` trait in `src/sources/`.
PRs welcome.
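For contributors wondering what adding a source might involve, the sketch below shows one plausible shape for a source abstraction. The actual `Source` trait in `src/sources/` is not shown in this README, so every method, type, and field here (`Score`, `name`, `fetch`, `StaticSource`, `collect_all`) is an assumption made for illustration; check the real trait before implementing.

```rust
/// Hypothetical result row; the real schema may carry different fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Score {
    pub model: String,
    pub metric: String,
    pub value: f64,
}

/// Hypothetical trait shape, NOT the real pondus `Source` trait.
pub trait Source {
    /// Short identifier for the source.
    fn name(&self) -> &str;
    /// Fetch and parse the source's data, or report why it failed.
    fn fetch(&self) -> Result<Vec<Score>, String>;
}

/// Toy source returning fixed placeholder rows, standing in for an
/// HTTP-backed or file-backed implementation.
pub struct StaticSource {
    pub rows: Vec<Score>,
}

impl Source for StaticSource {
    fn name(&self) -> &str {
        "static"
    }

    fn fetch(&self) -> Result<Vec<Score>, String> {
        Ok(self.rows.clone())
    }
}

/// Gather rows from every source, tagging each with its source name.
/// Sources that fail are skipped so one outage does not block the rest.
pub fn collect_all(sources: &[Box<dyn Source>]) -> Vec<(String, Score)> {
    let mut out = Vec::new();
    for source in sources {
        if let Ok(rows) = source.fetch() {
            for row in rows {
                out.push((source.name().to_string(), row));
            }
        }
    }
    out
}
```

Using trait objects (`Box<dyn Source>`) lets sources of different types sit in one list, which suits an aggregator that treats every source uniformly.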
License
MIT
Sister Tools
Part of a family of AI-augmented CLI tools: