agentcarousel
Evaluate agent and skill behavior with reproducible fixtures, scored checks, and exportable evidence.
agentcarousel is a CLI for people who want confidence in AI agent behavior before shipping. It starts simple (validate + test locally) and scales to live evaluation, evidence export, and trust workflows.
Why people use this
- Verify agent quality quickly with schema + rule checks.
- Run deterministic offline tests in CI.
- Store run history and compare results over time.
- Export evidence bundles for audits and share benchmarks.
- Publish bundles/runs to a registry and verify trust state.
Start Here
If you just want value fast, run these three steps.
1) Install
Notes:
- Installer supports Linux and macOS.
- On Windows, download the .zip release asset from GitHub.
- The installer offers an agc alias for convenience.
2) Validate fixtures
A fixture is a YAML file that defines test cases (and optional mocks) for an agent or skill so the CLI can validate, run offline tests, or evaluate behavior.
With no paths, validate scans the current directory for fixture files.
3) Run offline tests
This is the safest default for CI and public fixture repos.
Common Commands
# Help
# Validate fixture files (schema + rules)
# Run fixtures with mock generation
# Evaluate (mock or live)
# List and inspect stored runs
# Scaffold a new fixture
# Export evidence for one run (or newest N runs)
Configuration
Config file lookup order:
1. --config <path> (explicit)
2. ./agentcarousel.toml (project)
3. ~/.config/agentcarousel/config.toml (user)
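The lookup order above can be read as "first match wins". Here is a minimal Python sketch of that reading, not the CLI's actual implementation. The `resolve_config` helper and its parameters are hypothetical, and the sketch assumes that an explicit `--config` path is used as given while the project and user files are used only if they exist.

```python
from pathlib import Path
from typing import Optional


def resolve_config(explicit: Optional[str], cwd: Path, home: Path) -> Optional[Path]:
    """Return the config file to use, or None if none is found.

    Hypothetical helper that mirrors the documented order:
    explicit, then project, then user.
    """
    # 1. An explicit --config path always wins (assumption: used as given).
    if explicit is not None:
        return Path(explicit)
    # 2. Project config, then 3. user config: the first existing file wins.
    candidates = [
        cwd / "agentcarousel.toml",
        home / ".config" / "agentcarousel" / "config.toml",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Calling `resolve_config(None, Path.cwd(), Path.home())` would return the project file if present, otherwise the user file, otherwise `None`.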
History database defaults:
- macOS: ~/Library/Application Support/agentcarousel/history.db
- Linux: ~/.local/share/agentcarousel/history.db
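As a rough illustration of the defaults above, the following Python sketch maps a platform name to the listed path. The `default_history_path` function is hypothetical, and the sketch does not model any environment-variable or flag overrides the CLI may honor.

```python
from pathlib import Path


def default_history_path(platform: str, home: Path) -> Path:
    """Map a sys.platform value to the documented default history.db location."""
    if platform == "darwin":  # macOS
        return home / "Library" / "Application Support" / "agentcarousel" / "history.db"
    if platform.startswith("linux"):
        return home / ".local" / "share" / "agentcarousel" / "history.db"
    # Other platforms are not listed above, so this sketch refuses to guess.
    raise ValueError(f"no documented default for platform {platform!r}")
```

With `import sys`, `default_history_path(sys.platform, Path.home())` gives the expected location on the current machine.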
Override history path with:
Development and Power Users
Everything below is for running live model evaluations, publishing evidence, or integrating trust workflows.
Live evaluation with judge models
Supported providers for live generation and judging include Gemini, OpenAI, Anthropic, and OpenRouter.
More recipes and troubleshooting: docs/quickstart.md
Bundle workflows
A bundle is a versioned package of fixtures, described by bundle.manifest.json with content hashes, that can be packed, verified, and published.
# Create a distributable bundle archive
# Verify bundle integrity and structure
# Pull bundle manifest + artifacts from a registry (see docs/registry-api-contract.md)
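To illustrate what checking content hashes involves, here is a minimal Python sketch. The manifest layout it reads (a top-level `files` object mapping relative paths to SHA-256 hex digests) is an assumption made for illustration; the real bundle.manifest.json schema and hash algorithm may differ, and the `verify_manifest` helper is hypothetical. For real bundles, rely on the CLI's own bundle verification.

```python
import hashlib
import json
from pathlib import Path


def verify_manifest(bundle_dir: Path) -> list[str]:
    """Return relative paths that are missing or whose content hash differs.

    Assumed (hypothetical) manifest shape:
    {"files": {"<relative path>": "<sha256 hex digest>"}}
    """
    manifest = json.loads((bundle_dir / "bundle.manifest.json").read_text())
    mismatched = []
    for rel_path, expected in manifest["files"].items():
        target = bundle_dir / rel_path
        if not target.is_file():
            mismatched.append(rel_path)  # listed in the manifest but absent
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected:
            mismatched.append(rel_path)  # content changed since packing
    return mismatched
```

An empty return value means every listed file is present and matches its recorded digest.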
Publish to a registry
# Publish bundle + evidence in one flow
# Publish multiple matching local runs (newest first)
Trust checks
The registry holds the published assurance state for a bundle, which you can query via trust-check. That state can optionally be backed by a signed attestation (e.g. minisign) that you verify offline.
# Registry trust-state check
# Optional offline attestation verification
External evaluator contract
If you want custom evaluators (Python/JS/etc.), use:
It defines the stdin/stdout JSON contract and exit-code semantics.
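The general shape of such an evaluator can be sketched in Python. The field names (`output`, `expected_substring`, `passed`, `score`) and the exit codes (0 for a completed evaluation, 2 for malformed input) are placeholders invented for this sketch, not the actual contract; take the real schema and exit-code semantics from the contract document.

```python
import json
from typing import TextIO


def evaluate(stdin: TextIO, stdout: TextIO) -> int:
    """Read one JSON request, write one JSON verdict, return an exit code."""
    try:
        request = json.load(stdin)
        output = request["output"]              # hypothetical field
        needle = request["expected_substring"]  # hypothetical field
    except (json.JSONDecodeError, KeyError, TypeError):
        return 2  # placeholder exit code for malformed input
    if not isinstance(output, str) or not isinstance(needle, str):
        return 2  # placeholder exit code for wrongly typed fields
    passed = needle in output
    json.dump({"passed": passed, "score": 1.0 if passed else 0.0}, stdout)
    stdout.write("\n")
    # Placeholder exit code: the evaluation ran; pass or fail is in the JSON.
    return 0
```

To run it as a standalone script, import `sys` and end the file with `sys.exit(evaluate(sys.stdin, sys.stdout))`. Keeping the logic in a function that takes streams makes the evaluator easy to unit test without spawning a process.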
Build From Source
Prerequisites:
- Rust 1.95+
# Build package and binaries
# Run from source (explicit binary)
Binaries provided by this package:
- agentcarousel
- agc
OSS and Contributions
We welcome public fixture and documentation improvements.
- Start here: distribution/CONTRIBUTING.md
- Security policy: distribution/SECURITY.md
- Changelog: distribution/CHANGELOG.md
For fixture contributions, open an issue first: issue-first intake is required before implementation.
Releases and crates.io
- GitHub repository: github.com/agentcarousel/agentcarousel
- crates.io package: crates.io/crates/agentcarousel
- API docs: docs.rs/agentcarousel
- Publish checklist: docs/crates-io-publish.md
ATF Alignment
AgentCarousel maps to the Agentic Trust Framework as an evidence and CI-gates implementation layer.