agentcarousel 0.3.0

Evaluate agents and skills with YAML fixtures, run cases (mock or live), and keep run rows in SQLite for reports and evidence export.

agentcarousel

Evaluate agent and skill behavior with reproducible fixtures, scored checks, and exportable evidence.

agentcarousel is a CLI for people who want confidence in AI agent behavior before shipping. It starts simple (validate and test locally) and scales to live evaluation, evidence export, and trust workflows.

Why people use this

  • Verify agent quality quickly with schema + rule checks.
  • Run deterministic offline tests in CI.
  • Store run history and compare results over time.
  • Export evidence bundles for audits and share benchmarks.
  • Publish bundles/runs to a registry and verify trust state.

Start Here

If you just want value fast, run these three steps.

1) Install

curl -fsSL https://install.agentcarousel.com | sh

Notes:

  • Installer supports Linux and macOS.
  • On Windows, download the .zip release asset from GitHub.
  • The installer offers an agc alias for convenience.

2) Validate fixtures

A fixture is a YAML file that defines test cases (and optional mocks) for an agent or skill so the CLI can validate, run offline tests, or evaluate behavior.

agentcarousel validate

With no paths, validate scans the current directory for fixture files.

3) Run offline tests

agentcarousel test --offline true

This is the safest default for CI and public fixture repos.
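
In CI, the validate and offline test steps can be chained so the job stops at the first failure. The following is a minimal sketch, assuming agentcarousel is on PATH and that both commands exit non-zero on failure (an assumption; exit codes are not documented here). The function name ci_gate is hypothetical.

```shell
#!/bin/sh
# ci_gate: hypothetical helper that runs validation, then offline tests.
# Assumes each agentcarousel command exits non-zero when it fails.
ci_gate() {
  agentcarousel validate || return 1
  agentcarousel test --offline true || return 1
  echo "fixtures validated and offline tests passed"
}

# Usage (in a CI step): ci_gate
```

Wrapping the two commands in a function keeps the CI step to one line and makes the failure point explicit.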

Common Commands

# Help
agentcarousel --help
agentcarousel <command> --help

# Validate fixture files (schema + rules)
agentcarousel validate fixtures/skills/my-skill.yaml

# Run fixtures with mock generation
agentcarousel test

# Evaluate (mock or live)
agentcarousel eval

# List and inspect stored runs
agentcarousel report list
agentcarousel report show <RUN_ID>
agentcarousel report diff <RUN_ID_A> <RUN_ID_B>

# Scaffold a new fixture
agentcarousel init --skill my-new-skill

# Export evidence for one run (or newest N runs)
agentcarousel export <RUN_ID>
agentcarousel export --last 5 --out-dir ./evidence

Configuration

Config file lookup order:

  1. --config <path> (explicit)
  2. ./agentcarousel.toml (project)
  3. ~/.config/agentcarousel/config.toml (user)
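
The lookup order above can be read as a first-match search. The sketch below only illustrates that order; it is not the CLI's implementation, and the function name resolve_config is hypothetical. What the CLI does when no file is found is not covered here, so the sketch just returns non-zero.

```shell
#!/bin/sh
# resolve_config: hypothetical helper mirroring the documented lookup order.
# $1 is the value that would be passed to --config (may be empty).
resolve_config() {
  explicit="${1:-}"
  if [ -n "$explicit" ]; then
    echo "$explicit"                                # 1. explicit --config path wins
  elif [ -f "./agentcarousel.toml" ]; then
    echo "./agentcarousel.toml"                     # 2. project-level file
  elif [ -f "$HOME/.config/agentcarousel/config.toml" ]; then
    echo "$HOME/.config/agentcarousel/config.toml"  # 3. user-level file
  else
    return 1                                        # nothing found
  fi
}

# Usage: resolve_config "" || echo "no config file found"
```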

History database defaults:

  • macOS: ~/Library/Application Support/agentcarousel/history.db
  • Linux: ~/.local/share/agentcarousel/history.db

Override history path with:

export AGENTCAROUSEL_HISTORY_DB=/path/to/history.db
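
To check which database a shell session would use, the defaults and the override can be combined into one helper. This is a sketch under stated assumptions: history_db_path is a hypothetical name, the environment variable is assumed to take precedence over the defaults, and any OS other than macOS is treated as Linux.

```shell
#!/bin/sh
# history_db_path: hypothetical helper that prints the history path described above.
# $1 optionally overrides the OS name (defaults to the output of `uname -s`).
history_db_path() {
  os="${1:-$(uname -s)}"
  if [ -n "${AGENTCAROUSEL_HISTORY_DB:-}" ]; then
    echo "$AGENTCAROUSEL_HISTORY_DB"   # environment override wins
  elif [ "$os" = "Darwin" ]; then
    echo "$HOME/Library/Application Support/agentcarousel/history.db"  # macOS default
  else
    echo "$HOME/.local/share/agentcarousel/history.db"  # Linux default (assumed for any non-macOS OS)
  fi
}

# Usage: ls -l "$(history_db_path)"
```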

Development and Power Users

Everything below is for running live model evaluations, publishing evidence, or integrating trust workflows.

Live evaluation with judge models

Supported providers for live generation and judging include Gemini, OpenAI, Anthropic, and OpenRouter.

export GEMINI_API_KEY=your_key_here
agentcarousel eval --execution-mode live \
  --model gemini-2.5-flash \
  --judge --judge-model gemini-2.5-flash
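
A live run depends on the provider key being present, so a small guard before the eval command can fail early with a clear message. This is a sketch only; require_env is a hypothetical helper, and GEMINI_API_KEY is the only key name taken from this document.

```shell
#!/bin/sh
# require_env: hypothetical guard; fails if the named variable is unset or empty.
require_env() {
  name="$1"
  eval "value=\${$name:-}"   # indirect lookup of the variable named in $1
  if [ -z "$value" ]; then
    echo "missing required environment variable: $name" >&2
    return 1
  fi
}

# Usage:
#   require_env GEMINI_API_KEY && agentcarousel eval --execution-mode live --model gemini-2.5-flash
```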

More recipes and troubleshooting: docs/quickstart.md

Bundle workflows

A bundle is a versioned package of fixtures, described by bundle.manifest.json with content hashes, that can be packed, verified, and published.

# Create a distributable bundle archive
agentcarousel bundle pack fixtures/bundles/my-bundle --out my-bundle.tar.gz

# Verify bundle integrity and structure
agentcarousel bundle verify my-bundle.tar.gz

# Pull bundle manifest + artifacts from a registry (see docs/registry-api-contract.md)
agentcarousel bundle pull cmmc-assessor-1.0.0 --url "$REGISTRY_API_BASE_URL" -o ./pulled/cmmc-assessor

Publish to a registry

# Publish bundle + evidence in one flow
agentcarousel publish fixtures/bundles/terraform-sentinel-scaffold \
  --url "https://api.agentcarousel.com"

# Publish multiple matching local runs (newest first)
agentcarousel publish fixtures/bundles/terraform-sentinel-scaffold \
  --url "https://api.agentcarousel.com" \
  --all-runs --limit 5

Trust checks

The registry holds the published assurance state for a bundle, which you can query with trust-check. That state can optionally be backed by a signed attestation (e.g. minisign) that you verify yourself.

# Registry trust-state check
agentcarousel trust-check terraform-sentinel-scaffold@1.0.0 \
  --url "https://api.agentcarousel.com"

# Optional offline attestation verification
agentcarousel trust-check terraform-sentinel-scaffold@1.0.0 \
  --url "https://api.agentcarousel.com" \
  --attestation ./attestation-terraform-sentinel-scaffold-1.0.0.json \
  --minisign-pubkey ./agentcarousel-minisign.pub

External evaluator contract

If you want custom evaluators (Python/JS/etc.), follow the external evaluator contract. It defines the stdin/stdout JSON contract and exit-code semantics.

Build From Source

Prerequisites:

  • Rust 1.95+

git clone https://github.com/agentcarousel/agentcarousel.git
cd agentcarousel

# Build package and binaries
cargo build -p agentcarousel

# Run from source (explicit binary)
cargo run -p agentcarousel --bin agentcarousel -- --help

Binaries provided by this package:

  • agentcarousel
  • agc

OSS and Contributions

We welcome public fixture and documentation improvements.

For fixture contributions, issue-first intake is required before implementation.

Releases and crates.io

ATF Alignment

AgentCarousel maps to the Agentic Trust Framework as an evidence and CI-gates implementation layer.