agentcarousel 0.4.5

Evaluate agents and skills with YAML fixtures, run cases (mock or live), and keep run rows in SQLite for reports and evidence export.

AgentCarousel

Unit tests for AI agents. Define behavior in YAML, run offline tests, and export signed evidence bundles your reviewers will accept.

License: Apache-2.0

Why agentcarousel

  • Deterministic by default - Offline runs with mocks mean same inputs → same outputs, every time.
  • Built for evidence - Every run produces a signed artifact (.tar.gz + minisign attestation) you can hand to an auditor, a reviewer, or your customer's security team.
  • Live evals when you want them - Plug in OpenAI, Anthropic, Gemini, or OpenRouter as generator and judge, then diff runs to catch regressions.
  • Compliance-aware fixtures - Risk tier, data handling, certification track — the metadata your governance program already tracks, baked into the test format.

Install

# Install (Linux — Windows: download .zip from Releases)
curl -fsSL https://install.agentcarousel.com | sh

# Homebrew (macOS)
brew tap agentcarousel/agentcarousel && brew install agentcarousel

# Cargo (Rust)
cargo install agentcarousel

Quickstart

# Scaffold a fixture
agentcarousel init --skill my-agent

# Run it offline — no API keys needed
agentcarousel test --offline true

# Validate
agentcarousel validate fixtures/skills/cmmc-assessor.yaml

# Eval
agentcarousel eval fixtures/skills/cmmc-assessor.yaml

# Export evidence bundle
agentcarousel export <RUN-ID>

Live Eval with LLM-as-a-judge

export GEMINI_API_KEY=gemini_key
export OPENROUTER_API_KEY=or_key
export ANTHROPIC_API_KEY=claude_api_key
export OPENAI_API_KEY=openai_key
agentcarousel eval --execution-mode live \
  --model gemini-2.5-flash \
  --judge --judge-model claude-haiku-4-5-20251001 \
  --evaluator all \
  --runs 1
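
A live run can fail partway through if a provider key is missing, so a small pre-flight check may help. The sketch below is not part of agentcarousel: `missing_keys` is a hypothetical helper that prints each named environment variable that is unset or empty. Which keys a given run needs is an assumption here (one for the generator's provider, one for the judge's).

```shell
# Hypothetical pre-flight helper; not an agentcarousel command.
# Prints the name of every listed variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
missing_keys() {
  rc=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
      rc=1
    fi
  done
  return "$rc"
}
```

For the command above, `missing_keys GEMINI_API_KEY ANTHROPIC_API_KEY || echo "set the keys listed above first"` would flag the two keys that a Gemini generator and a Claude judge presumably need.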

Bundle workflows

# Create a distributable bundle archive
agentcarousel bundle pack fixtures/bundles/my-bundle --out my-bundle.tar.gz

# Verify bundle integrity and structure
agentcarousel bundle verify my-bundle.tar.gz

# Pull bundle manifest + artifacts from the registry
agentcarousel bundle pull cmmc-assessor-1.0.0 --url "https://api.agentcarousel.com"

Publish to registry

# Publish bundle + evidence in one flow
agentcarousel publish fixtures/bundles/cmmc-assessor \
  --url "https://api.agentcarousel.com"

# Publish multiple matching local runs (newest first)
agentcarousel publish fixtures/bundles/cmmc-assessor \
  --url "https://api.agentcarousel.com" \
  --all-runs --limit 5

Trust checks

# Registry trust-state check
agentcarousel trust-check cmmc-assessor@1.0.0 \
  --url "https://api.agentcarousel.com"

# Optional offline attestation verification
agentcarousel trust-check cmmc-assessor@1.0.0 \
  --url "https://api.agentcarousel.com" \
  --attestation ./attestation-cmmc-assessor-1.0.0.json \
  --minisign-pubkey ./your-minisign.pub

Configuration

Config file lookup order:

  1. --config <path> (explicit)
  2. ./agentcarousel.toml (project)
  3. ~/.config/agentcarousel/config.toml (user)
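
To see which file a given invocation would probably pick up, the lookup order above can be sketched in shell. This assumes the first match wins and files are not merged, which the list suggests but does not state; `resolve_config` is a hypothetical helper, not an agentcarousel command.

```shell
# Hypothetical helper mirroring the documented lookup order.
# $1 is the value that would be passed to --config (may be empty).
# Prints the first applicable path, or nothing if no config file is found.
resolve_config() {
  explicit="$1"
  if [ -n "$explicit" ]; then
    printf '%s\n' "$explicit"                                # 1. explicit
  elif [ -f ./agentcarousel.toml ]; then
    printf '%s\n' ./agentcarousel.toml                       # 2. project
  elif [ -f "$HOME/.config/agentcarousel/config.toml" ]; then
    printf '%s\n' "$HOME/.config/agentcarousel/config.toml"  # 3. user
  fi
}
```

Running `resolve_config ""` from a project directory prints the project file if it exists, otherwise the user file.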

Database defaults:

  • macOS: ~/Library/Application Support/agentcarousel/history.db
  • Linux: ~/.local/share/agentcarousel/history.db

Override history path with:

export AGENTCAROUSEL_HISTORY_DB=/path/to/history.db
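
Scripts that need to locate the history database can apply the same rules: the environment variable first, then the per-OS default. A minimal sketch, assuming every non-macOS system uses the Linux path (the list above names only those two platforms); `history_db_path` is a hypothetical helper.

```shell
# Hypothetical helper: prints the history database path a run would
# presumably use, based on the defaults and override documented above.
history_db_path() {
  if [ -n "${AGENTCAROUSEL_HISTORY_DB:-}" ]; then
    printf '%s\n' "$AGENTCAROUSEL_HISTORY_DB"
    return
  fi
  case "$(uname -s)" in
    Darwin) printf '%s\n' "$HOME/Library/Application Support/agentcarousel/history.db" ;;
    # Assumption: anything that is not macOS is treated as Linux.
    *)      printf '%s\n' "$HOME/.local/share/agentcarousel/history.db" ;;
  esac
}
```

Since the history is stored in SQLite, `sqlite3 "$(history_db_path)" .tables` lists the tables without assuming anything about the schema.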

Contributions

For fixture contributions, open an issue before you start implementing.