agentcarousel 0.5.0

Evaluate agents and skills with YAML fixtures, run cases (mock or live), and keep run rows in SQLite for reports and evidence export.
Documentation

AgentCarousel

Unit tests for AI agents. Define behavior in YAML, run offline tests, export signed evidence bundles your reviewers will accept.

Crates.io Homebrew License: Apache-2.0 Latest release

Why agentcarousel

  • Deterministic by default - Offline runs with mocks mean same inputs → same outputs, every time.
  • Built for evidence - Every run produces a signed artifact (.tar.gz + minisign attestation) you can hand to an auditor, a reviewer, or your customer's security team.
  • Live evals when you want them - plug in OpenAI, Anthropic, Gemini, or OpenRouter as generator and judge, then diff runs to catch regressions.
  • Compliance-aware fixtures - Risk tier, data handling, and certification track: the metadata your governance program already tracks, baked into the test format.

Install

# Linux install; for Windows, download .zip from Releases
curl -fsSL https://install.agentcarousel.com | sh

# Homebrew (macOS)
brew tap agentcarousel/agentcarousel && brew install agentcarousel

# Cargo (Rust)
cargo install agentcarousel

Quickstart

# Scaffold a skill or agent fixture
agentcarousel init --skill my-skill
agentcarousel init --agent my-agent

# Run it offline (no API keys needed)
agentcarousel test fixtures/skills/my-skill.yaml --offline true

# Validate fixture schema and rules
agentcarousel validate fixtures/skills/customer-support.yaml

# Evaluate (mock by default)
agentcarousel eval fixtures/skills/customer-support.yaml

# Export evidence bundle
agentcarousel export <RUN-ID>

Live Eval with LLM-as-a-judge

export GEMINI_API_KEY=gemini_key
export OPENROUTER_API_KEY=or_key
export ANTHROPIC_API_KEY=claude_api_key
export OPENAI_API_KEY=openai_key

# Run all cases, judge-backed cases use the judge
agentcarousel eval fixtures/ \
  --execution-mode live \
  --evaluator all --judge \
  --model gemini-2.5-flash \
  --judge-model claude-haiku-4-5-20251001 \
  --runs 1

# Narrow to specific cases by id glob or tag
agentcarousel eval fixtures/ \
  --evaluator all --judge \
  --filter "customer-support/judge-*" \
  --filter-tags certification

--evaluator all uses each case's declared evaluator; --evaluator judge forces every case through the judge regardless. Use --filter (glob on skill/case-id) or --filter-tags (comma-separated) to scope runs.

Reports

# List recent runs (newest first)
agentcarousel report list

# Show a run (human-readable, same formatting as eval/test output)
agentcarousel report show <RUN-ID>

# Also accepts a path to run.json or an evidence directory
agentcarousel report show ./evidence/my-export/

# Diff two runs to surface regressions
agentcarousel report diff <RUN-ID-A> <RUN-ID-B>

# JSON output for scripting
agentcarousel report list --json
agentcarousel report show <RUN-ID> --json

Configuration (agentcarousel.toml)

Copy agentcarousel.example.toml to agentcarousel.toml and customize as needed.

Per-case effectiveness thresholds override the global --effectiveness-threshold flag via the evaluator_config.effectiveness_threshold field in YAML.

Bundle workflows

# Create a distributable bundle archive
agentcarousel bundle pack fixtures/bundles/my-bundle --out my-bundle.tar.gz

# Verify bundle integrity and structure
agentcarousel bundle verify fixtures/bundles/customer-support
agentcarousel bundle verify my-bundle.tar.gz

# Pull bundle manifest + artifacts from the registry
agentcarousel bundle pull customer-support-1.0.0 --url "https://api.agentcarousel.com"

Publish to registry

# Publish bundle + evidence in one flow
agentcarousel publish fixtures/bundles/customer-suppport \
  --url "https://api.agentcarousel.com"

# Publish multiple matching local runs (newest first)
agentcarousel publish fixtures/bundles/customer-support \
  --url "https://api.agentcarousel.com" \
  --all-runs --limit 5

Trust checks

# Registry trust-state check
agentcarousel trust-check customer-support@1.0.0 \
  --url "https://api.agentcarousel.com"

# Optional offline attestation verification
agentcarousel trust-check customer-support@1.0.0 \
  --url "https://api.agentcarousel.com" \
  --attestation ./attestation-customer-support-1.0.0.json \
  --minisign-pubkey ./your-minisign.pub

Contributions

For fixture contributions, open an issue before implementation.