agentcarousel 0.2.1

Check YAML/TOML fixtures for agents and skills, run cases (mock or live), and keep run rows in SQLite for reports and evidence export.

agentcarousel

agentcarousel is a Rust CLI (and library) for working with fixture files: schema checks, scenario runs (mock or live), and digging through what you already ran in the local SQLite history. design_spec_01.md is the product sketch; docs/ has the command-by-command detail.

MSRV: Rust 1.95 (rust-version in crates/agentcarousel/Cargo.toml).

Public binary distribution

  • Templates for the public distribution repository (install script, user-facing README, checksums policy) live under distribution/.
  • The curated default-branch hub (fixtures, mocks, public docs, smoke workflow) is staged under distribution/public-hub/ for manual mirroring into the public repo.
  • CI builds multi-platform release artifacts via .github/workflows/releasing.yml.
  • Operational checklist: docs/public-distribution-setup.md.
  • Maintainers: release gate and publish (CI builds Linux/macOS/Windows after all tests pass).
  • Private Git dependencies in Cargo: docs/ci-private-dependencies.md.

Build

cargo build

The agentcarousel binary is built from the agentcarousel package; its main is a thin wrapper that calls agentcarousel::run().

Configuration

The CLI reads agentcarousel.toml from the repository root or ~/.config/agentcarousel/config.toml (XDG). Use --config <path> to override.

Run history defaults to:

  • macOS: ~/Library/Application Support/agentcarousel/history.db
  • Linux: ~/.local/share/agentcarousel/history.db

Set AGENTCAROUSEL_HISTORY_DB (or [report].history_db in the config) to override.
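
To find the history file from a script, the defaults above can be mirrored in a few lines of shell. This is a minimal sketch, not part of the CLI: history_db_path is a hypothetical helper, it only honors the AGENTCAROUSEL_HISTORY_DB override (it does not read [report].history_db from the config), and it gives up on platforms with no default listed above.

```shell
# Hypothetical helper: print the run-history path described above.
history_db_path() {
  # An explicit environment override wins over the platform default.
  if [ -n "${AGENTCAROUSEL_HISTORY_DB:-}" ]; then
    printf '%s\n' "$AGENTCAROUSEL_HISTORY_DB"
    return 0
  fi
  # Fall back to the documented per-platform defaults.
  case "$(uname -s)" in
    Darwin) printf '%s\n' "$HOME/Library/Application Support/agentcarousel/history.db" ;;
    Linux)  printf '%s\n' "$HOME/.local/share/agentcarousel/history.db" ;;
    *)      echo "no documented default for this platform" >&2; return 1 ;;
  esac
}
```

With the sqlite3 command-line tool installed, `sqlite3 "$(history_db_path)" .tables` would then list the tables in whichever file is in effect.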

Live generation and judge evaluation are supported with Gemini, OpenAI, Anthropic, and OpenRouter models.

Quickstart (5 minutes)

# 1) Build
cargo build

# 2) Export a provider key (examples)
export GEMINI_API_KEY=your_key_here
# or export OPENAI_API_KEY=your_key_here
# or export OPENROUTER_API_KEY=your_key_here

# 3) Run a live evaluation (Gemini example)
cargo run -p agentcarousel -- eval --execution-mode live \
  --model gemini-1.5-pro \
  --judge --judge-model gemini-1.5-pro
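
A live run fails late if no provider key is exported, so a small preflight check can save a build cycle. The sketch below is an assumption-laden convenience, not something the CLI ships: first_provider_key is a hypothetical helper, and it only knows the three variable names used in this README.

```shell
# Hypothetical preflight: print the name of the first provider key that is set.
first_provider_key() {
  if [ -n "${GEMINI_API_KEY:-}" ]; then
    echo GEMINI_API_KEY
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo OPENAI_API_KEY
  elif [ -n "${OPENROUTER_API_KEY:-}" ]; then
    echo OPENROUTER_API_KEY
  else
    # Nothing exported: report on stderr and signal failure to the caller.
    echo "no provider key exported" >&2
    return 1
  fi
}
```

Calling `first_provider_key` before step 3 confirms that at least one key is present; it does not check that the key matches the provider of the model passed to --model or --judge-model.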

Provider recipes

# Budget: OpenRouter free tier generator + Gemini judge
export OPENROUTER_API_KEY=your_key_here
export GEMINI_API_KEY=your_key_here
cargo run -p agentcarousel -- eval --execution-mode live \
  --model openrouter/free \
  --judge --judge-model gemini-1.5-pro

# Budget: OpenRouter free tier generator + OpenRouter judge
export OPENROUTER_API_KEY=your_key_here
cargo run -p agentcarousel -- eval --execution-mode live \
  --model openrouter/free \
  --judge --judge-model nvidia/nemotron-3-super-120b-a12b:free

# Balanced: OpenAI generator + OpenAI judge
export OPENAI_API_KEY=your_key_here
cargo run -p agentcarousel -- eval --execution-mode live \
  --model gpt-4o-mini \
  --judge --judge-model gpt-4o-mini

# Premium: Gemini generator + Gemini judge
export GEMINI_API_KEY=your_key_here
cargo run -p agentcarousel -- eval --execution-mode live \
  --model gemini-1.5-pro \
  --judge --judge-model gemini-1.5-pro

For more examples and troubleshooting, see docs/quickstart.md.

Common commands

# Validate fixtures against schema (paths required)
cargo run -p agentcarousel -- validate fixtures/examples/example-skill.yaml

# Run tests from fixture paths (defaults to fixtures/)
cargo run -p agentcarousel -- test

# Evaluation pass (see docs/evaluator-contract.md)
cargo run -p agentcarousel -- eval

# Report on stored runs
cargo run -p agentcarousel -- report list

# Scaffold a new fixture YAML
cargo run -p agentcarousel -- init --skill my-skill-name

# Bundle pack/verify (M3)
cargo run -p agentcarousel -- bundle pack my-bundle --out my-bundle.tar.gz
cargo run -p agentcarousel -- bundle verify my-bundle.tar.gz

# Registry workflow (single publish command)
cargo run -p agentcarousel -- publish fixtures/bundles/cmmc-assessor --url "$REGISTRY_API_BASE_URL"

# Publish all matching local runs for that bundle (newest first)
cargo run -p agentcarousel -- publish fixtures/bundles/cmmc-assessor --url "$REGISTRY_API_BASE_URL" --all-runs --limit 5

# Export an evidence pack for a run id
cargo run -p agentcarousel -- export <RUN_ID>

# Check registry trust state (online-first)
cargo run -p agentcarousel -- trust-check cmmc-assessor@1.0.0 --url "$REGISTRY_API_BASE_URL"

# Optional offline minisign verification with local attestation
cargo run -p agentcarousel -- trust-check cmmc-assessor@1.0.0 \
  --url "$REGISTRY_API_BASE_URL" \
  --attestation ./attestation-cmmc-assessor-1.0.0.json \
  --minisign-pubkey ./agentcarousel-minisign.pub
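
The first three commands above chain naturally into a single gate. The following is a rough sketch under two assumptions: that each subcommand exits non-zero on failure (docs/exit-codes.md is the authority on that), and that the example fixture path is the one you want to validate. AC, FIXTURE, and run_gate are hypothetical names introduced here for illustration.

```shell
# AC is a hypothetical override; set AC=echo to dry-run without building.
AC="${AC:-cargo run -p agentcarousel --}"
# FIXTURE defaults to the example path used in the commands above.
FIXTURE="${FIXTURE:-fixtures/examples/example-skill.yaml}"

# Run validate, then test, then eval; stop at the first failing step.
run_gate() {
  # $AC is left unquoted on purpose so the default splits into words.
  $AC validate "$FIXTURE" &&
  $AC test &&
  $AC eval
}
```

Running `AC=echo` followed by `run_gate` prints the three subcommands without executing them, which is a cheap way to check the wiring before using it in CI.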

Internal modules

CLI, core, fixtures, runner, evaluators, and reporters live as submodules under crates/agentcarousel/src/ in one Cargo package (suitable for docs.rs and cargo install).

Documentation

ATF / trust: AgentCarousel maps to the Agentic Trust Framework as an evidence and CI-gates implementation; this is not a certification claim. See docs/atf-ecosystem.md for the five-element crosswalk and docs/trust-posture.md for non-goals, signing stance, and redaction limits.

| Document | Description |
| --- | --- |
| docs/atf-ecosystem.md | ATF five elements → CLI commands and artifacts |
| docs/trust-posture.md | Bounded trust, signing deferral, redaction stub |
| docs/ci-gates.md | CI-blocking commands and failure semantics |
| docs/fixture-versioning.md | Bundle semver, changelogs, deprecation |
| docs/exit-codes.md | Process exit codes for CI |
| docs/fixture-format.md | Fixture schema and field reference |
| docs/fixture-development-process.md | Authoring fixtures |
| docs/evaluator-contract.md | External evaluator contract |
| docs/quickstart.md | Live evaluation quickstart recipes |
| docs/oss-launch-checklist.md | OSS release checklist and notes |
| docs/crates-io-publish.md | Publishing the Rust crate to crates.io |
| distribution/public-hub/MIRROR.md | What to copy into the public distribution default branch |
| docs/plans/agentcarousel-gtm/ | GTM task plan, findings, progress (planning-with-files) |
| docs/evidence-loop-runbook.md | M1 evidence: validate → test → eval → judge → export |
| docs/judge-prompts/README.md | Versioned judge prompts for certification pilots |
| docs/registry-api-contract.md | MSP registry POST/GET contract (v0) |
| docs/monetization-pilot.md | Priced offers + paid pilot SOW checklist |
| docs/atf-upstream-ecosystem-pr-draft.md | Copy-paste draft for ATF repo ecosystem PR |
| docs/CROSSWALKS.md | ATF ↔ fixtures short crosswalk (mirrored to public docs/) |