agentcarousel 0.2.3

Evaluate agents and skills with YAML fixtures, run cases (mock or live), and keep run rows in SQLite for reports and evidence export.
# agentcarousel

**agentcarousel** is a Rust CLI (and library) for working with fixture files: schema checks, scenario runs (mock or live), and SQLite history. 

```markdown
Evaluate agent behavior and skills with reproducible fixtures, scored checks, and exportable evidence.
```

## Build

```bash
cargo build
```

## Configuration

The CLI reads `agentcarousel.toml` from the repository root or
`~/.config/agentcarousel/config.toml`. Use `--config <path>` to override.
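
The lookup described above can be sketched as a small shell helper. This is a hypothetical illustration, not code from agentcarousel: the function name `resolve_config` is invented here, and the precedence it encodes (explicit path first, then the repository root, then the user config directory) is an assumption, since the text above does not state which location wins when both files exist.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of agentcarousel.
# Prints the config file that would be chosen under an ASSUMED precedence:
#   explicit path > <repo root>/agentcarousel.toml > ~/.config/agentcarousel/config.toml
resolve_config() {
  local explicit repo_root candidate
  explicit="${1:-}"      # path you would otherwise pass to --config
  repo_root="${2:-.}"    # repository root, defaults to the current directory

  # An explicit path always wins and is returned without checking the disk.
  if [ -n "$explicit" ]; then
    printf '%s\n' "$explicit"
    return 0
  fi

  # Otherwise return the first candidate file that exists.
  for candidate in "$repo_root/agentcarousel.toml" \
                   "$HOME/.config/agentcarousel/config.toml"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done

  return 1  # no config file found
}
```

Calling `resolve_config` with no arguments from the repository root prints the file that would be picked under these assumptions, or returns a non-zero status when neither file exists.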

Run history defaults to:

- macOS: `~/Library/Application Support/agentcarousel/history.db`
- Linux: `~/.local/share/agentcarousel/history.db`

Set `AGENTCAROUSEL_HISTORY_DB` to override the history database path.
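
To find the history database from a script, the defaults listed above can be mirrored in a few lines of shell. This is a sketch under stated assumptions: `history_db_path` is a hypothetical name, the helper treats `AGENTCAROUSEL_HISTORY_DB` as taking priority over the defaults, and it falls back to the Linux path on any platform other than macOS, which the list above does not confirm.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of agentcarousel.
# Prints the history database path: the environment variable if set,
# otherwise the per-platform default listed in this README.
history_db_path() {
  if [ -n "${AGENTCAROUSEL_HISTORY_DB:-}" ]; then
    printf '%s\n' "$AGENTCAROUSEL_HISTORY_DB"
    return 0
  fi

  case "$(uname -s)" in
    Darwin)
      printf '%s\n' "$HOME/Library/Application Support/agentcarousel/history.db"
      ;;
    *)
      # ASSUMPTION: every non-macOS platform is treated like Linux here.
      printf '%s\n' "$HOME/.local/share/agentcarousel/history.db"
      ;;
  esac
}
```

For example, `ls -l "$(history_db_path)"` shows whether a history file exists yet. The quotes matter because the macOS path contains a space.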

Live generation and judge evaluation are supported with Gemini, OpenAI, Anthropic, and OpenRouter models.

## Quickstart (5 minutes)

```bash
# 1) Build
cargo build

# 2) Export a provider key (examples)
export GEMINI_API_KEY=your_key_here
# or export OPENAI_API_KEY=your_key_here
# or export OPENROUTER_API_KEY=your_key_here

# 3) Run a live evaluation (Gemini example)
cargo run -p agentcarousel -- eval --execution-mode live \
  --model gemini-2.5-flash \
  --judge --judge-model gemini-2.5-flash
```
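
Before a live run it can help to confirm that at least one provider key is exported. The helper below is a hypothetical preflight check, not an agentcarousel feature: `provider_keys_set` is an invented name, and it only inspects the three variable names shown in the quickstart. Anthropic is left out because this README does not name its key variable.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper, not part of agentcarousel.
# Prints the name of each provider key variable (from the quickstart above)
# that is set and non-empty. Returns 0 if at least one is set, 1 otherwise.
provider_keys_set() {
  local name value status
  status=1
  for name in GEMINI_API_KEY OPENAI_API_KEY OPENROUTER_API_KEY; do
    # Look up the variable whose name is stored in $name.
    # eval is safe here because $name only comes from the fixed list above.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      printf '%s\n' "$name"
      status=0
    fi
  done
  return "$status"
}
```

A script might call `provider_keys_set >/dev/null || { echo "no provider key set" >&2; exit 1; }` before invoking `eval --execution-mode live`. The check only looks at the environment; it does not verify that a key is valid.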

### Provider recipes

```bash
# Budget: OpenRouter free tier generator + Gemini judge
export OPENROUTER_API_KEY=your_key_here
export GEMINI_API_KEY=your_key_here
cargo run -p agentcarousel -- eval --execution-mode live \
  --model openrouter/free \
  --judge --judge-model gemini-2.5-flash

# Budget: OpenRouter free tier generator + OpenRouter judge
export OPENROUTER_API_KEY=your_key_here
cargo run -p agentcarousel -- eval --execution-mode live \
  --model openrouter/free \
  --judge --judge-model nvidia/nemotron-3-super-120b-a12b:free

# Balanced: OpenAI generator + OpenAI judge
export OPENAI_API_KEY=your_key_here
cargo run -p agentcarousel -- eval --execution-mode live \
  --model gpt-4o-mini \
  --judge --judge-model gpt-4o-mini

# Premium: Gemini generator + Gemini judge
export GEMINI_API_KEY=your_key_here
cargo run -p agentcarousel -- eval --execution-mode live \
  --model gemini-2.5-flash \
  --judge --judge-model gemini-2.5-flash
```

For more examples and troubleshooting, see [docs/quickstart.md](docs/quickstart.md).

## Common commands

```bash
# Validate fixtures against schema (paths required)
cargo run -p agentcarousel -- validate fixtures/examples/example-skill.yaml

# Run tests from fixture paths (defaults to fixtures/)
cargo run -p agentcarousel -- test

# Evaluation pass (see docs/evaluator-contract.md)
cargo run -p agentcarousel -- eval

# Report on stored runs
cargo run -p agentcarousel -- report list

# Scaffold a new fixture YAML
cargo run -p agentcarousel -- init --skill my-skill-name

# Bundle pack/verify (M3)
cargo run -p agentcarousel -- bundle pack my-bundle --out my-bundle.tar.gz
cargo run -p agentcarousel -- bundle verify my-bundle.tar.gz

# Registry workflow (single publish command)
cargo run -p agentcarousel -- publish fixtures/bundles/terraform-sentinel-scaffold --url "https://api.agentcarousel.com"

# Publish all matching local runs for that bundle (newest first)
cargo run -p agentcarousel -- publish fixtures/bundles/terraform-sentinel-scaffold --url "https://api.agentcarousel.com" --all-runs --limit 5

# Export an evidence pack for a run id
cargo run -p agentcarousel -- export <RUN_ID>

# Check registry trust state (online-first)
cargo run -p agentcarousel -- trust-check terraform-sentinel-scaffold@1.0.0 --url "https://api.agentcarousel.com"

# Optional offline minisign verification with local attestation
cargo run -p agentcarousel -- trust-check terraform-sentinel-scaffold@1.0.0 \
  --url "https://api.agentcarousel.com" \
  --attestation ./attestation-terraform-sentinel-scaffold-1.0.0.json \
  --minisign-pubkey ./agentcarousel-minisign.pub
```
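
The `validate` and `test` commands above could be chained into a simple gate for CI. The sketch below is illustrative only: `ac` and `ci_gate` are hypothetical names, and the gate assumes that each subcommand exits with a non-zero status on failure, which this README does not state.

```bash
#!/usr/bin/env bash
# `ac` wraps the invocation used throughout this README.
ac() { cargo run -p agentcarousel -- "$@"; }

# Hypothetical gate, not part of agentcarousel: validate one fixture,
# then run the fixture tests, stopping at the first failure.
# ASSUMPTION: `validate` and `test` exit non-zero when they fail.
ci_gate() {
  local fixture
  fixture="$1"

  if ! ac validate "$fixture"; then
    echo "gate: validate failed for $fixture" >&2
    return 1
  fi

  if ! ac test; then
    echo "gate: test failed" >&2
    return 1
  fi

  echo "gate: passed"
}
```

For instance, `ci_gate fixtures/examples/example-skill.yaml` runs both steps in order. Keeping the invocation behind `ac` means the gate can be pointed at an installed binary by redefining that one function.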

## Internal modules

CLI, core, fixtures, runner, evaluators, and reporters live as submodules under `crates/agentcarousel/src/` in one Cargo package.

## Documentation

See the [GitHub repository](https://github.com/agentcarousel/agentcarousel).

**ATF / trust:** AgentCarousel maps to the [Agentic Trust Framework](https://github.com/massivescale-ai/agentic-trust-framework) as an **evidence + CI gates** implementation.