agentcarousel 0.3.0

Evaluate agents and skills with YAML fixtures, run cases (mock or live), and keep run rows in SQLite for reports and evidence export.
Documentation
# agentcarousel

Evaluate agent and skill behavior with reproducible fixtures, scored checks, and exportable evidence.

`agentcarousel` is a cli for people who want confidence in AI agent behavior before shipping. It starts simple (validate + test locally) and scales to live evaluation, evidence export, and trust workflows.

## Why people use this

- Verify agent quality quickly with schema + rule checks.
- Run deterministic offline tests in CI.
- Store run history and compare results over time.
- Export evidence bundles for audits and share benchmarks.
- Publish bundles/runs to a registry and verify trust state.

## Start Here

If you just want value fast, do these three commands.

### 1) Install

```bash
curl -fsSL http://install.agentcarousel.com | sh
```

Notes:

- Installer supports Linux and macOS.
- On Windows, download the `.zip` release asset from GitHub.
- The installer offers an `agc` alias for convenience.

### 2) Validate fixtures

A fixture is a YAML file that defines test cases (and optional mocks) for an agent or skill so the CLI can validate, run offline tests, or evaluate behavior.

```bash
agentcarousel validate
```

With no paths, `validate` scans the current directory for fixture files.

### 3) Run offline tests

```bash
agentcarousel test --offline true
```

This is the safest default for CI and public fixture repos.

## Common Commands

```bash
# Help
agentcarousel --help
agentcarousel <command> --help

# Validate fixture files (schema + rules)
agentcarousel validate fixtures/skills/my-skill.yaml

# Run fixtures with mock generation
agentcarousel test

# Evaluate (mock or live)
agentcarousel eval

# List and inspect stored runs
agentcarousel report list
agentcarousel report show <RUN_ID>
agentcarousel report diff <RUN_ID_A> <RUN_ID_B>

# Scaffold a new fixture
agentcarousel init --skill my-new-skill

# Export evidence for one run (or newest N runs)
agentcarousel export <RUN_ID>
agentcarousel export --last 5 --out-dir ./evidence
```

## Configuration

Config file lookup order:

1. `--config <path>` (explicit)
2. `./agentcarousel.toml` (project)
3. `~/.config/agentcarousel/config.toml` (user)

History database defaults:

- macOS: `~/Library/Application Support/agentcarousel/history.db`
- Linux: `~/.local/share/agentcarousel/history.db`

Override history path with:

```bash
export AGENTCAROUSEL_HISTORY_DB=/path/to/history.db
```

## Development and Power Users

Everything below is for running live model evaluations, publishing evidence, or integrating trust workflows.

### Live evaluation with judge models

Supported providers for live generation and judging include Gemini, OpenAI, Anthropic, and OpenRouter.

```bash
export GEMINI_API_KEY=your_key_here
agentcarousel eval --execution-mode live \
  --model gemini-2.5-flash \
  --judge --judge-model gemini-2.5-flash
```

More recipes and troubleshooting: [`docs/quickstart.md`](docs/quickstart.md)

### Bundle workflows

A versioned package of fixtures, described by bundle.manifest.json with content hashes, that can be packed, verified, and published.

```bash
# Create a distributable bundle archive
agentcarousel bundle pack fixtures/bundles/my-bundle --out my-bundle.tar.gz

# Verify bundle integrity and structure
agentcarousel bundle verify my-bundle.tar.gz

# Pull bundle manifest + artifacts from a registry (see docs/registry-api-contract.md)
agentcarousel bundle pull cmmc-assessor-1.0.0 --url "$REGISTRY_API_BASE_URL" -o ./pulled/cmmc-assessor
```

### Publish to a registry

```bash
# Publish bundle + evidence in one flow
agentcarousel publish fixtures/bundles/terraform-sentinel-scaffold \
  --url "https://api.agentcarousel.com"

# Publish multiple matching local runs (newest first)
agentcarousel publish fixtures/bundles/terraform-sentinel-scaffold \
  --url "https://api.agentcarousel.com" \
  --all-runs --limit 5
```

### Trust checks

The [registry](https://agentcarousel.com/agents) has the published assurance state for a bundle (queryable via `trust-check`), optionally backed by a signed attestation (e.g. `minisign`) that you verify.

```bash
# Registry trust-state check
agentcarousel trust-check terraform-sentinel-scaffold@1.0.0 \
  --url "https://api.agentcarousel.com"

# Optional offline attestation verification
agentcarousel trust-check terraform-sentinel-scaffold@1.0.0 \
  --url "https://api.agentcarousel.com" \
  --attestation ./attestation-terraform-sentinel-scaffold-1.0.0.json \
  --minisign-pubkey ./agentcarousel-minisign.pub
```

### External evaluator contract

If you want custom evaluators (Python/JS/etc.), use:

- [`docs/evaluator-contract.md`]docs/evaluator-contract.md

It defines the stdin/stdout JSON contract and exit-code semantics.

## Build From Source

Prerequisites:

- Rust 1.95+

```bash
git clone https://github.com/agentcarousel/agentcarousel.git
cd agentcarousel

# Build package and binaries
cargo build -p agentcarousel

# Run from source (explicit binary)
cargo run -p agentcarousel --bin agentcarousel -- --help
```

Binaries provided by this package:

- `agentcarousel`
- `agc`

## OSS and Contributions

We welcome public fixture and documentation improvements.

- Start here: [`distribution/CONTRIBUTING.md`]distribution/CONTRIBUTING.md
- Security policy: `distribution/SECURITY.md`
- Changelog: [`distribution/CHANGELOG.md`]distribution/CHANGELOG.md

For fixture contributions, issue-first intake is required before implementation.

## Releases and crates.io

- GitHub repository: [github.com/agentcarousel/agentcarousel]https://github.com/agentcarousel/agentcarousel
- crates.io package: [crates.io/crates/agentcarousel]https://crates.io/crates/agentcarousel
- API docs: [docs.rs/agentcarousel]https://docs.rs/agentcarousel
- Publish checklist: [`docs/crates-io-publish.md`]docs/crates-io-publish.md

## ATF Alignment

AgentCarousel maps to the [Agentic Trust Framework](https://github.com/massivescale-ai/agentic-trust-framework) as an evidence and CI-gates implementation layer.