Evaluate agents and skills from YAML/TOML fixtures, run cases with mocks or live backends, persist runs to SQLite, and export evidence for reporting or registry upload.
## Audience

- CLI users — run the `agentcarousel` (or `agc`) binary for `validate`, `test`, `eval`, `report`, `bundle`, `publish`, `export`, and `trust-check`.
- Library embedders — use `runner` to execute fixtures programmatically, `fixtures` to load them, `evaluators` for scoring, `reporters` for output, and `core` for shared types (`Run`, `Case`, `FixtureFile`, …).
## Crate layout

| Module | Role |
|---|---|
| `cli` | Clap-based CLI; `cli::run` is the process entrypoint. |
| `core` | Serializable models, errors, and judge-provider helpers. |
| `runner` | Async execution: `runner::run_fixtures`, `runner::run_eval`. |
| `evaluators` | Rules, golden, process, and LLM-judge evaluators. |
| `fixtures` | Load and validate fixtures; `fixtures::MockEngine` for stubbed tool/LLM responses. |
| `reporters` | Terminal, JSON, JUnit, history persistence, and run diffs. |
## Quick start (CLI)

    agentcarousel validate path/to/fixture.yaml
    agentcarousel test path/to/fixture.yaml --offline true

Installation and the full option reference are described in the repository README and on docs.rs for this crate version.
## Library quick start
Typical flow: load fixtures with fixtures::load_fixture, build runner::RunnerConfig
or runner::EvalConfig, then call runner::run_fixtures or runner::run_eval
inside a tokio runtime. See runner for configuration fields.
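The flow above can be sketched roughly as follows. This is an illustrative sketch only: it is inferred from the descriptions on this page, and the `offline` field, the `Default` implementation for `RunnerConfig`, the exact argument types of `run_fixtures`, and the `cases` field on `Run` are all assumptions, not confirmed API; consult the `runner` and `fixtures` module docs for the real signatures.

```rust
// Sketch only — field names and signatures are assumptions, not confirmed API.
use agentcarousel::{fixtures, runner};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load and validate a fixture file (YAML or TOML).
    let fixture = fixtures::load_fixture("path/to/fixture.yaml")?;

    // Configure the run; `offline: true` is assumed to route tool/LLM
    // calls through fixtures::MockEngine instead of live backends.
    let config = runner::RunnerConfig {
        offline: true,        // assumed field
        ..Default::default()  // assumes RunnerConfig implements Default
    };

    // Execute the fixture's cases and inspect the resulting Run.
    let run = runner::run_fixtures(vec![fixture], config).await?;
    println!("cases executed: {}", run.cases.len()); // assumes Run has `cases`
    Ok(())
}
```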
## Re-exports

    pub use cli::*;
    pub use core::*;
    pub use evaluators::*;
    pub use fixtures::*;
    pub use reporters::*;
    pub use runner::*;
## Modules
- `cli` — Command-line interface built with `clap`: parse flags, load merged configuration from TOML and environment, and dispatch to `validate` / `test` / `eval` / `report` / `init` / `bundle` / `publish` / `export` / `trust-check`.
- `core` — Shared domain types for fixtures, runs, traces, and metrics, plus `CoreError` and `judge_provider` helpers for LLM-backed judges.
- `evaluators` — Pluggable evaluators that score a finished `crate::CaseResult` against fixture rubrics or external references: `RulesEvaluator`, `GoldenEvaluator`, `ProcessEvaluator`, `JudgeEvaluator`, and the `Evaluator` trait.
- `fixtures` — Fixture I/O and validation: load YAML/TOML into `crate::FixtureFile`, validate against the bundled JSON Schema, and resolve tool/LLM responses via `MockEngine`.
- `reporters` — Human-readable and machine-readable output: terminal tables, JSON, JUnit, persisted history (SQLite), and `diff_runs` / `print_diff` for comparing two runs.
- `runner` — Async test and eval execution: expand fixtures into `Case` rows, apply mocks or live generation, optionally run evaluators (rules / golden / process / judge), and produce a `Run`.
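To make the fixture-driven pieces above concrete, here is a hypothetical fixture sketch. Every field name below is illustrative only — none are taken from the crate's bundled JSON Schema — so treat this as a shape to adapt, and check it with `agentcarousel validate` against the real schema.

```yaml
# Hypothetical fixture sketch — field names are illustrative, not schema-accurate.
name: weather-agent-smoke
cases:
  - id: basic-greeting
    input: "What's the weather in Oslo?"
    mocks:
      tools:
        get_weather: { temperature_c: 7, conditions: "overcast" }
    expect:
      rules:
        - output_contains: "Oslo"
```

In offline runs, the mock entries would stand in for live tool/LLM calls via `MockEngine`, while the `expect` block is the kind of rubric a rules evaluator scores against.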