
Crate agentcarousel

Evaluate agents and skills from YAML/TOML fixtures, run cases with mocks or live backends, persist runs to SQLite, and export evidence for reporting or registry upload.

§Crate layout

| Module | Role |
| --- | --- |
| `cli` | Clap-based CLI; `cli::run` is the process entrypoint. |
| `core` | Serializable models, errors, and judge provider helpers. |
| `runner` | Async execution: `runner::run_fixtures`, `runner::run_eval`. |
| `evaluators` | Rules, golden, process, and LLM judge evaluators. |
| `fixtures` | Load and validate fixtures; `fixtures::MockEngine` for stubbed tool/LLM responses. |
| `reporters` | Terminal, JSON, JUnit, history persistence, and run diffs. |

§Quick start (CLI)

agentcarousel validate path/to/fixture.yaml
agentcarousel test path/to/fixture.yaml --offline true

Installation and the full option set are described in the repository README and on docs.rs for this crate version.
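A fixture file for the commands above might look like the following. This page does not show the fixture schema, so every field name here is an illustrative assumption; validate against the bundled JSON Schema (`agentcarousel validate`) for the real format:

```yaml
# Hypothetical fixture layout -- field names are assumptions, not the
# documented schema. Run `agentcarousel validate` to check a real file.
agent: summarizer
cases:
  - name: short-article
    input: "Summarize: Rust is a systems programming language."
    mocks:
      llm: "Rust is a fast, memory-safe systems language."
    expect:
      rules:
        - contains: "Rust"
```

With `--offline true`, the mocked responses stand in for live tool/LLM calls, so the run is deterministic and free.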

§Library quick start

Typical flow: load fixtures with fixtures::load_fixture, build runner::RunnerConfig or runner::EvalConfig, then call runner::run_fixtures or runner::run_eval inside a tokio runtime. See runner for configuration fields.
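The flow above can be sketched as follows. This is a minimal sketch, not a verified program: it assumes `load_fixture` takes a path and returns a `Result`, that `RunnerConfig` implements `Default`, and that `run_fixtures` yields a `Run` with inspectable case results; check the `runner` and `fixtures` module docs for the actual signatures and fields.

```rust
// Sketch of the library quick-start flow. Assumed, not confirmed by this
// page: `RunnerConfig: Default`, the exact `run_fixtures` signature, and a
// `cases` accessor on `Run`. Consult the `runner` module docs before use.
use agentcarousel::{fixtures, runner};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Load and validate the fixture file (YAML or TOML).
    let fixture = fixtures::load_fixture("path/to/fixture.yaml")?;

    // 2. Build a runner configuration; offline runs resolve tool/LLM
    //    responses through the fixture's MockEngine stubs.
    let config = runner::RunnerConfig::default();

    // 3. Execute the cases inside the tokio runtime and inspect the Run.
    let run = runner::run_fixtures(&fixture, &config).await?;
    println!("cases executed: {}", run.cases.len());
    Ok(())
}
```

The same shape applies to evals: swap in `runner::EvalConfig` and `runner::run_eval` to score cases with the configured evaluators.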

Re-exports§

pub use cli::*;
pub use core::*;
pub use evaluators::*;
pub use fixtures::*;
pub use reporters::*;
pub use runner::*;

Modules§

cli
Command-line interface built with clap: parse flags, load merged configuration from TOML and environment, and dispatch to validate / test / eval / report / init / bundle / publish / export / trust-check.
core
Shared domain types for fixtures, runs, traces, and metrics, plus CoreError and judge_provider helpers for LLM-backed judges.
evaluators
Pluggable evaluators that score a finished crate::CaseResult against fixture rubrics or external references: RulesEvaluator, GoldenEvaluator, ProcessEvaluator, JudgeEvaluator, and the Evaluator trait.
fixtures
Fixture I/O and validation: load YAML/TOML into crate::FixtureFile, validate against the bundled JSON Schema, and resolve tool/LLM responses via MockEngine.
reporters
Human-readable and machine-readable output: terminal tables, JSON, JUnit, persisted history (SQLite), and diff_runs / print_diff for comparing two runs.
runner
Async test and eval execution: expand fixtures into Case rows, apply mocks or live generation, optionally run evaluators (rules / golden / process / judge), and produce a Run.