Crate skilltest_core

skilltest-core — the library that powers the skilltest CLI and, through it, the language SDKs and test-framework packages.

The flow is: load a Config and one or more TestCases, build a Provider (the boundary to oneharness / a model), and hand both to a Runner, which drives each case into a conversation, scores the transcript with natural-language Evals, and returns a Report. The report’s JSON form is the stable contract the language SDKs consume.
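The flow above can be sketched roughly as follows. This page does not show the crate's real signatures, so every type, field, and method below (`TestCase`, `Provider::run_turn`, `Provider::judge`, `Runner::run`, `Report::results`, `EchoProvider`, `run_example`) is a hypothetical local stand-in, not the skilltest_core API. The sketch is single-turn and uses a fake provider so it runs without a model.

```rust
// Illustrative stand-ins only: these are NOT the real skilltest_core types.

/// Hypothetical test case: a name plus the first user message.
struct TestCase {
    name: String,
    initial_input: String,
}

/// Hypothetical provider boundary: runs the skill and judges the transcript.
trait Provider {
    /// Run the skill for one user message and return the assistant's reply.
    fn run_turn(&self, user_message: &str) -> String;
    /// Judge a transcript against a plain-English criterion.
    fn judge(&self, transcript: &[String], criterion: &str) -> bool;
}

/// Fake provider so the sketch runs without a model.
struct EchoProvider;

impl Provider for EchoProvider {
    fn run_turn(&self, user_message: &str) -> String {
        format!("echo: {user_message}")
    }

    fn judge(&self, transcript: &[String], criterion: &str) -> bool {
        // Toy judge: passes if any message contains the criterion text.
        transcript.iter().any(|m| m.contains(criterion))
    }
}

/// Hypothetical report: one (case name, passed) entry per case.
#[derive(Debug, PartialEq)]
struct Report {
    results: Vec<(String, bool)>,
}

/// Hypothetical runner: owns a provider and a single criterion.
struct Runner<P: Provider> {
    provider: P,
    criterion: String,
}

impl<P: Provider> Runner<P> {
    fn run(&self, cases: &[TestCase]) -> Report {
        let results = cases
            .iter()
            .map(|case| {
                // Drive the case into a (single-turn) conversation.
                let reply = self.provider.run_turn(&case.initial_input);
                let transcript = vec![case.initial_input.clone(), reply];
                // Score the transcript with the provider's judge.
                let passed = self.provider.judge(&transcript, &self.criterion);
                (case.name.clone(), passed)
            })
            .collect();
        Report { results }
    }
}

/// Wire the pieces together: cases + provider -> runner -> report.
fn run_example() -> Report {
    let cases = vec![TestCase {
        name: "greets".to_string(),
        initial_input: "hello".to_string(),
    }];
    let runner = Runner {
        provider: EchoProvider,
        criterion: "hello".to_string(),
    };
    runner.run(&cases)
}
```

Calling `run_example()` from a `main` function and printing the result with `{:?}` shows the shape of the toy report. The real `Report` is serialized to JSON for the SDKs; the sketch omits that step.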

Everything that crosses a trust boundary — config files, test-case YAML, skill frontmatter, and every provider response — is parsed into a typed model before use.
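As a minimal illustration of the "parse into a typed model before use" idea, the sketch below turns an untrusted line of text into a typed message or an error. The `role: content` wire format, the `ParseError` type, and the `parse_message` helper are assumptions made up for this example. The `Role` and `Message` definitions are local stand-ins that only borrow the names of the crate's re-exports; the real types may look different.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for a conversation role.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Role {
    User,
    Assistant,
}

/// Hypothetical parse error carrying a human-readable reason.
#[derive(Debug, PartialEq)]
struct ParseError(String);

impl FromStr for Role {
    type Err = ParseError;

    fn from_str(raw: &str) -> Result<Self, Self::Err> {
        // Only known roles are accepted; anything else is rejected up front.
        match raw.trim() {
            "user" => Ok(Role::User),
            "assistant" => Ok(Role::Assistant),
            other => Err(ParseError(format!("unknown role: {other:?}"))),
        }
    }
}

/// Hypothetical typed message.
#[derive(Debug, PartialEq)]
struct Message {
    role: Role,
    content: String,
}

/// Parse one untrusted "role: content" line into a typed message.
fn parse_message(line: &str) -> Result<Message, ParseError> {
    let (role, content) = line
        .split_once(':')
        .ok_or_else(|| ParseError(format!("missing ':' in {line:?}")))?;
    Ok(Message {
        role: role.parse::<Role>()?,
        content: content.trim().to_string(),
    })
}
```

Code downstream of `parse_message` only ever sees a `Message` with a valid `Role`, so it does not need to re-check the raw input.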

Re-exports

pub use config::CommandConfig;
pub use config::Config;
pub use config::OneharnessConfig;
pub use config::Overrides;
pub use config::ProviderConfig;
pub use conversation::Message;
pub use conversation::Role;
pub use conversation::Transcript;
pub use error::Error;
pub use error::Result;
pub use eval::Comparator;
pub use eval::Eval;
pub use eval::EvalDetail;
pub use eval::EvalOutcome;
pub use eval::JudgeValue;
pub use exit::ExitCode;
pub use provider::supports_resume;
pub use provider::AssistantTurn;
pub use provider::CommandProvider;
pub use provider::JudgeKind;
pub use provider::JudgeQuery;
pub use provider::JudgeVerdict;
pub use provider::OneharnessProvider;
pub use provider::Provider;
pub use provider::SkillRef;
pub use provider::Usage;
pub use provider::UserTurn;
pub use report::CaseRun;
pub use report::Report;
pub use report::Summary;
pub use report::ValidationFinding;
pub use report::ValidationReport;
pub use runner::Runner;
pub use skill::load_skill;
pub use skill::validate_path;
pub use skill::validate_skill;
pub use skill::Finding;
pub use skill::SkillDefinition;
pub use testcase::discover_cases;
pub use testcase::SimulatedUser;
pub use testcase::TestCase;

Modules

config
Configuration: which provider runs skills, the default platforms and models a run fans out across, and the model used for natural-language evals.
conversation
The conversation model: the transcript that flows between the runner and the provider, and is ultimately handed to evals.
error
Error type for the core library. The mapping from these errors to process exit codes lives in the CLI (see exit.rs for the documented codes).
eval
Natural-language evaluations. An eval poses a criterion in plain English and asks the provider’s judge to score the transcript: a boolean assertion, or a numeric score compared against a threshold.
exit
Documented process exit codes. Defined in the core so they are part of the library’s contract; the CLI maps crate::Error onto them.
provider
The provider boundary. skilltest never talks to a model directly; a Provider runs the skill, plays the simulated user, and judges the transcript.
report
Run results and the JSON report. The serialized shape here is the stable contract the language SDKs parse. These types are the source of truth: their JSON Schemas (via skilltest schema, goldens in schemas/) are what the SDK contract tests compare their Pydantic/Zod models against.
runner
The runner: orchestrates a test case into a conversation, drives the provider across turns, scores the transcript with evals, and fans out over the configured platform × model matrix.
skill
Skill definitions: a directory containing a SKILL.md with YAML frontmatter and a Markdown body. This module loads them and validates them, powering the skilltest validate subcommand.
testcase
Test cases: the YAML a user writes to describe one test of a skill — the initial data to hand the skill, an optional simulated user for multi-turn runs, and the evals that decide pass/fail.
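
Two of the module descriptions above lend themselves to a small sketch: the eval module's boolean-or-thresholded-score outcome, and the runner module's fan-out over the platform × model matrix. The enum variants (`Bool`, `Score`, `AtLeast`, `AtMost`), the `passes` and `matrix` helpers, and the platform and model names used with them are all assumptions for illustration. `JudgeValue` and `Comparator` here are local stand-ins that share only their names with the crate's re-exports.

```rust
/// Hypothetical judge output: a boolean assertion or a numeric score.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JudgeValue {
    Bool(bool),
    Score(f64),
}

/// Hypothetical comparator applied to numeric scores.
#[derive(Debug, Clone, Copy)]
enum Comparator {
    AtLeast,
    AtMost,
}

/// Decide pass/fail. In this sketch a boolean passes as-is and ignores the
/// comparator; a score is compared against the threshold.
fn passes(value: JudgeValue, comparator: Comparator, threshold: f64) -> bool {
    match value {
        JudgeValue::Bool(b) => b,
        JudgeValue::Score(s) => match comparator {
            Comparator::AtLeast => s >= threshold,
            Comparator::AtMost => s <= threshold,
        },
    }
}

/// Fan out over every (platform, model) pair, platforms varying slowest.
fn matrix<'a>(platforms: &[&'a str], models: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    platforms
        .iter()
        .flat_map(|&p| models.iter().map(move |&m| (p, m)))
        .collect()
}
```

With two placeholder platforms and two placeholder models, `matrix` yields four pairs, and a runner in this style would execute each test case once per pair.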