skilltest-core — the library that powers the skilltest CLI and, through
it, the language SDKs and test-framework packages.
The flow is: load a Config and one or more TestCases, build a
Provider (the boundary to oneharness / a model), and hand both to a
Runner, which drives each case into a conversation, scores the transcript
with natural-language Evals, and returns a Report. The report’s JSON
form is the stable contract the language SDKs consume.
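The flow above can be sketched in a few lines. This is a minimal, self-contained illustration and not the crate's real API: the struct fields, the `Provider` trait methods, the `Runner::run` signature, and the `EchoProvider` stand-in are all assumptions made for the example. `Config` loading is left out to keep it short.

```rust
// Hypothetical stand-ins: the type names mirror the crate docs, but every
// field and method signature here is an assumption made for illustration.
#[derive(Debug, Clone)]
struct TestCase {
    name: String,
    prompt: String,
    evals: Vec<String>, // plain-English criteria
}

#[derive(Debug, Clone, PartialEq)]
struct Transcript {
    messages: Vec<String>,
}

/// The boundary to the model: runs the skill and judges the transcript.
trait Provider {
    fn run(&self, case: &TestCase) -> Transcript;
    fn judge(&self, transcript: &Transcript, criterion: &str) -> bool;
}

#[derive(Debug, PartialEq)]
struct CaseRun {
    case: String,
    passed: bool,
}

#[derive(Debug, PartialEq)]
struct Report {
    runs: Vec<CaseRun>,
}

struct Runner<P: Provider> {
    provider: P,
}

impl<P: Provider> Runner<P> {
    /// Drive each case into a transcript, then score it with every eval.
    fn run(&self, cases: &[TestCase]) -> Report {
        let runs = cases
            .iter()
            .map(|case| {
                let transcript = self.provider.run(case);
                // A case passes only if all of its evals pass.
                let passed = case
                    .evals
                    .iter()
                    .all(|criterion| self.provider.judge(&transcript, criterion));
                CaseRun { case: case.name.clone(), passed }
            })
            .collect();
        Report { runs }
    }
}

/// A canned provider so the sketch runs without a model.
struct EchoProvider;

impl Provider for EchoProvider {
    fn run(&self, case: &TestCase) -> Transcript {
        Transcript {
            messages: vec![case.prompt.clone(), format!("echo: {}", case.prompt)],
        }
    }

    fn judge(&self, transcript: &Transcript, criterion: &str) -> bool {
        // Toy judge: passes when any message contains the criterion text.
        transcript.messages.iter().any(|m| m.contains(criterion))
    }
}
```

Keeping the provider behind a trait is what lets a canned implementation like this stand in for a model during tests of the runner itself.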
Everything that crosses a trust boundary — config files, test-case YAML, skill frontmatter, and every provider response — is parsed into a typed model before use.
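As a rough illustration of parsing untrusted input into a typed model before use, the sketch below turns a raw line into a typed message and rejects anything it does not recognize. The `Role` variants, the `ParseError` type, the `parse_message` helper, and the `role: content` line format are all hypothetical; the crate's real types and input formats may differ.

```rust
use std::str::FromStr;

// Hypothetical variants: the real `Role` type's variants are not listed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Role {
    User,
    Assistant,
}

#[derive(Debug, PartialEq, Eq)]
struct ParseError(String);

impl FromStr for Role {
    type Err = ParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "user" => Ok(Role::User),
            "assistant" => Ok(Role::Assistant),
            // Unknown values are rejected at the boundary, not passed along.
            other => Err(ParseError(format!("unknown role: {other}"))),
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
struct Message {
    role: Role,
    content: String,
}

/// Parse one untrusted "role: content" line into a typed message.
fn parse_message(line: &str) -> Result<Message, ParseError> {
    let (role, content) = line
        .split_once(':')
        .ok_or_else(|| ParseError(format!("missing ':' in {line:?}")))?;
    Ok(Message {
        role: role.trim().parse()?,
        content: content.trim().to_string(),
    })
}
```

Code downstream of `parse_message` only ever sees a `Message` with a valid `Role`, so it never has to re-check the raw string.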
Re-exports
pub use config::CommandConfig;
pub use config::Config;
pub use config::OneharnessConfig;
pub use config::Overrides;
pub use config::ProviderConfig;
pub use conversation::Message;
pub use conversation::Role;
pub use conversation::Transcript;
pub use error::Error;
pub use error::Result;
pub use eval::Comparator;
pub use eval::Eval;
pub use eval::EvalDetail;
pub use eval::EvalOutcome;
pub use eval::JudgeValue;
pub use exit::ExitCode;
pub use provider::supports_resume;
pub use provider::AssistantTurn;
pub use provider::CommandProvider;
pub use provider::JudgeKind;
pub use provider::JudgeQuery;
pub use provider::JudgeVerdict;
pub use provider::OneharnessProvider;
pub use provider::Provider;
pub use provider::SkillRef;
pub use provider::Usage;
pub use provider::UserTurn;
pub use report::CaseRun;
pub use report::Report;
pub use report::Summary;
pub use report::ValidationFinding;
pub use report::ValidationReport;
pub use runner::Runner;
pub use skill::load_skill;
pub use skill::validate_path;
pub use skill::validate_skill;
pub use skill::Finding;
pub use skill::SkillDefinition;
pub use testcase::discover_cases;
pub use testcase::SimulatedUser;
pub use testcase::TestCase;
Modules
- config
- Configuration: which provider runs skills, the default platforms and models a run fans out across, and the model used for natural-language evals.
- conversation
- The conversation model: the transcript that flows between the runner and the provider, and is ultimately handed to evals.
- error
- Error type for the core library. The mapping from these errors to process exit codes lives in the CLI (see exit.rs for the documented codes).
- eval
- Natural-language evaluations. An eval poses a criterion in plain English and asks the provider’s judge to score the transcript: a boolean assertion, or a numeric score compared against a threshold.
- exit
- Documented process exit codes. Defined in the core so they are part of the library’s contract; the CLI maps crate::Error onto them.
- provider
- The provider boundary. skilltest never talks to a model directly; a Provider runs the skill, plays the simulated user, and judges the transcript.
- Run results and the JSON report. The serialized shape here is the stable contract the language SDKs parse. These types are the source of truth: their JSON Schemas (via skilltest schema, goldens in schemas/) are what the SDK contract tests compare their Pydantic/Zod models against.
- runner
- The runner: orchestrates a test case into a conversation, drives the provider across turns, scores the transcript with evals, and fans out over the configured platform × model matrix.
- skill
- Skill definitions: a directory containing a SKILL.md with YAML frontmatter and a Markdown body. This module loads them and validates them, powering the skilltest validate subcommand.
- testcase
- Test cases: the YAML a user writes to describe one test of a skill — the initial data to hand the skill, an optional simulated user for multi-turn runs, and the evals that decide pass/fail.
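The eval module's description (a boolean assertion, or a numeric score compared against a threshold) can be sketched as follows. The names `JudgeValue` and `Comparator` appear in the re-exports, but their variants here, the `Expectation` enum, and the `passes` function are assumptions made for illustration, not the crate's definitions.

```rust
// Hypothetical shapes: only the names JudgeValue and Comparator come from
// the crate docs; the variants and everything else are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JudgeValue {
    Bool(bool),
    Score(f64),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Comparator {
    AtLeast,
    AtMost,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Expectation {
    /// The judge must answer true.
    Assert,
    /// The judge's score must satisfy `comparator` against `threshold`.
    Threshold { comparator: Comparator, threshold: f64 },
}

/// Decide pass/fail; `None` means the judge returned the wrong kind of value.
fn passes(expectation: Expectation, value: JudgeValue) -> Option<bool> {
    match (expectation, value) {
        (Expectation::Assert, JudgeValue::Bool(b)) => Some(b),
        (Expectation::Threshold { comparator, threshold }, JudgeValue::Score(s)) => {
            Some(match comparator {
                Comparator::AtLeast => s >= threshold,
                Comparator::AtMost => s <= threshold,
            })
        }
        // A boolean eval got a score, or a scored eval got a boolean.
        _ => None,
    }
}
```

Returning `None` for a mismatched kind keeps a malformed judge response distinct from a genuine failure, which is one way to honor the typed-boundary rule stated earlier.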