Configuration: which provider runs skills, the default platforms and models a run fans out across, and the model used for natural-language evals.
Config is loaded from a YAML file (default skilltest.yaml) and then refined
by CLI overrides (see Config::apply_overrides).
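The override step can be sketched as follows. This is a minimal illustration, not the crate's actual definitions: the field names (provider, platforms, models), the exact signature of apply_overrides, and the example values are all assumptions made for the sketch. The only behavior taken from this page is that a None or empty override leaves the loaded value in place.

```rust
// Simplified stand-ins for illustration only. The real `Config` and
// `Overrides` have their own fields, which this page does not list.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub provider: String,       // hypothetical field
    pub platforms: Vec<String>, // hypothetical field
    pub models: Vec<String>,    // hypothetical field
}

#[derive(Debug, Clone, Default)]
pub struct Overrides {
    pub provider: Option<String>, // None: keep the loaded value
    pub platforms: Vec<String>,   // empty: keep the loaded value
    pub models: Vec<String>,      // empty: keep the loaded value
}

impl Config {
    /// Apply CLI overrides on top of the loaded values.
    /// A `None` or empty field leaves the corresponding value untouched.
    /// (Assumed signature; the real method may differ.)
    pub fn apply_overrides(&mut self, overrides: Overrides) {
        if let Some(provider) = overrides.provider {
            self.provider = provider;
        }
        if !overrides.platforms.is_empty() {
            self.platforms = overrides.platforms;
        }
        if !overrides.models.is_empty() {
            self.models = overrides.models;
        }
    }
}

/// Values as they might look after loading the YAML file (made up here).
pub fn example_loaded_config() -> Config {
    Config {
        provider: "oneharness".to_string(),
        platforms: vec!["platform-a".to_string()],
        models: vec!["model-a".to_string()],
    }
}
```

Under these assumptions, applying Overrides::default() is a no-op, and an override that sets only models replaces the model list while leaving provider and platforms as loaded.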
Structs
- ApiJudgeConfig
- Settings for judging evals and the simulated user with a direct model API call instead of running them through a harness. This trades the harness’s auth-portability for a single fast HTTP round trip per judge call (no agent-loop cold start), with normalized token usage surfaced into the report.
- CommandConfig
- Settings for a custom provider command speaking the JSON-lines protocol (see docs/protocol.md). Used by the bundled skilltest-fake-provider and any provider you write yourself.
- Config
- The full configuration for a run.
- OneharnessConfig
- Settings for the default oneharness provider, which runs each prompt on a harness via oneharness run.
- Overrides
- CLI-supplied overrides.
None/empty fields leave the config value in place.
Enums
- ApiVendor
- Which model vendor’s API the direct-API judge talks to.
- JudgeConfig
- How evals and the simulated user are judged, independent of the provider that runs the skill. Absent (the default) means the run’s provider judges too (e.g. the oneharness judge_harness).
- ProviderConfig
- Which provider backs a run.
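The "absent means the provider judges too" rule for JudgeConfig can be pictured as an optional value with a fallback. The sketch below is only an illustration of that rule: EffectiveJudge and effective_judge are hypothetical names, and the judge is reduced to a plain label because this page does not list the real variants of JudgeConfig.

```rust
/// Hypothetical stand-in: who ends up judging evals for a run.
#[derive(Debug, PartialEq)]
pub enum EffectiveJudge<'a> {
    /// No judge was configured, so the run's provider judges too.
    Provider,
    /// A separately configured judge, identified here only by a label.
    Separate(&'a str),
}

/// Resolve the judge for a run. `None` models an absent judge setting,
/// which is the default described above.
pub fn effective_judge(judge: Option<&str>) -> EffectiveJudge<'_> {
    match judge {
        None => EffectiveJudge::Provider,
        Some(label) => EffectiveJudge::Separate(label),
    }
}
```

Modelling the setting as an Option keeps the default case explicit: callers must handle the fallback to the provider instead of relying on a sentinel value.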