Module eval

Evaluation primitives — the agent’s “quality gate” compute.

Pure computation lives in the kernel; I/O lives in the SDK. This module provides the stateless building blocks for the generate → evaluate → retry quality gate.

History (0.5.0 fold, OS-axis #6). This module replaces the former EvalPipeline state machine and its public SDK class. The quality gate is now expressed on the workflow substrate in two ways. The iterative retry-with-feedback loop is driven by the SDK HarnessLoop, because the kernel NodeKind::Loop re-arms a single node and per-iteration eval therefore cannot be a static DAG. The declarative “loop-the-worker-then-verify-with-a-structured-verdict” shape is the gen_eval template. Both paths reuse these primitives, so the verdict shape stays consistent across them.
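
The generate → evaluate → retry gate described above can be sketched as a small driver loop. This is a minimal illustration only, not the HarnessLoop implementation: the names `Verdict`, `GateOutcome`, and `run_quality_gate`, the `feedback` field, and the closure signatures are all hypothetical, and the real EvalResult may carry different fields.

```rust
/// Hypothetical verdict shape (assumption); the real `EvalResult` may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Verdict {
    pub passed: bool,
    pub score: f64,
    pub feedback: String,
}

/// Hypothetical outcome: the last output, its verdict, and attempts used.
#[derive(Debug, Clone, PartialEq)]
pub struct GateOutcome {
    pub output: String,
    pub verdict: Verdict,
    pub attempts: u32,
}

/// Run generate -> evaluate -> retry until a verdict passes or
/// `max_attempts` is exhausted. Returns `None` only when `max_attempts` is 0.
/// `generate` receives the previous attempt's feedback (None on the first try).
pub fn run_quality_gate<G, E>(
    max_attempts: u32,
    mut generate: G,
    mut evaluate: E,
) -> Option<GateOutcome>
where
    G: FnMut(Option<&str>) -> String,
    E: FnMut(&str) -> Verdict,
{
    let mut feedback: Option<String> = None;
    let mut last: Option<GateOutcome> = None;
    for attempt in 1..=max_attempts {
        let output = generate(feedback.as_deref());
        let verdict = evaluate(&output);
        let passed = verdict.passed;
        // Carry the evaluator's feedback into the next attempt.
        feedback = Some(verdict.feedback.clone());
        last = Some(GateOutcome { output, verdict, attempts: attempt });
        if passed {
            break;
        }
    }
    last
}
```

In this sketch both closures stand in for the I/O side (the worker call and the eval LLM call), which keeps the loop itself free of I/O in the same spirit as the kernel/SDK split.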

Structs

Criterion
A single evaluation criterion with optional weight and required flag.
CriterionResult
Per-criterion evaluation result.
EvalResult
The structured verdict produced by parsing the eval LLM’s JSON response.
SkillCandidate
A skill distilled from a successful run — SDK writes this to skill_dir.

Functions

build_eval_messages
Build the impartial-evaluator messages for one attempt: a system instruction describing the scoring contract, and a user message carrying the goal, the criteria, and the agent’s output. The SDK calls the eval LLM with these, then feeds the response to parse_verdict.
parse_verdict
Parse an eval LLM’s JSON response into a structured EvalResult. Tolerant of markdown fences and missing fields (defaults: passed=false, score derived from passed).
verdict_output_schema
JSON Schema for the verdict an eval node must produce. Used as the output_schema of the eval node in the crate::orchestration::workflow::gen_eval template so the SDK can instruct + validate the verdict. Matches what parse_verdict reads.
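
The tolerance that parse_verdict is described as having (markdown fences, missing fields) can be illustrated with a std-only sketch. It is not the crate's implementation: `strip_markdown_fence`, `apply_defaults`, and `VerdictDefaults` are hypothetical names, JSON decoding itself is left out, and mapping a missing score to 1.0 or 0.0 is an assumption, since the source only says the score is derived from passed.

```rust
/// Three backticks, written as escapes so this snippet can sit inside a fence.
pub const FENCE: &str = "\u{60}\u{60}\u{60}";

/// Strip a surrounding markdown code fence (with an optional language tag)
/// if one is present; otherwise return the trimmed input unchanged.
pub fn strip_markdown_fence(raw: &str) -> &str {
    let trimmed = raw.trim();
    let rest = match trimmed.strip_prefix(FENCE) {
        Some(rest) => rest,
        None => return trimmed,
    };
    // Skip the remainder of the opening fence line (e.g. the "json" tag).
    let body = match rest.find('\n') {
        Some(idx) => &rest[idx + 1..],
        None => return trimmed, // opening fence with no body: leave as-is
    };
    // Drop the closing fence when present; tolerate its absence.
    body.trim_end().strip_suffix(FENCE).unwrap_or(body).trim()
}

/// Hypothetical holder for the two fields the source names explicitly.
#[derive(Debug, Clone, PartialEq)]
pub struct VerdictDefaults {
    pub passed: bool,
    pub score: f64,
}

/// Fill in missing fields: `passed` defaults to false, and a missing `score`
/// is derived from `passed` (assumed here to be 1.0 or 0.0).
pub fn apply_defaults(passed: Option<bool>, score: Option<f64>) -> VerdictDefaults {
    let passed = passed.unwrap_or(false);
    let score = score.unwrap_or(if passed { 1.0 } else { 0.0 });
    VerdictDefaults { passed, score }
}
```

Defaulting to passed=false means a malformed or empty evaluator response fails the gate instead of silently passing it.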