Evaluation primitives — the agent’s “quality gate” compute.
Pure computation in kernel, I/O in SDK. This module provides the stateless building blocks for the generate → evaluate → retry quality gate:
- build_eval_messages assembles the impartial-evaluator prompt from a goal + criteria + the agent’s output (the SDK then calls the eval LLM with it).
- parse_verdict parses the LLM’s JSON response into a structured EvalResult.
- verdict_output_schema is the JSON Schema for that verdict, used as the output_schema of the eval node in the crate::orchestration::workflow::gen_eval workflow template.
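To make the prompt-assembly step concrete, here is a minimal sketch of how a system instruction and a user message could be built from a goal, criteria, and the agent's output. The Message type, the function name build_eval_messages_sketch, and the prompt wording are all hypothetical; the real build_eval_messages signature and text are not shown on this page and may differ.

```rust
/// Hypothetical chat message; the crate's real message type is not shown here.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: &'static str,
    pub content: String,
}

/// Assemble an evaluator prompt: one system message stating the scoring
/// contract, one user message carrying goal, criteria, and the agent's output.
/// Pure string building, no I/O (the caller would send these to the eval LLM).
pub fn build_eval_messages_sketch(goal: &str, criteria: &[&str], output: &str) -> Vec<Message> {
    // Assumed wording; the real scoring contract is defined by the crate.
    let system = "You are an impartial evaluator. Judge the output against each \
                  criterion and reply with a single JSON verdict."
        .to_string();

    let mut user = format!("Goal:\n{}\n\nCriteria:\n", goal);
    for (i, criterion) in criteria.iter().enumerate() {
        // Number the criteria so the verdict can refer to them.
        user.push_str(&format!("{}. {}\n", i + 1, criterion));
    }
    user.push_str(&format!("\nAgent output:\n{}\n", output));

    vec![
        Message { role: "system", content: system },
        Message { role: "user", content: user },
    ]
}
```

Calling build_eval_messages_sketch("Summarize the report", &["cites sources"], "draft") yields two messages, system first, with the criteria numbered in the user message.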
History (0.5.0 fold, OS-axis #6). This replaces the former EvalPipeline state machine +
its public SDK class. The quality gate is now expressed on the workflow substrate: the iterative
retry-with-feedback loop is driven by the SDK HarnessLoop (the kernel NodeKind::Loop re-arms
a single node, so per-iteration eval cannot be a static DAG), and the declarative
“loop-the-worker-then-verify-with-a-structured-verdict” shape is the gen_eval template. Both
reuse these primitives, so the verdict shape stays consistent across the two paths.
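The retry-with-feedback shape described above can be sketched as a small driver loop. This is an illustration only, not the SDK HarnessLoop: Verdict, GateOutcome, and run_quality_gate are hypothetical names, and the two closures stand in for the SDK's LLM calls so that the sketch itself stays free of I/O.

```rust
/// Hypothetical stand-in for the structured verdict; the real EvalResult
/// may carry more fields (score, per-criterion results, ...).
#[derive(Debug, Clone, PartialEq)]
pub struct Verdict {
    pub passed: bool,
    pub feedback: String,
}

/// Result of running the gate: an accepted output, or the last rejected one.
#[derive(Debug, PartialEq)]
pub enum GateOutcome {
    Accepted { output: String, attempts: usize },
    Exhausted { last_output: String, attempts: usize },
}

/// Run generate -> evaluate -> retry for at most `max_attempts` rounds.
/// `generate` receives the previous round's feedback (None on the first round);
/// `evaluate` returns a verdict for the produced output.
pub fn run_quality_gate<G, E>(max_attempts: usize, mut generate: G, mut evaluate: E) -> GateOutcome
where
    G: FnMut(Option<&str>) -> String,
    E: FnMut(&str) -> Verdict,
{
    let mut feedback: Option<String> = None;
    let mut last_output = String::new();

    for attempt in 1..=max_attempts {
        let output = generate(feedback.as_deref());
        let verdict = evaluate(&output);
        if verdict.passed {
            return GateOutcome::Accepted { output, attempts: attempt };
        }
        // Carry the evaluator's feedback into the next generation round.
        feedback = Some(verdict.feedback);
        last_output = output;
    }

    GateOutcome::Exhausted { last_output, attempts: max_attempts }
}
```

Because each round's evaluation depends on that round's output and feeds the next one, the loop has to be driven step by step, which is consistent with the note above that per-iteration eval cannot be a static DAG.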
Structs§
- Criterion - A single evaluation criterion with optional weight and required flag.
- CriterionResult - Per-criterion evaluation result.
- EvalResult - The structured verdict produced by parsing the eval LLM’s JSON response.
- SkillCandidate - A skill distilled from a successful run; the SDK writes this to skill_dir.
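As an illustration of what an "optional weight and required flag" could mean in practice, the sketch below combines per-criterion results into an overall pass/score. Everything here is an assumption: the field names, the default weight of 1.0, the threshold parameter, and the aggregation rule itself. This page does not say that the kernel aggregates this way (the verdict may come entirely from the eval LLM), so the types are deliberately named CriterionSketch and CriterionOutcome rather than after the real structs.

```rust
/// Hypothetical criterion shape; the real Criterion fields are not shown here.
#[derive(Debug, Clone)]
pub struct CriterionSketch {
    pub name: String,
    pub weight: Option<f64>, // None is treated as weight 1.0 in this sketch
    pub required: bool,
}

/// Hypothetical per-criterion outcome.
#[derive(Debug, Clone)]
pub struct CriterionOutcome {
    pub name: String,
    pub passed: bool,
}

/// Return (passed, score): score is the weighted fraction of criteria that
/// passed; the overall verdict fails if any required criterion failed or the
/// score is below `threshold`.
pub fn aggregate(
    criteria: &[CriterionSketch],
    outcomes: &[CriterionOutcome],
    threshold: f64,
) -> (bool, f64) {
    let mut total = 0.0;
    let mut earned = 0.0;
    let mut required_ok = true;

    for c in criteria {
        let weight = c.weight.unwrap_or(1.0);
        // A criterion with no matching outcome counts as failed.
        let passed = outcomes.iter().any(|o| o.name == c.name && o.passed);
        total += weight;
        if passed {
            earned += weight;
        } else if c.required {
            required_ok = false;
        }
    }

    let score = if total > 0.0 { earned / total } else { 0.0 };
    (required_ok && score >= threshold, score)
}
```

Under this rule a failed required criterion fails the whole verdict even when the weighted score clears the threshold.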
Functions§
- build_eval_messages - Build the impartial-evaluator messages for one attempt: a system instruction describing the scoring contract + a user message carrying the goal, criteria, and the agent’s output. The SDK calls the eval LLM with these, then feeds the response to parse_verdict.
- parse_verdict - Parse an eval LLM’s JSON response into a structured EvalResult. Tolerant of markdown fences and missing fields (defaults: passed=false, score derived from passed).
- verdict_output_schema - JSON Schema for the verdict an eval node must produce. Used as the output_schema of the eval node in the crate::orchestration::workflow::gen_eval template so the SDK can instruct + validate the verdict. Matches what parse_verdict reads.
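The two kinds of tolerance attributed to parse_verdict (markdown fences and missing fields) can be sketched with the standard library alone. This is not the crate's implementation: strip_markdown_fence, RawVerdict, and apply_defaults are hypothetical helpers, the JSON decoding step between them is omitted, and the 1.0 / 0.0 mapping for "score derived from passed" is an assumption, since the page does not state the exact rule.

```rust
/// Three backticks, written with escapes so this sketch can live inside a fenced block.
pub const FENCE: &str = "\x60\x60\x60";

/// Remove one surrounding markdown code fence (with an optional info string
/// such as "json") and return the inner text; unfenced input is only trimmed.
pub fn strip_markdown_fence(raw: &str) -> &str {
    let trimmed = raw.trim();
    let rest = match trimmed.strip_prefix(FENCE) {
        Some(rest) => rest,
        None => return trimmed,
    };
    // Drop the info string up to the first newline; if there is none, keep everything.
    let body = match rest.find('\n') {
        Some(i) => &rest[i + 1..],
        None => rest,
    };
    let body = body.trim_end();
    body.strip_suffix(FENCE).unwrap_or(body).trim()
}

/// Hypothetical partially decoded verdict: fields absent from the JSON are None.
pub struct RawVerdict {
    pub passed: Option<bool>,
    pub score: Option<f64>,
}

/// Apply the documented defaults: a missing `passed` becomes false, and a
/// missing score is derived from `passed` (assumed here: 1.0 if passed, else 0.0).
pub fn apply_defaults(raw: &RawVerdict) -> (bool, f64) {
    let passed = raw.passed.unwrap_or(false);
    let score = raw.score.unwrap_or(if passed { 1.0 } else { 0.0 });
    (passed, score)
}
```

Defaulting passed to false means a malformed or empty evaluator reply fails the gate and triggers a retry rather than letting unverified output through.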