Module eval

Evaluation primitives — the agent’s “quality gate” compute.

Pure computation lives in the kernel; I/O lives in the SDK. This module provides the stateless building blocks for the generate → evaluate → retry quality gate:

build_eval_messages assembles the impartial-evaluator prompt from a goal + criteria + the agent’s output (the SDK then calls the eval LLM with it).
parse_verdict parses the LLM’s JSON response into a structured EvalResult.
verdict_output_schema is the JSON Schema for that verdict, used as the output_schema of the eval node in the crate::orchestration::workflow::gen_eval workflow template.
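
The generate → evaluate → retry flow that these primitives serve can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK HarnessLoop: `Verdict`, `quality_gate`, and both closure signatures are hypothetical, and the `evaluate` closure stands in for the whole build_eval_messages → eval LLM → parse_verdict round trip.

```rust
/// Hypothetical, simplified verdict; the crate's `EvalResult` may carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Verdict {
    pub passed: bool,
    pub score: f64,
    pub feedback: String,
}

/// Run generate -> evaluate -> retry until a verdict passes or attempts run out.
/// Returns the last output, its verdict, and the number of attempts used,
/// or `None` when `max_attempts` is zero.
pub fn quality_gate<G, E>(
    mut generate: G,
    mut evaluate: E,
    max_attempts: usize,
) -> Option<(String, Verdict, usize)>
where
    G: FnMut(Option<&str>) -> String, // receives the previous feedback, if any
    E: FnMut(&str) -> Verdict,        // stand-in for the eval LLM round trip
{
    let mut feedback: Option<String> = None;
    let mut last = None;
    for attempt in 1..=max_attempts {
        let output = generate(feedback.as_deref());
        let verdict = evaluate(&output);
        let passed = verdict.passed;
        // Carry the evaluator's feedback into the next generation attempt.
        feedback = Some(verdict.feedback.clone());
        last = Some((output, verdict, attempt));
        if passed {
            break;
        }
    }
    last
}
```

Keeping the loop generic over two closures mirrors the split described above: the loop itself is pure control flow, while the model calls (the I/O) are supplied by the caller.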

History (0.5.0 fold, OS-axis #6). This module replaces the former EvalPipeline state machine and its public SDK class. The quality gate is now expressed on the workflow substrate in two ways. The iterative retry-with-feedback loop is driven by the SDK HarnessLoop, because the kernel NodeKind::Loop re-arms a single node, so per-iteration eval cannot be a static DAG. The declarative “loop-the-worker-then-verify-with-a-structured-verdict” shape is the gen_eval template. Both paths reuse these primitives, so the verdict shape stays consistent between them.

Structs

Criterion: A single evaluation criterion with optional weight and required flag.
CriterionResult: Per-criterion evaluation result.
EvalResult: The structured verdict produced by parsing the eval LLM’s JSON response.
SkillCandidate: A skill distilled from a successful run — SDK writes this to skill_dir.
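
To make the "goal + criteria + output" prompt assembly concrete, here is a minimal sketch of how a criterion with an optional weight and a required flag might be rendered into evaluator messages. Everything in it is an assumption for illustration: `CriterionSketch`, its field names, the `(role, content)` message pair, the default weight of 1.0, and the prompt wording are hypothetical and are not the crate's actual Criterion or build_eval_messages.

```rust
/// Hypothetical shape; field names are illustrative, not the crate's definition.
#[derive(Debug, Clone)]
pub struct CriterionSketch {
    pub name: String,
    pub description: String,
    pub weight: Option<f64>, // assumed to default to 1.0 when absent
    pub required: bool,
}

/// One chat message as a (role, content) pair (an assumed representation).
pub type Message = (String, String);

/// Assemble a system instruction plus a user message carrying the goal,
/// the criteria, and the agent's output.
pub fn build_messages_sketch(
    goal: &str,
    criteria: &[CriterionSketch],
    output: &str,
) -> Vec<Message> {
    let mut user = format!("Goal:\n{goal}\n\nCriteria:\n");
    for (i, c) in criteria.iter().enumerate() {
        let weight = c.weight.unwrap_or(1.0);
        let tag = if c.required { "required" } else { "optional" };
        user.push_str(&format!(
            "{}. {} ({tag}, weight {weight}): {}\n",
            i + 1,
            c.name,
            c.description
        ));
    }
    user.push_str(&format!("\nAgent output:\n{output}\n"));
    vec![
        (
            "system".to_string(),
            "You are an impartial evaluator. Score the output against each \
             criterion and reply with JSON only."
                .to_string(),
        ),
        ("user".to_string(), user),
    ]
}
```

The function performs no I/O; it only returns data, which matches the kernel/SDK split described at the top of this module.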

Functions

build_eval_messages: Build the impartial-evaluator messages for one attempt: a system instruction describing the scoring contract + a user message carrying the goal, criteria, and the agent’s output. The SDK calls the eval LLM with these, then feeds the response to parse_verdict.
parse_verdict: Parse an eval LLM’s JSON response into a structured EvalResult. Tolerant of markdown fences and missing fields (defaults: passed=false, score derived from passed).
verdict_output_schema: JSON Schema for the verdict an eval node must produce. Used as the output_schema of the eval node in the crate::orchestration::workflow::gen_eval template so the SDK can instruct + validate the verdict. Matches what parse_verdict reads.
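
The tolerance described for parse_verdict (markdown fences, missing fields) can be illustrated with a std-only sketch. It deliberately skips the JSON parsing itself and shows only the two tolerant steps. `strip_fences`, `fill_defaults`, and `ParsedVerdict` are hypothetical names, and the mapping of a missing score to 1.0 or 0.0 is an assumption: the documentation says only that the score is derived from passed.

```rust
/// Three backticks, written with escapes so this snippet can sit inside a fenced block.
const FENCE: &str = "\x60\x60\x60";

/// Hypothetical stand-in for the crate's `EvalResult`; the real fields may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct ParsedVerdict {
    pub passed: bool,
    pub score: f64,
}

/// Remove a surrounding markdown code fence (with or without a language tag), if present.
pub fn strip_fences(raw: &str) -> &str {
    let trimmed = raw.trim();
    let rest = match trimmed.strip_prefix(FENCE) {
        Some(rest) => rest,
        None => return trimmed, // no opening fence: nothing to strip
    };
    // Drop the remainder of the opening fence line (for example a "json" tag).
    let body = match rest.find('\n') {
        Some(idx) => &rest[idx + 1..],
        None => rest,
    };
    let body = body.trim_end();
    body.strip_suffix(FENCE).unwrap_or(body).trim()
}

/// Apply the documented defaults to fields that may be missing from the response.
pub fn fill_defaults(passed: Option<bool>, score: Option<f64>) -> ParsedVerdict {
    let passed = passed.unwrap_or(false);
    // Assumption: a missing score becomes 1.0 for a pass and 0.0 for a fail.
    let score = score.unwrap_or(if passed { 1.0 } else { 0.0 });
    ParsedVerdict { passed, score }
}
```

Defaulting a missing passed field to false means a malformed or partial verdict fails the gate rather than silently passing it.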
