Module eval

Expand description

The agent stage: hand mined tasks to an external runner, collect attempts.

lean-agent itself never calls a model. Instead it talks to a user-supplied runner over a line-oriented process contract: one task JSON per line goes to the runner’s stdin, and one attempt JSON per line is read back from its stdout. The runner owns prompt construction and any model calls; this crate owns the task model, the pairing, and turning each reply into a replayable Attempt.

§Process contract

The runner is spawned once and read in lock step: a task is written and flushed, then exactly one reply line is read before the next task is sent. A well-behaved runner emits one line per task and flushes after each.
Blank lines from the runner are ignored, so a chatty runner does not desynchronize the exchange.
The lake root is passed to the runner in the LEAN_AGENT_LAKE_ROOT environment variable so the runner can read the project if it needs to.

Each reply is {task_id, attempt_id, replacement, model?, prompt_hash?, metadata?}. The reply carries only the proof text; the editable span, target file, and backing diagnostic come from the mined task, so the merged Attempt is everything replay needs.

Structs§

EvalOptions: Runtime options for an eval run.
EvalSummary: Counts from an eval run.
RunnerResponse: One line the runner writes back for one task.

Constants§

LAKE_ROOT_ENV: Environment variable carrying the lake root through to the runner.

Functions§

run_eval: Stream every task to the runner and write one attempt per reply.

Module eval

Module eval Copy item path

§Process contract

Structs§

Constants§

Functions§

Module eval