Skip to main content

Module eval

Module eval 

Source
Expand description

The agent stage: hand mined tasks to an external runner, collect attempts.

lean-agent itself never calls a model. Instead it talks to a user-supplied runner over a line-oriented process contract: one task JSON per line goes to the runner’s stdin, and one attempt JSON per line is read back from its stdout. The runner owns prompt construction and any model calls; this crate owns the task model, the pairing, and turning each reply into a replayable Attempt.

§Process contract

  • The runner is spawned once and read in lock step: a task is written and flushed, then exactly one reply line is read before the next task is sent. A well-behaved runner emits one line per task and flushes after each.
  • Blank lines from the runner are ignored, so a chatty runner does not desynchronize the exchange.
  • The lake root is passed to the runner in the LEAN_AGENT_LAKE_ROOT environment variable so the runner can read the project if it needs to.

Each reply is {task_id, attempt_id, replacement, model?, prompt_hash?, metadata?}. The reply carries only the proof text; the editable span, target file, and backing diagnostic come from the mined task, so the merged Attempt is everything replay needs.

Structs§

EvalOptions
Runtime options for an eval run.
EvalSummary
Counts from an eval run.
RunnerResponse
One line the runner writes back for one task.

Constants§

LAKE_ROOT_ENV
Environment variable carrying the lake root through to the runner.

Functions§

run_eval
Stream every task to the runner and write one attempt per reply.