Module reliability

Expand description

Reliability: does a program parse/run without ambiguity, and when it fails, is the failure actionable?

Two things drive an agent’s retry-token blowup. First, ambiguity: a form the model mis-emits and has to redo. Second, dead-end errors: prose like "error near line 3" tells the model nothing to branch on, so it guesses. This module aggregates the outcomes of running a program over a set of representative invocations into a pass rate and an actionable-failure rate (failures that carry a structured, machine-branchable code + hint).

Structs§

ErrorQuality: A graded assessment of how actionable a failure is — refining the binary structured_error of Outcome. Each component an agent can use to self-correct contributes equally to the score.
ErrorQualityReport: Aggregate ErrorQuality over a set of failures.
Outcome: The outcome of one invocation, as classified by the caller.
ReliabilityReport: Aggregate reliability over a set of invocations.

Functions§

assess_error_quality: Grade each failure with grade and aggregate. Pass only the failing cases (the successes carry no error to grade); an empty set scores 1.0 vacuously.
assess_reliability: Assess reliability by running run over each case and aggregating outcomes.