Crate atomr_agents_eval

Eval suites + replay-based regression detection.

Structs

AnnotationItem
EvalCase
EvalResult
EvalRun
EvalSuite
InMemoryAnnotationQueue
LlmJudgeScorer: Single-criterion graded scorer that asks the judge “did the actual output answer the expected question correctly?” The judge replies pass / fail followed by a short justification.
PairwiseScorer
RegressionGate: Compares a current EvalRun against a baseline. Blocks publication if the pass rate regressed by more than tolerance.
RegressionResult
RubricCriterion
RubricScorer
ScorerOutcome
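
The RegressionGate description above amounts to comparing two pass rates against a tolerance. A minimal sketch of that check follows. It does not use this crate's API: `RunSummary`, `GateOutcome`, and `check_regression` are hypothetical names introduced for illustration, and the treatment of an empty run (pass rate 0.0) and of the exact boundary (a drop equal to the tolerance is allowed) are assumptions.

```rust
/// Hypothetical stand-in for an eval run: only the counts needed for a pass rate.
/// This is NOT the crate's `EvalRun` type.
#[derive(Debug, Clone, Copy)]
pub struct RunSummary {
    pub passed: usize,
    pub total: usize,
}

impl RunSummary {
    /// Fraction of cases that passed. Assumption: an empty run counts as 0.0.
    pub fn pass_rate(&self) -> f64 {
        if self.total == 0 {
            0.0
        } else {
            self.passed as f64 / self.total as f64
        }
    }
}

/// Hypothetical gate outcome. This is NOT the crate's `RegressionResult` type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GateOutcome {
    pub baseline_rate: f64,
    pub current_rate: f64,
    /// True when publication should be blocked.
    pub blocked: bool,
}

/// Blocks when the pass rate dropped by strictly more than `tolerance`.
/// An improvement yields a negative drop and therefore never blocks.
pub fn check_regression(
    baseline: RunSummary,
    current: RunSummary,
    tolerance: f64,
) -> GateOutcome {
    let baseline_rate = baseline.pass_rate();
    let current_rate = current.pass_rate();
    let drop = baseline_rate - current_rate;
    GateOutcome {
        baseline_rate,
        current_rate,
        blocked: drop > tolerance,
    }
}
```

With a baseline of 90 passes out of 100 and a tolerance of 0.05, a current run of 80 out of 100 would be blocked under this sketch, while 88 out of 100 would pass the gate.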
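
The LlmJudgeScorer entry above says the judge replies pass / fail followed by a short justification. One way such a reply could be parsed is sketched below. This is an illustration only: `JudgeVerdict` and `parse_judge_reply` are hypothetical names, not part of this crate, and the accepted separators and case-insensitive matching are assumptions about the reply format.

```rust
/// Hypothetical parsed verdict. This is NOT the crate's `ScorerOutcome` type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct JudgeVerdict {
    pub passed: bool,
    pub justification: String,
}

/// Parses a reply of the assumed form "<pass|fail><separator><justification>".
/// Returns `None` when the first word is neither "pass" nor "fail".
pub fn parse_judge_reply(reply: &str) -> Option<JudgeVerdict> {
    let trimmed = reply.trim_start();

    // The verdict is the leading run of ASCII letters.
    let split_at = trimmed
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(trimmed.len());
    let (verdict, rest) = trimmed.split_at(split_at);

    // Assumption: matching is case-insensitive, so "PASS" and "pass" agree.
    let passed = match verdict.to_ascii_lowercase().as_str() {
        "pass" => true,
        "fail" => false,
        _ => return None,
    };

    // Drop separators such as ": " or " - " before the justification.
    let justification = rest
        .trim_start_matches(|c: char| c.is_whitespace() || matches!(c, ':' | '-' | '.' | ','))
        .trim_end()
        .to_string();

    Some(JudgeVerdict { passed, justification })
}
```

Returning `None` for an unrecognized verdict leaves it to the caller to decide whether a malformed judge reply counts as a failure or as an error to retry.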