Crate atomr_agents_eval

Eval suites + replay-based regression detection.

Structs

AnnotationItem
EvalCase
EvalResult
EvalRun
EvalSuite
InMemoryAnnotationQueue
LlmJudgeScorer
Single-criterion graded scorer that asks: “did the actual output answer the expected question correctly?” The judge replies pass / fail, followed by a short justification.
PairwiseScorer
RegressionGate
Compares a current EvalRun against a baseline. Blocks publication if the pass rate regressed by more than tolerance.
RegressionResult
RubricCriterion
RubricScorer
ScorerOutcome
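
The RegressionGate description above amounts to a pass-rate comparison with a tolerance. The following is a minimal sketch of that check under stated assumptions: `RunSummary` and `is_regression` are hypothetical stand-ins written for illustration, not the crate's `EvalRun`, `RegressionGate`, or `RegressionResult` types, whose actual fields and methods are not shown on this page. It also assumes the tolerance is an absolute difference in pass rate.

```rust
/// Hypothetical summary of one eval run (NOT the crate's `EvalRun`):
/// how many cases ran and how many of them passed.
#[derive(Debug, Clone, Copy)]
struct RunSummary {
    passed: usize,
    total: usize,
}

impl RunSummary {
    /// Pass rate in [0.0, 1.0]. An empty run is treated as 0.0 (an assumption).
    fn pass_rate(&self) -> f64 {
        if self.total == 0 {
            0.0
        } else {
            self.passed as f64 / self.total as f64
        }
    }
}

/// Returns true when the current run's pass rate dropped below the baseline
/// by more than `tolerance` (assumed to be an absolute difference).
fn is_regression(baseline: RunSummary, current: RunSummary, tolerance: f64) -> bool {
    let drop = baseline.pass_rate() - current.pass_rate();
    drop > tolerance
}

fn main() {
    let baseline = RunSummary { passed: 90, total: 100 };
    let current = RunSummary { passed: 80, total: 100 };
    // A drop of about 0.10 exceeds a tolerance of 0.05, so the gate would block.
    let blocked = is_regression(baseline, current, 0.05);
    println!("blocked: {blocked}");
}
```

An improvement (current above baseline) yields a negative drop and never blocks in this sketch; whether the real gate treats improvements or empty runs specially is not stated here.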

Enums

PairwiseChoice
Verdict

Traits

AnnotationQueue
JudgeModel
Scorer
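
The LlmJudgeScorer entry says the judge replies pass / fail followed by a short justification. One way such a reply could be turned into a structured result is sketched below. `ParsedReply` and `parse_judge_reply` are hypothetical names introduced for illustration; they are not the crate's `Verdict`, `ScorerOutcome`, `JudgeModel`, or `Scorer` items, and the exact reply format the crate expects is an assumption.

```rust
/// Hypothetical parsed judge reply (NOT the crate's `Verdict` or `ScorerOutcome`).
#[derive(Debug, PartialEq)]
struct ParsedReply {
    passed: bool,
    justification: String,
}

/// Parses a reply whose first word is "pass" or "fail" (case-insensitive),
/// followed by free-form justification text. Returns None for anything else.
fn parse_judge_reply(reply: &str) -> Option<ParsedReply> {
    let trimmed = reply.trim();
    // Split the first whitespace-delimited token from the rest of the reply.
    let (head, rest) = match trimmed.split_once(char::is_whitespace) {
        Some((h, r)) => (h, r),
        None => (trimmed, ""),
    };
    // Tolerate trailing punctuation such as "pass:" or "FAIL.".
    let word = head
        .trim_end_matches(|c: char| !c.is_alphanumeric())
        .to_ascii_lowercase();
    let passed = match word.as_str() {
        "pass" => true,
        "fail" => false,
        _ => return None,
    };
    Some(ParsedReply {
        passed,
        justification: rest.trim().to_string(),
    })
}

fn main() {
    let parsed = parse_judge_reply("pass: The answer names the correct city.");
    println!("{parsed:?}");
}
```

Returning `None` for an unrecognized first word keeps a malformed judge reply from being silently counted as a pass or a fail; how the real scorer handles that case is not documented on this page.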