Crate atomr_agents_eval

Eval suites + replay-based regression detection.

Structs

AnnotationItem
EvalCase
EvalResult
EvalRun
EvalSuite
InMemoryAnnotationQueue
LlmJudgeScorer: Single-criterion graded scorer that asks “did the actual output answer the expected question correctly?” The judge replies pass or fail, followed by a short justification.
PairwiseScorer
RegressionGate: Compares a current EvalRun against a baseline and blocks publication if the pass rate regressed by more than tolerance.
RegressionResult
RubricCriterion
RubricScorer
ScorerOutcome
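
The RegressionGate description above amounts to a pass-rate comparison with a tolerance. The following is a minimal sketch of that idea, not the crate's implementation: `RunSummary`, `Gate`, `GateOutcome`, and `check` are hypothetical stand-ins, and the real `EvalRun`, `RegressionGate`, and `RegressionResult` types may carry different fields and method names. It also assumes tolerance is an absolute drop in pass rate (0.05 means five percentage points), which the description does not specify.

```rust
/// Hypothetical stand-in for `EvalRun`: only the counts needed for a pass rate.
#[derive(Debug, Clone, Copy)]
pub struct RunSummary {
    pub passed: usize,
    pub total: usize,
}

impl RunSummary {
    /// Fraction of cases that passed. An empty run is treated as 0.0.
    pub fn pass_rate(&self) -> f64 {
        if self.total == 0 {
            0.0
        } else {
            self.passed as f64 / self.total as f64
        }
    }
}

/// Hypothetical stand-in for `RegressionGate`.
/// Assumption: `tolerance` is an absolute drop in pass rate.
#[derive(Debug, Clone, Copy)]
pub struct Gate {
    pub tolerance: f64,
}

/// Hypothetical stand-in for `RegressionResult`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GateOutcome {
    pub baseline_rate: f64,
    pub current_rate: f64,
    pub blocked: bool,
}

impl Gate {
    /// Blocks only when the pass rate dropped by more than `tolerance`.
    /// Improvements and small dips pass through.
    pub fn check(&self, baseline: &RunSummary, current: &RunSummary) -> GateOutcome {
        let baseline_rate = baseline.pass_rate();
        let current_rate = current.pass_rate();
        let drop = baseline_rate - current_rate;
        GateOutcome {
            baseline_rate,
            current_rate,
            blocked: drop > self.tolerance,
        }
    }
}
```

With a tolerance of 0.05, a baseline of 90/100 and a current run of 80/100 would be blocked, while a current run of 88/100 would not.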

Traits

AnnotationQueue
AsyncScorer: Async-friendly scorer for impls that genuinely await: LLM judges, retrieval-grounded checks, anything network-bound. A blanket impl promotes every sync Scorer into an AsyncScorer, so callers who hold Arc<dyn AsyncScorer> can accept both transparently.
JudgeModel
Scorer: Sync scorer for pure-CPU comparators (substring match, JSON shape, regex, etc.). Most scorers should implement this.
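
The relationship between Scorer and AsyncScorer described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's actual signatures: the method names `score` and `score_async`, the `Outcome` struct, the `BoxFuture` alias, `ContainsScorer`, and the tiny `block_on` helper are all hypothetical. The real traits may use different names or an async-trait macro.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `ScorerOutcome`.
#[derive(Debug, Clone, PartialEq)]
pub struct Outcome {
    pub passed: bool,
    pub note: String,
}

/// Sync scorer: pure-CPU comparison, no awaiting.
pub trait Scorer: Send + Sync {
    fn score(&self, expected: &str, actual: &str) -> Outcome;
}

/// Boxed future, so the async trait stays usable as `dyn AsyncScorer`.
pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Async scorer for network-bound checks.
pub trait AsyncScorer: Send + Sync {
    fn score_async<'a>(&'a self, expected: &'a str, actual: &'a str) -> BoxFuture<'a, Outcome>;
}

/// Blanket impl: every sync `Scorer` is also an `AsyncScorer`
/// whose future completes on the first poll.
impl<S: Scorer> AsyncScorer for S {
    fn score_async<'a>(&'a self, expected: &'a str, actual: &'a str) -> BoxFuture<'a, Outcome> {
        Box::pin(async move { self.score(expected, actual) })
    }
}

/// Example sync scorer: passes when `actual` contains `expected` as a substring.
pub struct ContainsScorer;

impl Scorer for ContainsScorer {
    fn score(&self, expected: &str, actual: &str) -> Outcome {
        let passed = actual.contains(expected);
        let note = if passed { "substring found" } else { "substring missing" };
        Outcome { passed, note: note.to_string() }
    }
}

/// Waker that does nothing; sufficient for the busy-polling loop below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor for demonstration only: polls in a loop until ready.
/// A real application would use an async runtime instead.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

Because of the blanket impl, `Arc::new(ContainsScorer)` coerces to `Arc<dyn AsyncScorer>`, so a caller that only knows about the async trait can hold sync and async scorers in the same collection. The two traits use distinct method names in this sketch to avoid ambiguity when a type implements both.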