Crate atomr_agents_eval


Eval suites + replay-based regression detection.

Structs

AnnotationItem
EvalCase
EvalResult
EvalRun
EvalSuite
InMemoryAnnotationQueue
LlmJudgeScorer
Single-criterion graded scorer that asks: “did the actual output answer the expected question correctly?” The judge replies pass or fail, followed by a short justification.
PairwiseScorer
RegressionGate
Compares a current EvalRun against a baseline and blocks publication if the pass rate regressed by more than tolerance.
RegressionResult
RubricCriterion
RubricScorer
ScorerOutcome
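
The RegressionGate rule described above (block when the pass rate drops by more than a tolerance) can be illustrated with a minimal, self-contained sketch. The names `RunSummary`, `GateOutcome`, and `ToleranceGate` are hypothetical stand-ins invented for this example; they are not the crate's EvalRun, RegressionResult, or RegressionGate, whose actual fields and methods are not shown on this page. Treating an empty run as a 0.0 pass rate is also an assumption.

```rust
/// Hypothetical stand-in for an eval run: only the counts needed for a pass rate.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RunSummary {
    pub passed: usize,
    pub total: usize,
}

impl RunSummary {
    /// Fraction of cases that passed. An empty run is treated as 0.0 (assumption).
    pub fn pass_rate(&self) -> f64 {
        if self.total == 0 {
            0.0
        } else {
            self.passed as f64 / self.total as f64
        }
    }
}

/// Hypothetical result of a gate check.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GateOutcome {
    pub baseline_rate: f64,
    pub current_rate: f64,
    /// True when the drop in pass rate exceeds the tolerance.
    pub blocked: bool,
}

/// Hypothetical gate: `tolerance` is the largest acceptable drop in pass rate,
/// expressed as a fraction (0.05 means five percentage points).
pub struct ToleranceGate {
    pub tolerance: f64,
}

impl ToleranceGate {
    /// Compare the current run against the baseline.
    /// Improvements produce a negative drop and never block.
    pub fn check(&self, baseline: &RunSummary, current: &RunSummary) -> GateOutcome {
        let baseline_rate = baseline.pass_rate();
        let current_rate = current.pass_rate();
        let drop = baseline_rate - current_rate;
        GateOutcome {
            baseline_rate,
            current_rate,
            blocked: drop > self.tolerance,
        }
    }
}
```

With a tolerance of 0.05, a baseline of 90/100 and a current run of 80/100 would be blocked under this sketch, while 88/100 or 95/100 would pass the gate.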

Enums

PairwiseChoice
Verdict

Traits

AnnotationQueue
AsyncScorer
Async-friendly scorer for implementations that genuinely await: LLM judges, retrieval-grounded checks, anything network-bound. A blanket impl promotes every sync Scorer into an AsyncScorer, so callers who hold Arc<dyn AsyncScorer> can accept both transparently.
JudgeModel
Scorer
Sync scorer for pure-CPU comparators (substring match, JSON shape, regex, etc.). Most scorers should implement this.
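
The relationship between the sync and async scorer traits described above can be sketched with the standard library alone. Everything named here (`SyncScore`, `AsyncScore`, `score_async`, `SubstringScore`, `block_on`) is hypothetical and chosen for illustration; the crate's real Scorer and AsyncScorer signatures are not shown on this page and may differ, for example by returning a richer outcome type than `bool`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical sync scorer: a pure-CPU comparison.
pub trait SyncScore: Send + Sync {
    fn score(&self, expected: &str, actual: &str) -> bool;
}

/// Boxed future type so the async trait stays usable as a trait object.
pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

/// Hypothetical async scorer for implementations that need to await.
pub trait AsyncScore: Send + Sync {
    fn score_async<'a>(&'a self, expected: &'a str, actual: &'a str) -> BoxFuture<'a, bool>;
}

/// Blanket impl: every sync scorer is also an async scorer.
/// The sync result is computed eagerly and wrapped in an already-ready future.
impl<S: SyncScore> AsyncScore for S {
    fn score_async<'a>(&'a self, expected: &'a str, actual: &'a str) -> BoxFuture<'a, bool> {
        let result = self.score(expected, actual);
        Box::pin(std::future::ready(result))
    }
}

/// Example sync scorer: passes when `actual` contains `expected`.
pub struct SubstringScore;

impl SyncScore for SubstringScore {
    fn score(&self, expected: &str, actual: &str) -> bool {
        actual.contains(expected)
    }
}

/// Waker that does nothing; sufficient for the polling loop below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny executor for demonstration only: polls in a loop until the future
/// completes. A real application would use an async runtime instead.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

Under these assumptions, a caller can store `Arc::new(SubstringScore)` as an `Arc<dyn AsyncScore>` and call `score_async` on it without knowing whether the underlying scorer is sync or async.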