Eval suites + replay-based regression detection.
Structs
- AnnotationItem
- EvalCase
- EvalResult
- EvalRun
- EvalSuite
- InMemoryAnnotationQueue
- LlmJudgeScorer - Single-criterion graded scorer: “did the actual output answer the expected question correctly?” The judge replies `pass`/`fail` followed by a short justification.
- PairwiseScorer
- RegressionGate - Compare a current `EvalRun` against a baseline. Blocks publication if the pass rate regressed by more than `tolerance`.
- RegressionResult
- RubricCriterion
- RubricScorer
- ScorerOutcome
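The `RegressionGate` rule (block when the pass rate drops by more than `tolerance`) can be sketched as below. This is a minimal illustration, not the crate's implementation: `RunSummary`, `PassRateGate`, and `GateDecision` are hypothetical stand-ins for `EvalRun`, `RegressionGate`, and `RegressionResult`, whose real fields are not documented here. Treating an empty run as a 0.0 pass rate is also an assumption.

```rust
/// Hypothetical stand-in for `EvalRun`: one pass/fail flag per case.
pub struct RunSummary {
    pub results: Vec<bool>,
}

impl RunSummary {
    /// Fraction of passing cases; an empty run counts as 0.0 (assumption).
    pub fn pass_rate(&self) -> f64 {
        if self.results.is_empty() {
            return 0.0;
        }
        let passed = self.results.iter().filter(|&&p| p).count();
        passed as f64 / self.results.len() as f64
    }
}

/// Hypothetical stand-in for `RegressionResult`.
#[derive(Debug, PartialEq)]
pub enum GateDecision {
    Publish,
    /// `drop` is baseline pass rate minus current pass rate.
    Block { drop: f64 },
}

/// Hypothetical stand-in for `RegressionGate`.
pub struct PassRateGate {
    pub tolerance: f64,
}

impl PassRateGate {
    pub fn check(&self, baseline: &RunSummary, current: &RunSummary) -> GateDecision {
        let drop = baseline.pass_rate() - current.pass_rate();
        // Block only when the regression is strictly larger than the tolerance;
        // improvements (negative drop) always publish.
        if drop > self.tolerance {
            GateDecision::Block { drop }
        } else {
            GateDecision::Publish
        }
    }
}
```

Using a strict `>` matches the "regressed by more than `tolerance`" wording: a drop exactly equal to the tolerance still publishes.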
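Because `LlmJudgeScorer` expects the judge to reply with `pass`/`fail` followed by a short justification, the scorer has to split that reply into a verdict and a reason. The sketch below shows one way to do it; `Verdict` and `parse_judge_reply` are hypothetical names, and the accepted separators (`:`, `-`, whitespace) and case-insensitive match are assumptions rather than the crate's documented format.

```rust
/// Hypothetical verdict type; the crate's real representation is not documented here.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Pass,
    Fail,
}

/// Parse a judge reply such as "PASS: matches expected" into (verdict, justification).
/// Returns `None` when the leading word is not exactly "pass" or "fail".
pub fn parse_judge_reply(reply: &str) -> Option<(Verdict, String)> {
    let trimmed = reply.trim_start();
    // The verdict is the leading run of ASCII letters.
    let (word, rest) = match trimmed.find(|c: char| !c.is_ascii_alphabetic()) {
        Some(i) => trimmed.split_at(i),
        None => (trimmed, ""),
    };
    let verdict = match word.to_ascii_lowercase().as_str() {
        "pass" => Verdict::Pass,
        "fail" => Verdict::Fail,
        _ => return None,
    };
    // Strip separator punctuation between the verdict and the justification.
    let justification = rest
        .trim_start_matches(|c: char| c == ':' || c == '-' || c.is_whitespace())
        .trim_end()
        .to_string();
    Some((verdict, justification))
}
```

Rejecting anything other than an exact leading `pass`/`fail` (so "passed" yields `None`) keeps ambiguous judge output from being silently counted as a pass.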
Enums
Traits
- AnnotationQueue
- AsyncScorer - Async-friendly scorer for impls that genuinely await: LLM judges, retrieval-grounded checks, anything network-bound. A blanket impl promotes every sync `Scorer` into an `AsyncScorer`, so callers who hold `Arc<dyn AsyncScorer>` can accept sync and async scorers transparently.
- JudgeModel
- Scorer - Sync scorer for pure-CPU comparators (substring match, JSON shape, regex, etc.). Most scorers should implement this.
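The `Scorer`/`AsyncScorer` split with a blanket impl can be sketched in plain `std` Rust. This is an illustration under assumptions, not the crate's actual signatures: `Outcome`, `score`, `score_async`, `Contains`, and `block_on` are hypothetical, and the boxed-future return type is one common way to keep the trait usable as `dyn AsyncScorer` (the real crate may use a different mechanism). `block_on` is a toy executor that busy-polls, adequate only for futures that are already ready, such as those produced by the blanket impl.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical outcome type; the real `ScorerOutcome` fields are not documented here.
#[derive(Debug, Clone, PartialEq)]
pub struct Outcome {
    pub passed: bool,
    pub score: f64,
}

/// Sync scorer: pure-CPU comparison of expected vs actual output.
pub trait Scorer: Send + Sync {
    fn score(&self, expected: &str, actual: &str) -> Outcome;
}

/// Boxed future keeps `AsyncScorer` object-safe, so `Arc<dyn AsyncScorer>` works.
pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

pub trait AsyncScorer: Send + Sync {
    fn score_async<'a>(&'a self, expected: &'a str, actual: &'a str) -> BoxFuture<'a, Outcome>;
}

/// Blanket impl: every sync `Scorer` is an `AsyncScorer` whose future is ready immediately.
impl<T: Scorer> AsyncScorer for T {
    fn score_async<'a>(&'a self, expected: &'a str, actual: &'a str) -> BoxFuture<'a, Outcome> {
        let outcome = self.score(expected, actual);
        Box::pin(std::future::ready(outcome))
    }
}

/// Hypothetical substring-match scorer.
pub struct Contains;

impl Scorer for Contains {
    fn score(&self, expected: &str, actual: &str) -> Outcome {
        let passed = actual.contains(expected);
        Outcome { passed, score: if passed { 1.0 } else { 0.0 } }
    }
}

/// Waker that does nothing; the toy executor below never sleeps.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: busy-polls until the future completes.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
        std::thread::yield_now();
    }
}
```

A caller holding `Arc<dyn AsyncScorer>` can then mix `Contains` with a network-bound judge that implements `AsyncScorer` directly; in real code a runtime's executor would replace `block_on`.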