Eval suites + replay-based regression detection.
Structs
- AnnotationItem
- EvalCase
- EvalResult
- EvalRun
- EvalSuite
- InMemoryAnnotationQueue
- LlmJudgeScorer - Single-criterion graded scorer: “did the actual output answer the expected question correctly?”. The judge replies pass/fail followed by a short justification.
- PairwiseScorer
- RegressionGate - Compare a current EvalRun against a baseline. Blocks publication if pass-rate regressed by more than tolerance.
- RegressionResult
- RubricCriterion
- RubricScorer
- ScorerOutcome
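
The pass-rate comparison described for RegressionGate can be sketched roughly as follows. This is only an illustration of the idea, not this crate's API: `RunSummary`, `GateDecision`, and `check_regression` are hypothetical names, and the real EvalRun and RegressionGate types may carry different fields and return a RegressionResult instead. The sketch also assumes tolerance is an absolute drop in pass-rate (0.05 meaning five percentage points).

```rust
// Hypothetical stand-in for an eval run: just the pass/total counts.
// The crate's real EvalRun is likely richer than this.
#[derive(Debug, Clone)]
struct RunSummary {
    passed: usize,
    total: usize,
}

impl RunSummary {
    /// Fraction of cases that passed; an empty run is treated as 0.0.
    fn pass_rate(&self) -> f64 {
        if self.total == 0 {
            0.0
        } else {
            self.passed as f64 / self.total as f64
        }
    }
}

// Hypothetical outcome type for the gate.
#[derive(Debug, PartialEq)]
enum GateDecision {
    /// Pass-rate did not regress by more than the tolerance.
    Allow,
    /// Pass-rate dropped by `drop`, which exceeds the tolerance.
    Block { drop: f64 },
}

/// Compare `current` against `baseline`. Assumes `tolerance` is an absolute
/// pass-rate difference; improvements (negative drop) always pass.
fn check_regression(baseline: &RunSummary, current: &RunSummary, tolerance: f64) -> GateDecision {
    let drop = baseline.pass_rate() - current.pass_rate();
    if drop > tolerance {
        GateDecision::Block { drop }
    } else {
        GateDecision::Allow
    }
}
```

Under these assumptions, a baseline of 90/100 against a current run of 80/100 with a tolerance of 0.05 would be blocked, while a current run of 88/100 would be allowed.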