Natural-language evaluations. An eval poses a criterion in plain English and asks the provider’s judge to score the transcript, either as a boolean assertion or as a numeric score compared against a threshold.
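As a rough illustration of the two eval kinds described above, the sketch below models a boolean assertion and a thresholded numeric score using only the standard library. The variant names (`Assertion`, `Score`, `AtLeast`, `AtMost`), the field names, and the `passes` helper are assumptions made for this example; the crate's actual definitions may differ.

```rust
/// How a numeric score is compared to its threshold.
/// Variant names are assumed for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Comparator {
    AtLeast,
    AtMost,
}

/// The two eval kinds: a boolean assertion or a thresholded score.
/// Field names are illustrative, not taken from the crate.
#[derive(Debug, Clone, PartialEq)]
enum Eval {
    Assertion { criterion: String },
    Score { criterion: String, comparator: Comparator, threshold: f64 },
}

/// The raw value a judge returns: a boolean or a number.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JudgeValue {
    Bool(bool),
    Number(f64),
}

/// Hypothetical helper: returns `Some(passed)` when the judge's value
/// matches the eval kind, and `None` when the kinds do not match.
fn passes(eval: &Eval, value: JudgeValue) -> Option<bool> {
    match (eval, value) {
        // A boolean assertion passes exactly when the judge says true.
        (Eval::Assertion { .. }, JudgeValue::Bool(b)) => Some(b),
        // A numeric score passes when it satisfies the comparator.
        (Eval::Score { comparator, threshold, .. }, JudgeValue::Number(score)) => {
            Some(match comparator {
                Comparator::AtLeast => score >= *threshold,
                Comparator::AtMost => score <= *threshold,
            })
        }
        // Boolean returned for a score eval, or the reverse.
        _ => None,
    }
}
```

Under these assumptions, a `Score` eval with `AtLeast` and a threshold of 0.8 passes for `JudgeValue::Number(0.9)`, and `passes` returns `None` if the judge answers a numeric eval with a boolean.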
Structs
- EvalOutcome
- The result of running one eval against a transcript.
Enums
- Comparator
- How a numeric score is compared to its threshold.
- Eval
- An eval specification, as written in a test case’s YAML.
- EvalDetail
- The kind-specific detail of an eval outcome, for reporting.
- JudgeValue
- The raw value a judge returns: either a boolean or a number, matching the eval kind. Deserialized untagged from the provider’s `value` field.
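"Untagged" here most likely refers to serde's `#[serde(untagged)]` enum representation, where each variant is tried in order until one fits the input. The sketch below imitates that behavior with the standard library only, so it is a stand-in for the idea and not the crate's actual deserializer. The `parse_judge_value` function and the variant names are assumptions made for this example.

```rust
/// The raw value a judge returns. Variant names are assumed for this sketch.
/// With serde, an enum like this would typically carry `#[serde(untagged)]`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum JudgeValue {
    Bool(bool),
    Number(f64),
}

/// Hypothetical helper: tries each variant in declaration order, the way an
/// untagged representation does, and returns `None` if nothing fits.
fn parse_judge_value(raw: &str) -> Option<JudgeValue> {
    let raw = raw.trim();
    // First attempt: a boolean literal ("true" or "false").
    if let Ok(b) = raw.parse::<bool>() {
        return Some(JudgeValue::Bool(b));
    }
    // Second attempt: a finite number. `f64::from_str` also accepts "NaN"
    // and "inf", which are not useful scores, so they are rejected here.
    raw.parse::<f64>()
        .ok()
        .filter(|n| n.is_finite())
        .map(JudgeValue::Number)
}
```

Because no tag names the variant, the order of attempts matters: an input that could fit more than one variant is claimed by the first one that accepts it.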