Module eval

Expand description

Natural-language evaluations. An eval poses a criterion in plain English and asks the provider’s judge to score the transcript: a boolean assertion, or a numeric score compared against a threshold.

Structs§

EvalOutcome: The result of running one eval against a transcript.

Enums§

Comparator: How a numeric score is compared to its threshold.
Eval: An eval specification, as written in a test case’s YAML.
EvalDetail: The kind-specific detail of an eval outcome, for reporting.
JudgeValue: The raw value a judge returns: either a boolean or a number, matching the eval kind. Deserialized untagged from the provider’s value field.

Module eval

Module eval Copy item path

Structs§

Enums§

Module eval