Module evaluator

zeph_experiments

LLM-as-judge evaluator for benchmark datasets.

Evaluator runs each benchmark case against a subject model, then scores the responses in parallel using a separate judge model. Token budget enforcement and concurrency limits are applied per Evaluator::evaluate invocation.
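The scoring phase described above can be illustrated with a minimal, standard-library-only sketch. It is not the crate's implementation: `SketchScore`, `mock_judge`, and `score_all` are hypothetical names, the judge is a local stub rather than an LLM call, and the real Evaluator may enforce its limits differently (for example with async tasks). The sketch only shows the general shape: a bounded pool of workers scores responses in parallel and stops taking new work once a per-invocation token budget is spent.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical per-case result; NOT the crate's real `CaseScore`.
#[derive(Debug, Clone, PartialEq)]
struct SketchScore {
    case_index: usize,
    score: f64,
    tokens: u64,
}

/// Hypothetical judge stub: returns (score, tokens consumed).
/// A real judge would call a separate LLM; here tokens are just word counts.
fn mock_judge(response: &str) -> (f64, u64) {
    let tokens = response.split_whitespace().count() as u64;
    let score = if response.contains('4') { 1.0 } else { 0.0 };
    (score, tokens)
}

/// Scores `responses` with at most `max_workers` threads (the concurrency limit).
/// A worker stops picking up new cases once `token_budget` has been reached.
/// The budget is checked before each judge call, so calls already in flight
/// may push usage slightly past the limit.
fn score_all(responses: Vec<String>, max_workers: usize, token_budget: u64) -> Vec<SketchScore> {
    let queue: Arc<Mutex<VecDeque<(usize, String)>>> =
        Arc::new(Mutex::new(responses.into_iter().enumerate().collect()));
    let used = Arc::new(AtomicU64::new(0));
    let results: Arc<Mutex<Vec<SketchScore>>> = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..max_workers.max(1))
        .map(|_| {
            let queue = Arc::clone(&queue);
            let used = Arc::clone(&used);
            let results = Arc::clone(&results);
            thread::spawn(move || loop {
                // Budget check: stop taking work once the budget is spent.
                if used.load(Ordering::SeqCst) >= token_budget {
                    break;
                }
                // Take the next case in order; the lock is released immediately.
                let next = queue.lock().unwrap().pop_front();
                let (case_index, response) = match next {
                    Some(item) => item,
                    None => break,
                };
                let (score, tokens) = mock_judge(&response);
                used.fetch_add(tokens, Ordering::SeqCst);
                results.lock().unwrap().push(SketchScore { case_index, score, tokens });
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Workers finish in arbitrary order, so restore case order before returning.
    let mut out = results.lock().unwrap().clone();
    out.sort_by_key(|s| s.case_index);
    out
}
```

Because the shared queue and the token counter are created inside `score_all`, both limits apply per call, which mirrors the per-invocation scoping described for Evaluator::evaluate. Calling `score_all` with a budget of zero scores nothing, while a single worker with a small budget stops after the first case that exhausts it.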

Structs

CaseScore: Score for a single benchmark case.
EvalReport: Aggregate evaluation report returned by Evaluator::evaluate.
Evaluator: Evaluates a subject model against a benchmark dataset using an LLM judge.
JudgeOutput: Structured output returned by the judge LLM.