LLM-as-judge evaluator for benchmark datasets.
The Evaluator runs each benchmark case against a subject model, then scores the
responses in parallel using a separate judge model. Token-budget enforcement and
concurrency limits are applied per Evaluator::evaluate invocation.
Structs
- CaseScore - Score for a single benchmark case.
- EvalReport - Aggregate evaluation report returned by Evaluator::evaluate.
- Evaluator - Evaluates a subject model against a benchmark dataset using an LLM judge.
- JudgeOutput - Structured output returned by the judge LLM.
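The per-invocation concurrency limit described above can be sketched in plain Rust. The snippet below is a minimal illustration, not the crate's implementation: the CaseScore and EvalReport field names, the score_all function, and the placeholder scoring formula are all assumptions; a real judge-model call would replace the placeholder.

```rust
use std::sync::mpsc;
use std::thread;

// Simplified stand-ins for CaseScore / EvalReport; field names are assumed.
#[derive(Debug, Clone)]
struct CaseScore {
    case_id: usize,
    score: f64,
}

#[derive(Debug)]
struct EvalReport {
    scores: Vec<CaseScore>,
    mean_score: f64,
}

// Score all cases using at most `max_concurrency` worker threads,
// mimicking the per-invocation concurrency limit described above.
fn score_all(cases: Vec<usize>, max_concurrency: usize) -> EvalReport {
    let (tx, rx) = mpsc::channel();
    // Split the cases into one chunk per worker.
    let chunk_size = ((cases.len() + max_concurrency - 1) / max_concurrency).max(1);
    let mut handles = Vec::new();
    for chunk in cases.chunks(chunk_size).map(|c| c.to_vec()) {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for case_id in chunk {
                // Placeholder score: a real judge-model call goes here.
                let score = 1.0 / (case_id as f64 + 1.0);
                tx.send(CaseScore { case_id, score }).unwrap();
            }
        }));
    }
    drop(tx); // close the channel so the receiver loop terminates
    let mut scores: Vec<CaseScore> = rx.iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    scores.sort_by_key(|s| s.case_id);
    let mean_score = scores.iter().map(|s| s.score).sum::<f64>() / scores.len() as f64;
    EvalReport { scores, mean_score }
}

fn main() {
    let report = score_all((0..4).collect(), 2);
    println!("scored {} cases, mean {:.3}", report.scores.len(), report.mean_score);
}
```

A token budget would be enforced similarly: each worker decrements a shared counter (for example an AtomicUsize) before issuing a judge call and stops when the budget is exhausted.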