Module evaluator


LLM-as-judge evaluator for benchmark datasets.

Evaluator runs each benchmark case against a subject model, then scores the responses in parallel using a separate judge model. Token budget enforcement and concurrency limits are applied per Evaluator::evaluate invocation.
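The flow described above can be illustrated with a minimal, standard-library-only sketch. It is not the crate's implementation: `evaluate_sketch`, `approx_tokens`, and the fields on the `CaseScore` and `EvalReport` stand-ins are hypothetical, the subject and judge models are modelled as plain closures, and the judge returns a bare `f64` instead of a structured `JudgeOutput`. It also assumes that subject responses are collected sequentially, that the concurrency limit is applied by judging in batches, and that a case is skipped when its estimated token cost would exceed the budget.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical stand-in for the crate's `CaseScore`; the real fields may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct CaseScore {
    pub case_id: usize,
    pub score: f64,
}

/// Hypothetical stand-in for the crate's `EvalReport`.
#[derive(Debug)]
pub struct EvalReport {
    pub scores: Vec<CaseScore>,
    pub tokens_used: usize,
    pub skipped: usize,
}

/// Crude token estimate (whitespace-separated words); an assumption for this sketch.
fn approx_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Runs every case through `subject`, then scores the responses with `judge`,
/// at most `max_concurrency` at a time, within `token_budget` estimated tokens.
pub fn evaluate_sketch<S, J>(
    cases: &[&str],
    subject: S,
    judge: J,
    max_concurrency: usize,
    token_budget: usize,
) -> EvalReport
where
    S: Fn(&str) -> String,
    J: Fn(&str, &str) -> f64 + Sync,
{
    // Phase 1: collect one subject response per case (sequential in this sketch).
    let responses: Vec<String> = cases.iter().map(|&c| subject(c)).collect();

    // The budget counter is local to this call, so each invocation starts at zero.
    let tokens_used = AtomicUsize::new(0);
    let mut scores = Vec::new();
    let mut skipped = 0;

    // Phase 2: judge in batches; each batch spawns at most `max_concurrency` threads.
    let indices: Vec<usize> = (0..cases.len()).collect();
    for batch in indices.chunks(max_concurrency.max(1)) {
        let batch_results: Vec<Option<CaseScore>> = thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|&i| {
                    let (judge, tokens_used, responses) = (&judge, &tokens_used, &responses);
                    s.spawn(move || {
                        let cost = approx_tokens(cases[i]) + approx_tokens(&responses[i]);
                        // Atomically reserve the cost; fails if it would exceed the budget.
                        let reserved = tokens_used.fetch_update(
                            Ordering::SeqCst,
                            Ordering::SeqCst,
                            |used| (used + cost <= token_budget).then_some(used + cost),
                        );
                        reserved.ok().map(|_| CaseScore {
                            case_id: i,
                            score: judge(cases[i], responses[i].as_str()),
                        })
                    })
                })
                .collect();
            // Join only after every thread in the batch has been spawned.
            handles
                .into_iter()
                .map(|h| h.join().expect("judge thread panicked"))
                .collect()
        });
        for result in batch_results {
            match result {
                Some(score) => scores.push(score),
                None => skipped += 1,
            }
        }
    }

    EvalReport { scores, tokens_used: tokens_used.into_inner(), skipped }
}
```

Because the counter and the thread scope live inside the function, both limits reset on every call, which mirrors the per-invocation behaviour described above. Batching is the simplest way to cap concurrency with `std` alone; a semaphore or a bounded worker pool would keep all slots busy when judge latencies vary.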

Structs

CaseScore
Score for a single benchmark case.
EvalReport
Aggregate evaluation report returned by Evaluator::evaluate.
Evaluator
Evaluates a subject model against a benchmark dataset using an LLM judge.
JudgeOutput
Structured output returned by the judge LLM.