pub fn evaluate_llm(
records: Vec<LLMEvalRecord>,
metrics: Vec<LLMEvalMetric>,
config: Option<EvaluationConfig>,
) -> Result<LLMEvalResults, EvaluationError>
Function for evaluating LLM responses and generating metrics. The primary use case for `evaluate_llm` is to take a list of data samples, which typically contain the inputs and outputs of an LLM system, and evaluate them against user-defined metrics in an LLM-as-a-judge pipeline. The caller provides a list of `LLMEvalRecord`s and a list of `LLMEvalMetric`s. The metrics are used to build a workflow, which is then executed in an async context. All evaluation scores are extracted and returned to the caller.
§Arguments
- `records`: A list of data samples to evaluate.
- `metrics`: A list of evaluation metrics to use.
- `config`: Optional evaluation configuration.
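§Example
A minimal sketch of the intended call pattern. The `LLMEvalRecord::new` and `LLMEvalMetric::new` constructors (and their argument shapes) shown here are assumptions for illustration only; the call to `evaluate_llm` itself follows the signature above.

```rust
// Sketch only: the record/metric constructors are assumed, not the crate's
// confirmed API. Consult the LLMEvalRecord and LLMEvalMetric docs for the
// actual construction methods.
fn run_eval() -> Result<LLMEvalResults, EvaluationError> {
    // One record per LLM interaction to be judged (assumed shape: input + response).
    let records = vec![LLMEvalRecord::new(
        "What is the capital of France?",  // input sent to the LLM
        "The capital of France is Paris.", // response produced by the LLM
    )];

    // One metric per judging criterion; each wraps a judge prompt used by the
    // LLM-as-a-judge workflow.
    let metrics = vec![LLMEvalMetric::new(
        "relevance",
        "Score how relevant the response is to the input on a 1-5 scale.",
    )];

    // Passing `None` falls back to the default EvaluationConfig.
    evaluate_llm(records, metrics, None)
}
```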