Function evaluate_llm

pub fn evaluate_llm(
    records: Vec<LLMEvalRecord>,
    metrics: Vec<LLMEvalMetric>,
    config: Option<EvaluationConfig>,
) -> Result<LLMEvalResults, EvaluationError>

Function for evaluating LLM responses and generating metrics. The primary use case for evaluate_llm is to take a list of data samples, which typically contain the inputs and outputs of an LLM system, and evaluate them against user-defined metrics in an LLM-as-a-judge pipeline. The caller provides a list of LLMEvalRecord samples and a list of LLMEvalMetric definitions. These metrics are used to build a workflow, which is then executed in an async context. All evaluation scores are extracted and returned to the caller as LLMEvalResults.

§Arguments

  • records: A list of data samples (LLMEvalRecord) to evaluate.
  • metrics: A list of evaluation metrics (LLMEvalMetric) to score each record against.
  • config: Optional evaluation configuration; pass None to use the defaults.
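
§Example

A minimal sketch of a call site, assuming hypothetical LLMEvalRecord::new and LLMEvalMetric::new constructors, a placeholder import path, and a Debug impl on LLMEvalResults; consult the crate's actual API for the real builder names and module path.

// Import path is a placeholder; adjust to the actual crate name.
use my_crate::{evaluate_llm, LLMEvalMetric, LLMEvalRecord};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // One record per LLM interaction: the prompt given to the model and the
    // response it produced (this constructor shape is illustrative).
    let records = vec![LLMEvalRecord::new(
        "What is the capital of France?",
        "The capital of France is Paris.",
    )];

    // Judge metrics to score each record against; the name/criteria pair is
    // an assumed shape, not the crate's confirmed API.
    let metrics = vec![LLMEvalMetric::new(
        "correctness",
        "Does the response answer the question accurately?",
    )];

    // Pass None to run with the default EvaluationConfig.
    let results = evaluate_llm(records, metrics, None)?;

    // Extracted scores are returned in LLMEvalResults.
    println!("{results:?}");
    Ok(())
}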