pub fn evaluate_llm(
records: Vec<LLMEvalRecord>,
metrics: Vec<LLMEvalMetric>,
config: Option<EvaluationConfig>,
) -> Result<LLMEvalResults, EvaluationError>
Function for evaluating LLM responses and generating metrics. The primary use case for `evaluate_llm` is to take a list of data samples, which typically contain the inputs and outputs of an LLM system, and evaluate them against user-defined metrics in an LLM-as-a-judge pipeline. The caller provides a list of `LLMEvalRecord`s and a list of `LLMEvalMetric`s. The metrics are used to build a workflow, which is then executed in an async context. All evaluation scores are extracted and returned to the caller.
§Arguments
- `records`: A list of data samples to evaluate.
- `metrics`: A list of evaluation metrics to use.
- `config`: Optional evaluation configuration.
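§Example
A minimal sketch of the intended call pattern. The `LLMEvalRecord::new` and `LLMEvalMetric::new` constructors (and their argument shapes) shown here are assumptions for illustration only; the call to `evaluate_llm` itself follows the signature above.

```rust
// Sketch only: the record/metric constructors are assumed, not the crate's
// confirmed API. Consult the LLMEvalRecord and LLMEvalMetric docs for the
// actual construction methods.
fn run_eval() -> Result<LLMEvalResults, EvaluationError> {
    // One record per LLM interaction to be judged (assumed shape: input + response).
    let records = vec![LLMEvalRecord::new(
        "What is the capital of France?",  // input sent to the LLM
        "The capital of France is Paris.", // response produced by the LLM
    )];

    // One metric per judging criterion; each wraps a judge prompt used by the
    // LLM-as-a-judge workflow.
    let metrics = vec![LLMEvalMetric::new(
        "relevance",
        "Score how relevant the response is to the input on a 1-5 scale.",
    )];

    // Passing `None` falls back to the default EvaluationConfig.
    evaluate_llm(records, metrics, None)
}
```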