Available on crate feature experimental only.
Evals. From OpenAI’s evals repo:

Evals provide a framework for evaluating large language models (LLMs) or systems built using LLMs. We offer an existing registry of evals to test different dimensions of OpenAI models and the ability to write your own custom evals for use cases you care about. You can also use your data to build private evals which represent common LLM patterns in your workflow without exposing any of that data publicly.
Structs
- LlmJudgeBuilder
- LlmJudgeBuilderWithFn
- LlmJudgeMetric - An LLM as a judge that judges an output by a given schema (and outputs the schema). The schema type uses the Judgment trait, which simply enforces a single function that checks whether the output passes or not.
- LlmJudgeMetricWithFn - An LLM as a judge that judges an output by a given schema (and outputs the schema). Unlike LlmJudgeMetric, this type uses a function pointer that takes the type and returns a bool instead.
- LlmScoreMetric - An eval that scores an output based on some given criteria.
- LlmScoreMetricBuilder
- LlmScoreMetricScore - The scoring output returned by LlmScoreMetric. Must also be used as the Extractor return type when passed into LlmScoreMetric.
- SemanticSimilarityMetric - A semantic similarity metric that uses cosine similarity. In broad terms, cosine similarity measures how similar two documents are, which is useful for quickly testing the semantic similarity between two documents.
- SemanticSimilarityMetricBuilder - A builder struct for SemanticSimilarityMetric.
- SemanticSimilarityMetricScore - The scoring metric used for SemanticSimilarityMetric.
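The cosine-similarity math that SemanticSimilarityMetric is described as using can be sketched from scratch. This is a minimal illustration of the formula, not the crate's implementation; the actual metric compares embedding vectors produced by an embedding model:

```rust
/// Cosine similarity between two vectors: dot(a, b) / (|a| * |b|).
/// Returns a value in [-1.0, 1.0]; 1.0 means the vectors point the same way.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    // Guard against zero vectors, which have no defined direction.
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

Identical vectors score 1.0 and orthogonal vectors score 0.0, which is why a high cosine similarity between two document embeddings suggests the documents are semantically close.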
Enums
- EvalError - Evaluation errors.
- EvalOutcome - The outcome of an evaluation (i.e., sending an input to an LLM which is then tested against a set of criteria). Invalid results due to things like functions returning errors should be encoded as invalid evaluation outcomes.
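The pattern described for EvalOutcome, where errors are encoded as invalid outcomes rather than propagated, can be sketched with a hypothetical stand-in enum. The names below are illustrative assumptions, not the crate's actual variants:

```rust
/// A hypothetical three-way outcome: the evaluation passed, failed,
/// or could not be judged at all (e.g. a scoring function errored).
#[derive(Debug, PartialEq)]
enum Outcome {
    Pass,
    Fail,
    /// An invalid result carries the error message instead of bubbling it up,
    /// so one bad sample doesn't abort a whole eval run.
    Invalid(String),
}

/// Convert a fallible pass/fail check into an outcome.
fn outcome_from(check: Result<bool, String>) -> Outcome {
    match check {
        Ok(true) => Outcome::Pass,
        Ok(false) => Outcome::Fail,
        Err(e) => Outcome::Invalid(e),
    }
}
```

Encoding failures this way lets a batch of evaluations run to completion and report how many samples were invalid, instead of stopping at the first error.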
Traits
- Eval - A trait to encode evaluators: types that can be used to test LLM outputs against criteria. Evaluators come in all shapes and sizes, and may themselves use LLMs (although there are many heuristics you can use that don’t). There are three possible states that an evaluation can result in (see EvalOutcome).
- Judgment - A helper trait for LlmJudgeMetric. Types that implement Judgment generally have a very standard way of either passing or failing; as such, this can be enforced as a trait.
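The single-function contract described for Judgment can be sketched as follows. The trait shape, the FactualityJudgment schema, and the method name passes are assumptions for illustration, not the crate's actual API:

```rust
/// A hypothetical judge output schema: the judge LLM would fill in these
/// fields, and the trait decides pass/fail from them.
struct FactualityJudgment {
    is_factual: bool,
    reasoning: String,
}

/// Sketch of a judgment trait: a single function that reports
/// whether the judged output passed.
trait Judgment {
    fn passes(&self) -> bool;
}

impl Judgment for FactualityJudgment {
    fn passes(&self) -> bool {
        // The standard pass/fail rule lives here, so every schema
        // exposes the same check to the metric that drives it.
        self.is_factual
    }
}
```

Putting the pass/fail rule behind a trait means the judge metric only needs to call one method, regardless of how elaborate the schema the LLM fills in is.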