Axis 1: final-output semantic similarity.
Two paths are supported:
- TF-IDF cosine (default, no extra deps) — smoothed sklearn-style TF-IDF over the corpus of response texts being compared. Lexical: word-level overlap weighted by token rarity. Fast, deterministic, blind to paraphrase (“yes” vs “I agree” score 0).
- Pluggable `Embedder` — any backend that produces dense vectors per text. Use `compute_with_embedder` and pass an `Embedder` impl. Suitable for ONNX runtimes, HF Inference API clients, OpenAI/Cohere embeddings, in-house services, or a PyO3 callback into Python `sentence-transformers`.
Both paths use the same downstream cosine + paired-CI machinery, so reports from either path are directly comparable.
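As a sketch of what the default path computes (illustrative only, not this crate's actual implementation), smoothed sklearn-style TF-IDF means `idf = ln((1 + n) / (1 + df)) + 1`, with l2-normalized vectors so cosine reduces to a dot product:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Build smoothed TF-IDF vectors over a small corpus.
/// IDF is sklearn-style: ln((1 + n_docs) / (1 + df)) + 1.
fn tfidf_vectors(docs: &[&str]) -> Vec<BTreeMap<String, f64>> {
    let tokenized: Vec<Vec<String>> = docs
        .iter()
        .map(|d| d.split_whitespace().map(|t| t.to_lowercase()).collect())
        .collect();
    let n = tokenized.len() as f64;
    // Document frequency per term.
    let mut df: BTreeMap<String, f64> = BTreeMap::new();
    for toks in &tokenized {
        for t in toks.iter().collect::<BTreeSet<_>>() {
            *df.entry(t.clone()).or_insert(0.0) += 1.0;
        }
    }
    tokenized
        .iter()
        .map(|toks| {
            let mut v: BTreeMap<String, f64> = BTreeMap::new();
            for t in toks {
                *v.entry(t.clone()).or_insert(0.0) += 1.0; // raw term count
            }
            for (t, w) in v.iter_mut() {
                *w *= ((1.0 + n) / (1.0 + df[t])).ln() + 1.0; // smoothed idf
            }
            // l2-normalize so cosine is a plain dot product.
            let norm = v.values().map(|w| w * w).sum::<f64>().sqrt();
            if norm > 0.0 {
                for w in v.values_mut() {
                    *w /= norm;
                }
            }
            v
        })
        .collect()
}

/// Dot product of two l2-normalized sparse vectors = cosine similarity.
fn cosine(a: &BTreeMap<String, f64>, b: &BTreeMap<String, f64>) -> f64 {
    a.iter().filter_map(|(t, w)| b.get(t).map(|u| w * u)).sum()
}

fn main() {
    let vecs = tfidf_vectors(&["yes", "I agree", "yes"]);
    println!("{:.3}", cosine(&vecs[0], &vecs[1])); // no shared tokens -> 0.000
    println!("{:.3}", cosine(&vecs[0], &vecs[2])); // identical texts  -> 1.000
}
```

This also makes the paraphrase blindness concrete: "yes" and "I agree" share no tokens, so their cosine is exactly zero.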
Coverage cross-references
What this axis catches:
- Final-text similarity drops (lexical with TF-IDF; paraphrase-robust with a neural `Embedder`).
What it does NOT catch:
- Wrong answer with similar words — TF-IDF cosine measures token overlap; a numeric value flip (“$99 → $9”) barely moves the cosine. The alignment module’s W_ARGS component catches tool-arg value flips; numeric content drift surfaces on the v2.7+ `numeric_token_density` fingerprint dimension.
- Empty-response regressions — empty-vs-empty scores 1.0 (vacuous match). The verbosity axis (axis 4) catches the collapse to empty.
- Tone shifts with same content — embeddings only carry semantic meaning; the Judge axis (axis 8) with a tone rubric is the right surface.
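The value-flip blind spot is easy to reproduce with a plain term-count cosine (a simplified stand-in for the TF-IDF path, ignoring IDF weighting):

```rust
use std::collections::BTreeMap;

/// Raw term counts per whitespace-separated token.
fn counts(text: &str) -> BTreeMap<&str, f64> {
    let mut c = BTreeMap::new();
    for t in text.split_whitespace() {
        *c.entry(t).or_insert(0.0) += 1.0;
    }
    c
}

/// Cosine similarity over sparse term-count vectors.
fn cosine(a: &BTreeMap<&str, f64>, b: &BTreeMap<&str, f64>) -> f64 {
    let dot: f64 = a.iter().filter_map(|(t, w)| b.get(t).map(|u| w * u)).sum();
    let na: f64 = a.values().map(|w| w * w).sum::<f64>().sqrt();
    let nb: f64 = b.values().map(|w| w * w).sum::<f64>().sqrt();
    dot / (na * nb)
}

fn main() {
    // Only one token of eleven differs, but the answer is wrong by 10x.
    let a = counts("the total price for your order comes to $99 including shipping");
    let b = counts("the total price for your order comes to $9 including shipping");
    println!("{:.3}", cosine(&a, &b)); // 10 of 11 tokens shared -> 0.909
}
```

The cosine stays above 0.9 despite the answer being wrong, which is why this failure mode is routed to the alignment and fingerprint surfaces instead.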
Functions
- `compute` — Compute the semantic-similarity axis using TF-IDF cosine.
- `compute_with_embedder` — Compute the semantic-similarity axis using a caller-supplied dense `Embedder`.
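The crate's real `Embedder` trait and `compute_with_embedder` signatures are not reproduced here. As a hypothetical sketch, assume the trait boils down to one method turning texts into dense vectors; a toy deterministic backend is enough to show the wiring a real ONNX, HTTP-API, or PyO3 backend would follow:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical trait shape -- the crate's actual `Embedder` may differ.
trait Embedder {
    fn embed(&self, texts: &[&str]) -> Vec<Vec<f32>>;
}

/// Toy deterministic backend: hashes tokens into a fixed-width
/// bag-of-words vector. A real impl would call an ONNX runtime,
/// an embeddings HTTP API, or a PyO3 callback into Python.
struct HashingEmbedder {
    dims: usize,
}

impl Embedder for HashingEmbedder {
    fn embed(&self, texts: &[&str]) -> Vec<Vec<f32>> {
        texts
            .iter()
            .map(|t| {
                let mut v = vec![0.0f32; self.dims];
                for tok in t.split_whitespace() {
                    let mut h = DefaultHasher::new();
                    tok.hash(&mut h);
                    v[(h.finish() as usize) % self.dims] += 1.0;
                }
                v
            })
            .collect()
    }
}

/// Cosine over the dense vectors the backend produced.
fn dense_cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    let e = HashingEmbedder { dims: 64 };
    let vecs = e.embed(&["hello world", "hello world"]);
    println!("{:.3}", dense_cosine(&vecs[0], &vecs[1])); // identical -> 1.000
}
```

Because both paths feed the same cosine + paired-CI machinery, swapping this stub for a neural backend changes only where the vectors come from, not how the axis is scored.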