Stage-2 cross-encoder reranking, gated on stage-1 ambiguity.
The bi-encoder (stage 1, crate::rank) embeds query and skill description
independently; its cosine scores pile into a muddy ~0.60 band where genuine
matches and noise overlap, and it is confidently wrong on confusable pairs
(canvas-design vs algorithmic-art, docx vs pdf). A cross-encoder reads the
(prompt, skill) pair jointly and separates them: real matches score high,
noise crashes well negative.
It is far costlier than the bi-encoder (a second ONNX model load + inference
on the hot path), so is_ambiguous gates it: a confident lone winner, or a
prompt with nothing relevant, skips stage 2 entirely and pays nothing. Only
the murky middle reaches the reranker.
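
The gating rule above can be sketched roughly as follows. This is a minimal illustration, not the crate's code: `GateCfg`, its three thresholds, and the function names `is_confident_lone_winner` and `should_rerank` are hypothetical stand-ins for whatever confident_winner and is_ambiguous actually consult, and the input is assumed to be stage-1 cosine scores already sorted in descending order.

```rust
/// Hypothetical gate thresholds (assumed for this sketch, not the crate's real config).
struct GateCfg {
    confident_min: f32, // absolute cosine a lone winner must reach
    confident_gap: f32, // required lead over the runner-up
    relevant_min: f32,  // below this, nothing is treated as relevant
}

/// True when the top score is high in absolute terms and clearly ahead of
/// the runner-up. `scores` must be sorted in descending order.
fn is_confident_lone_winner(scores: &[f32], cfg: &GateCfg) -> bool {
    match scores {
        [] => false,
        [top] => *top >= cfg.confident_min,
        [top, second, ..] => {
            *top >= cfg.confident_min && top - second >= cfg.confident_gap
        }
    }
}

/// True only for the murky middle: something looks relevant,
/// but no candidate is a confident lone winner.
fn should_rerank(scores: &[f32], cfg: &GateCfg) -> bool {
    let Some(&top) = scores.first() else {
        return false; // no candidates at all: nothing to rerank
    };
    if top < cfg.relevant_min {
        return false; // nothing relevant: skip stage 2 and pay nothing
    }
    !is_confident_lone_winner(scores, cfg)
}
```

With thresholds of 0.75 / 0.10 / 0.45 (again, made-up values), scores of `[0.82, 0.55]` skip the reranker, `[0.62, 0.60, 0.58]` reach it, and `[0.30, 0.28]` skip it because nothing is relevant.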
Feature-gated: without fastembed, rerank returns None and the caller
keeps the stage-1 result — identical behaviour to before this stage existed.
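
One common way to get this kind of fallback is a pair of `#[cfg]`-selected functions with the same signature. The sketch below assumes a hypothetical `Hit` type and hypothetical function names (`rerank_sketch`, `select`); the body of the feature-enabled variant is deliberately elided rather than guessed, since the real model loading and inference are not shown in this description.

```rust
/// Hypothetical candidate type for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    skill: String,
    score: f32,
}

/// With the feature enabled, this is where the cross-encoder would run.
#[cfg(feature = "fastembed")]
fn rerank_sketch(_prompt: &str, _hits: &[Hit]) -> Option<Vec<Hit>> {
    // Model loading and inference are elided in this sketch.
    unimplemented!("cross-encoder inference goes here")
}

/// Without the feature, there is no reranker: always report `None`.
#[cfg(not(feature = "fastembed"))]
fn rerank_sketch(_prompt: &str, _hits: &[Hit]) -> Option<Vec<Hit>> {
    None
}

/// Caller side: use the reranked list when there is one,
/// otherwise keep the stage-1 ordering untouched.
fn select(prompt: &str, stage1: Vec<Hit>) -> Vec<Hit> {
    rerank_sketch(prompt, &stage1).unwrap_or(stage1)
}
```

Because the caller only ever sees an `Option`, a build without the feature behaves the same as a build where the model is unavailable at runtime.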
Rejected experiment — mean-centering the bi-encoder space. The classic
anisotropy fix (subtract the corpus-mean embedding from the query and every
skill vector before cosine, then renormalize) was implemented and measured
against examples/eval across all three fixtures. It did sharpen stage 1 —
stage-1 top-1 rose (e.g. 75% -> 84% on the anthropic set) and recall@rerank_top_k
went 98% -> 100% (it recovered the one true retrieval miss) — but the final,
post-rerank recall regressed ~2 points (93/106 -> 91/106, i.e. 87.7% -> 85.8%) at equal false-inject,
across a min_similarity sweep. The reason is the finding examples/eval’s
recall@k instrumentation made explicit: retrieval is not the bottleneck (gold is
almost always already in the top-k), so a sharper bi-encoder is largely redundant
with this reranker, while the shifted cosine distribution disrupts the gate it
feeds. Not worth the added complexity, the new persisted mean, and the forced
reindex. Revisit only if the reranker is removed or the live distribution proves
materially different from the eval corpus.
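
For reference, the rejected transform itself is small. The sketch below shows the general technique as described above (subtract the corpus mean, renormalize, then take cosine); the function names are hypothetical and this is not the code that was measured.

```rust
/// Component-wise mean of the corpus embeddings (all assumed to share one dimension).
fn corpus_mean(vectors: &[Vec<f32>]) -> Vec<f32> {
    let dim = vectors.first().map_or(0, |v| v.len());
    let mut mean = vec![0.0f32; dim];
    for v in vectors {
        for (m, x) in mean.iter_mut().zip(v) {
            *m += x;
        }
    }
    let n = vectors.len().max(1) as f32;
    mean.iter_mut().for_each(|m| *m /= n);
    mean
}

/// Subtract the corpus mean, then rescale to unit L2 norm.
fn center_and_normalize(v: &[f32], mean: &[f32]) -> Vec<f32> {
    let centered: Vec<f32> = v.iter().zip(mean).map(|(x, m)| x - m).collect();
    let norm = centered.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return centered; // the vector equals the mean; leave it as zeros
    }
    centered.into_iter().map(|x| x / norm).collect()
}

/// Cosine of two unit-length vectors is just their dot product.
fn cosine_unit(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

The mean has to be computed over the indexed corpus and applied to both the query and every skill vector, which is why adopting it would have meant persisting the mean and forcing a reindex.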
Functions
- confident_winner - Whether stage-1’s top match is a confident lone dense winner: high absolute cosine and a clear gap to the runner-up. This is the one case the bi-encoder is trusted outright: it skips both the reranker and the lexical fast-path, so neither can override a strong dense match.
- is_ambiguous - Whether stage-1 results warrant the cross-encoder. Skip (return false) when:
- passes - Apply the reranker-scale guardrails to a reranked candidate list: keep hits at or above rerank_min and within rerank_margin of the best reranked score. Returns hits sorted by descending reranked score (input order is preserved as it already is). The caller still applies deny/session/cap.
- rerank - Rerank the top-cfg.rerank_top_k stage-1 candidates with the cross-encoder, returning them rescored on the reranker’s (logit) scale and sorted descending. Some only with the fastembed feature and a usable model; None otherwise, so the caller falls back to the stage-1 ordering.
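
The guardrail step described for passes can be sketched like this. It is a minimal illustration under assumptions: the `Reranked` type and the `apply_guardrails` name and signature are hypothetical, and only the two documented conditions (an absolute floor and a margin below the best score) are modelled; deny/session/cap handling is left to the caller, as stated above.

```rust
/// Hypothetical reranked candidate; `score` is on the reranker's logit scale.
#[derive(Debug, Clone, PartialEq)]
struct Reranked {
    skill: String,
    score: f32,
}

/// Keep hits scoring at least `rerank_min` and no more than `rerank_margin`
/// below the best score, sorted by descending score.
fn apply_guardrails(
    mut hits: Vec<Reranked>,
    rerank_min: f32,
    rerank_margin: f32,
) -> Vec<Reranked> {
    // Stable sort: hits with equal scores keep their relative input order.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    let Some(best) = hits.first().map(|h| h.score) else {
        return hits; // empty input: nothing to filter
    };
    hits.retain(|h| h.score >= rerank_min && best - h.score <= rerank_margin);
    hits
}
```

For example, with scores 4.0, 3.5, 0.5 and -6.0, a floor of 0.0 and a margin of 2.0 keep only the first two: 0.5 clears the floor but trails the best score by 3.5, and -6.0 fails the floor outright.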