Module rerank

Stage-2 cross-encoder reranking, gated on stage-1 ambiguity.

The bi-encoder (stage 1, crate::rank) embeds query and skill description independently; its cosine scores pile into a muddy ~0.60 band where genuine matches and noise overlap, and it is confidently wrong on confusable pairs (canvas-design vs algorithmic-art, docx vs pdf). A cross-encoder reads the (prompt, skill) pair jointly and separates them: real matches score high, noise crashes well negative.

It is far costlier than the bi-encoder (a second ONNX model load + inference on the hot path), so is_ambiguous gates it: a confident lone winner, or a prompt with nothing relevant, skips stage 2 entirely and pays nothing. Only the murky middle reaches the reranker.
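The gating idea can be sketched as follows. This is a minimal illustration, not the crate's `is_ambiguous`: the `GateCfg` struct, the `confident_abs` and `confident_gap` fields, and the function name `should_rerank` are hypothetical, and only `min_similarity` is a name that appears in this module's docs.

```rust
/// Hypothetical thresholds; the real values and field names live in the crate's config.
struct GateCfg {
    /// Below this top cosine, nothing is treated as relevant.
    min_similarity: f32,
    /// Absolute cosine the top hit needs to count as a confident winner (assumed name).
    confident_abs: f32,
    /// Lead over the runner-up the top hit needs (assumed name).
    confident_gap: f32,
}

/// `scores` are stage-1 cosine scores, sorted descending.
/// Returns true only for the murky middle that should pay for stage 2.
fn should_rerank(scores: &[f32], cfg: &GateCfg) -> bool {
    let Some(&top) = scores.first() else {
        return false; // no candidates at all
    };
    if top < cfg.min_similarity {
        return false; // nothing relevant: skip stage 2
    }
    // A single candidate has no runner-up, so its gap is treated as infinite.
    let runner_up = scores.get(1).copied().unwrap_or(f32::NEG_INFINITY);
    let lone_winner = top >= cfg.confident_abs && top - runner_up >= cfg.confident_gap;
    !lone_winner
}
```

With thresholds of 0.45, 0.75 and 0.10, a score list of `[0.85, 0.60]` skips the reranker, while `[0.62, 0.60, 0.59]` reaches it.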

Feature-gated: without fastembed, rerank returns None and the caller keeps the stage-1 result — identical behaviour to before this stage existed.
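A rough sketch of that fallback pattern, under stated assumptions: the `Hit` type and the names `rerank_stub` and `select` are hypothetical, and the feature-enabled body is a placeholder rather than the real cross-encoder call.

```rust
/// Hypothetical candidate type: a skill name and its score.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    name: String,
    score: f32,
}

/// With the feature enabled, a real build would score each (prompt, skill)
/// pair with the cross-encoder. This placeholder just echoes its input.
#[cfg(feature = "fastembed")]
fn rerank_stub(_prompt: &str, hits: &[Hit]) -> Option<Vec<Hit>> {
    Some(hits.to_vec())
}

/// Without the feature there is no model, so there is nothing to return.
#[cfg(not(feature = "fastembed"))]
fn rerank_stub(_prompt: &str, _hits: &[Hit]) -> Option<Vec<Hit>> {
    None
}

/// Caller side: use the reranked list when present, otherwise keep stage 1.
fn select(prompt: &str, stage1: Vec<Hit>) -> Vec<Hit> {
    rerank_stub(prompt, &stage1).unwrap_or(stage1)
}
```

Because the fallback is a plain `unwrap_or`, a build without the feature returns the stage-1 list untouched.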

Rejected experiment — mean-centering the bi-encoder space. The classic anisotropy fix (subtract the corpus-mean embedding from the query and every skill vector before cosine, then renormalize) was implemented and measured against examples/eval across all three fixtures. It did sharpen stage 1 — stage-1 top-1 rose (e.g. 75% -> 84% on the anthropic set) and recall@rerank_top_k went 98% -> 100% (it recovered the one true retrieval miss) — but the final, post-rerank recall regressed ~3 points (93/106 -> 91/106) at equal false-inject, across a min_similarity sweep. The reason is the finding examples/eval’s recall@k instrumentation made explicit: retrieval is not the bottleneck (gold is almost always already in the top-k), so a sharper bi-encoder is largely redundant with this reranker, while the shifted cosine distribution disrupts the gate it feeds. Not worth the added complexity, the new persisted mean, and the forced reindex. Revisit only if the reranker is removed or the live distribution proves materially different from the eval corpus.
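For reference, the mean-centering transform described above can be sketched as below. This illustrates the general technique only, not the code that was measured; all function names are made up for the example.

```rust
/// Scale a vector to unit length in place (left unchanged if it is all zeros).
fn normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Component-wise mean of the corpus embeddings.
fn corpus_mean(corpus: &[Vec<f32>]) -> Vec<f32> {
    let dim = corpus.first().map_or(0, Vec::len);
    let mut mean = vec![0.0; dim];
    for v in corpus {
        for (m, x) in mean.iter_mut().zip(v) {
            *m += x;
        }
    }
    let n = corpus.len().max(1) as f32;
    for m in &mut mean {
        *m /= n;
    }
    mean
}

/// Subtract the corpus mean, then renormalize to unit length.
fn center(v: &[f32], mean: &[f32]) -> Vec<f32> {
    let mut out: Vec<f32> = v.iter().zip(mean).map(|(x, m)| x - m).collect();
    normalize(&mut out);
    out
}

/// Cosine similarity of two unit vectors is just their dot product.
fn cosine_unit(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Two vectors that share a large common component, such as `[1.0, 0.1]` and `[1.0, -0.1]`, have a raw cosine near 0.98. After centering on their mean they point in opposite directions. That spreading effect is what sharpens stage 1, and it is also what shifts the score distribution the gate depends on.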

Functions

confident_winner
Whether stage-1’s top match is a confident lone dense winner: high absolute cosine and a clear gap to the runner-up. This is the one case the bi-encoder is trusted outright — it skips both the reranker and the lexical fast-path, so neither can override a strong dense match.
is_ambiguous
Whether stage-1 results warrant the cross-encoder. Skip (return false) when:
passes
Apply the reranker-scale guardrails to a reranked candidate list: keep hits at or above rerank_min and within rerank_margin of the best reranked score. Returns the hits in descending order of reranked score; the input order is preserved, since the input is already sorted that way. The caller still applies deny/session/cap.
rerank
Rerank the top-cfg.rerank_top_k stage-1 candidates with the cross-encoder, returning them rescored on the reranker’s (logit) scale and sorted descending. Some only with the fastembed feature and a usable model; None otherwise, so the caller falls back to the stage-1 ordering.
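The guardrails described for `passes` could look roughly like the sketch below. The `Reranked` type and the function name `passes_sketch` are hypothetical, and the real signature may differ; only `rerank_min` and `rerank_margin` come from the docs above. The sketch assumes the input is already sorted by descending score.

```rust
/// Hypothetical reranked candidate; `score` is on the reranker's logit scale.
#[derive(Debug, Clone, PartialEq)]
struct Reranked {
    name: String,
    score: f32,
}

/// Keep hits at or above `rerank_min` and within `rerank_margin` of the best score.
/// Assumes `hits` is already sorted descending, so the first element is the best.
fn passes_sketch(mut hits: Vec<Reranked>, rerank_min: f32, rerank_margin: f32) -> Vec<Reranked> {
    let Some(best) = hits.first().map(|h| h.score) else {
        return hits; // empty input, nothing to filter
    };
    // `retain` preserves relative order, so the output stays sorted descending.
    hits.retain(|h| h.score >= rerank_min && best - h.score <= rerank_margin);
    hits
}
```

With `rerank_min = 0.0` and `rerank_margin = 2.0`, scores of 4.0, 3.5, 1.0 and -6.0 reduce to the first two. The third clears the floor but trails the best by more than the margin, and the fourth falls below the floor.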