Cross-encoder reranker for top-K refinement.
§Why this module exists
ripvec’s bi-encoder retrieval (BERT or semble) embeds the query and the documents into a shared vector space and ranks by cosine similarity. That is cheap to scale, but each side is encoded independently, so the model cannot express cross-token interactions between query and document. On natural-language and prose corpora this limits ranking quality.
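As a rough illustration of the bi-encoder ranking step described above, the sketch below scores precomputed document embeddings against a query embedding by cosine similarity and keeps the best k. The function names `cosine` and `top_k_by_cosine` are hypothetical and are not part of ripvec's API; the embeddings are assumed to be plain `f32` slices of equal length.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 when either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Rank documents by descending cosine similarity to the query and
/// return the first `k` as (document index, score) pairs.
fn top_k_by_cosine(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine(query, d)))
        .collect();
    // Highest similarity first; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Because the query and each document are embedded separately, nothing in this score depends on how individual query tokens relate to individual document tokens, which is the limitation the paragraph above points at.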
A cross-encoder concatenates the pair as [CLS] query [SEP] doc [SEP]
and runs full attention across both, producing a single relevance
score. Quality is meaningfully higher, but every candidate needs its
own forward pass, so cost is O(candidates) per query. It is therefore
used only as a reranker on the bi-encoder’s top-K.
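The pair layout and the top-K restriction can be sketched as follows. This is a minimal illustration under stated assumptions: the special-token ids 101 and 102 are the [CLS] and [SEP] ids of the standard BERT uncased vocabulary, the names `PairInput`, `build_pair`, and `rerank_top_k` are hypothetical, and the scorer is passed in as a closure rather than being a real model.

```rust
// Assumed special-token ids (standard BERT uncased vocab); a real pipeline
// would take these from the tokenizer that ships with the model.
const CLS: u32 = 101;
const SEP: u32 = 102;

/// One tokenized (query, doc) pair, ready for a cross-encoder forward pass.
#[derive(Debug)]
struct PairInput {
    input_ids: Vec<u32>,
    token_type_ids: Vec<u32>,
}

/// Lay out `[CLS] query [SEP] doc [SEP]`, truncating only the document so
/// the result fits in `max_len`. Assumes the query plus the three special
/// tokens already fits; otherwise the document is dropped entirely.
fn build_pair(query: &[u32], doc: &[u32], max_len: usize) -> PairInput {
    let budget = max_len.saturating_sub(query.len() + 3);
    let doc = &doc[..doc.len().min(budget)];

    let mut input_ids = Vec::with_capacity(query.len() + doc.len() + 3);
    input_ids.push(CLS);
    input_ids.extend_from_slice(query);
    input_ids.push(SEP);
    let first_segment = input_ids.len();
    input_ids.extend_from_slice(doc);
    input_ids.push(SEP);

    // Segment 0 covers [CLS] query [SEP]; segment 1 covers doc [SEP].
    let mut token_type_ids = vec![0; first_segment];
    token_type_ids.resize(input_ids.len(), 1);

    PairInput { input_ids, token_type_ids }
}

/// Score only the first `k` candidates (the bi-encoder's top-K) and return
/// them sorted by descending score. `score` is called once per candidate,
/// which is where the O(candidates) cost comes from.
fn rerank_top_k<F>(candidates: &[usize], k: usize, mut score: F) -> Vec<(usize, f32)>
where
    F: FnMut(usize) -> f32,
{
    let mut scored: Vec<(usize, f32)> = candidates
        .iter()
        .take(k)
        .map(|&id| (id, score(id)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Capping the candidate list before scoring keeps the number of forward passes bounded by `k` regardless of corpus size.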
§Architecture
This module is a thin orchestrator: it tokenizes (query, doc) pairs
and delegates scoring to a RerankBackend (currently
crate::backend::cpu::CpuRerankBackend, which uses the same BERT
trunk as the bi-encoder plus a Linear(hidden -> 1) classifier
head followed by a sigmoid).
Only the CPU rerank backend is wired today. Adding GPU rerankers
later would require implementing RerankBackend for the target
device, mirroring load_reranker_cpu in backend/mod.rs, and
routing through Reranker::from_pretrained.
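The split between orchestrator and backend might look roughly like the sketch below. Everything in it is a hypothetical stand-in: `RerankBackendSketch` is not the crate's actual `RerankBackend` trait (its methods are not shown in these docs), `ToyCpuBackend` replaces the BERT trunk with a trivial bucketed count so the example stays self-contained, and `RerankerSketch` only mirrors the role of `Reranker`. The one part taken from the description above is the head: a linear map from the hidden vector to one logit, followed by a sigmoid.

```rust
/// Hypothetical stand-in for the crate's `RerankBackend` trait.
trait RerankBackendSketch {
    /// Return one relevance score in (0, 1) per tokenized pair.
    fn score_pairs(&self, pairs: &[Vec<u32>]) -> Vec<f32>;
}

/// Toy backend: a fake "trunk" plus a Linear(hidden -> 1) head and sigmoid.
struct ToyCpuBackend {
    weight: Vec<f32>, // one weight per hidden dimension
    bias: f32,
}

impl ToyCpuBackend {
    /// Placeholder for the BERT trunk: buckets token ids into a
    /// `hidden`-sized vector and normalizes by sequence length.
    fn pooled(&self, ids: &[u32]) -> Vec<f32> {
        let hidden = self.weight.len();
        assert!(hidden > 0, "classifier head needs at least one weight");
        let mut v = vec![0.0f32; hidden];
        for &id in ids {
            v[id as usize % hidden] += 1.0;
        }
        let n = ids.len().max(1) as f32;
        v.iter_mut().for_each(|x| *x /= n);
        v
    }
}

impl RerankBackendSketch for ToyCpuBackend {
    fn score_pairs(&self, pairs: &[Vec<u32>]) -> Vec<f32> {
        pairs
            .iter()
            .map(|ids| {
                let pooled = self.pooled(ids);
                // Linear(hidden -> 1): dot product with the weights plus bias.
                let logit = self
                    .weight
                    .iter()
                    .zip(&pooled)
                    .map(|(w, h)| w * h)
                    .sum::<f32>()
                    + self.bias;
                // Sigmoid squashes the logit into (0, 1).
                1.0 / (1.0 + (-logit).exp())
            })
            .collect()
    }
}

/// Orchestrator that is generic over the backend, so a different device
/// only needs a new trait implementation.
struct RerankerSketch<B: RerankBackendSketch> {
    backend: B,
}

impl<B: RerankBackendSketch> RerankerSketch<B> {
    /// Score all pairs and return (pair index, score), best first.
    fn rerank(&self, pairs: &[Vec<u32>]) -> Vec<(usize, f32)> {
        let mut ranked: Vec<(usize, f32)> = self
            .backend
            .score_pairs(pairs)
            .into_iter()
            .enumerate()
            .collect();
        ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
        ranked
    }
}
```

With this shape, a GPU backend would be another `impl RerankBackendSketch for ...` and the orchestrator code would stay unchanged, which is the kind of extension the paragraph above describes.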
Structs§
- Reranker
- Cross-encoder reranker orchestrator.
Constants§
- DEFAULT_RERANK_CANDIDATES
- Default cap on candidates passed to the reranker.
- DEFAULT_RERANK_MODEL
- Default cross-encoder model.
cross-encoder/ms-marco-TinyBERT-L-2-v2 (~5 MB, 2 layers, distilled from BERT-base) replaced the prior MiniLM-L-12-v2 default after a model sweep on the gutenberg prose benchmark (15 NL queries) showed it bit-identical on NDCG@10 / recall@10 while running 20x faster on the warm-query path: