Cross-encoder reranker for top-K refinement.
§Why this module exists
ripvec’s bi-encoder retrieval (BERT or semble) embeds the query and the documents into a shared vector space and ranks by cosine similarity. That’s cheap to scale, but because each side is encoded independently, the model can’t express cross-token interactions between query and document. On natural-language and prose corpora this caps retrieval quality.
A cross-encoder concatenates the pair as [CLS] query [SEP] doc [SEP]
and runs full attention across both, producing a single relevance
score. Quality is meaningfully higher, but every candidate needs its
own forward pass at query time, so cost is O(candidates) and it’s
used only as a reranker on the bi-encoder’s top-K.
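The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not ripvec's actual code: `cosine`, `top_k_by_cosine`, and `rerank` are hypothetical names, and the `score_pair` closure stands in for a real cross-encoder forward pass.

```rust
/// Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Stage 1 (bi-encoder): rank every document by cosine against the query
/// embedding and keep the indices of the best `k`.
fn top_k_by_cosine(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine(query, d)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

/// Stage 2 (cross-encoder): call the expensive pair scorer once per
/// shortlisted candidate, then sort by that score, highest first.
fn rerank<F>(candidates: &[usize], mut score_pair: F) -> Vec<(usize, f32)>
where
    F: FnMut(usize) -> f32,
{
    let mut scored: Vec<(usize, f32)> = candidates
        .iter()
        .map(|&i| (i, score_pair(i)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

The point of the split is that `score_pair` runs only `k` times, regardless of corpus size, while the cheap cosine pass covers every document.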
§Architecture
This module is a thin orchestrator: tokenize (query, doc) pairs,
delegate scoring to a RerankBackend
(currently crate::backend::cpu::CpuRerankBackend, which uses the same
BERT trunk as the bi-encoder plus a Linear(hidden -> 1) classifier
head and a sigmoid).
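As a rough illustration of what a Linear(hidden -> 1) head followed by a sigmoid computes, here is a minimal sketch. The function name `relevance_score` is hypothetical, and applying the head to a single pooled hidden vector (for example the [CLS] position) is an assumption; the actual backend may pool differently.

```rust
/// Assumed head: a dot product of the pooled hidden vector with a weight
/// vector, plus a bias, squashed into (0, 1) by a sigmoid.
fn relevance_score(pooled_hidden: &[f32], weight: &[f32], bias: f32) -> f32 {
    assert_eq!(
        pooled_hidden.len(),
        weight.len(),
        "hidden size and weight size must match"
    );
    // Linear(hidden -> 1): one output logit.
    let logit: f32 = pooled_hidden
        .iter()
        .zip(weight)
        .map(|(h, w)| h * w)
        .sum::<f32>()
        + bias;
    // Sigmoid maps the logit to a relevance score between 0 and 1.
    1.0 / (1.0 + (-logit).exp())
}
```

A zero logit maps to 0.5, and larger logits map monotonically closer to 1, so sorting candidates by this score is equivalent to sorting by the raw logit.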
Adding GPU rerankers later is mechanical: implement
RerankBackend for Metal/CUDA/MLX, mirror load_reranker_cpu in
backend/mod.rs, route through Reranker::from_pretrained.
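The backend split can be pictured with the sketch below. It deliberately does not reproduce the crate's real API: `PairScorer`, `WordOverlapScorer`, and `ToyReranker` are hypothetical stand-ins, and the real RerankBackend trait's method set is not shown on this page. The toy scorer counts shared words instead of running a model.

```rust
/// Hypothetical stand-in for a rerank backend trait: given one query and
/// a batch of documents, return one relevance score per document.
trait PairScorer {
    fn score_pairs(&self, query: &str, docs: &[&str]) -> Vec<f32>;
}

/// Toy backend: the fraction of query words that appear in the document.
/// A real backend would run a cross-encoder on CPU or GPU instead.
struct WordOverlapScorer;

impl PairScorer for WordOverlapScorer {
    fn score_pairs(&self, query: &str, docs: &[&str]) -> Vec<f32> {
        let words: Vec<&str> = query.split_whitespace().collect();
        docs.iter()
            .map(|doc| {
                if words.is_empty() {
                    return 0.0;
                }
                let hits = words.iter().filter(|w| doc.contains(**w)).count();
                hits as f32 / words.len() as f32
            })
            .collect()
    }
}

/// Thin orchestrator: generic over the backend, so swapping in another
/// implementation does not change the ranking logic.
struct ToyReranker<B: PairScorer> {
    backend: B,
}

impl<B: PairScorer> ToyReranker<B> {
    /// Returns (document index, score) pairs, best score first.
    fn rerank(&self, query: &str, docs: &[&str]) -> Vec<(usize, f32)> {
        let mut ranked: Vec<(usize, f32)> = self
            .backend
            .score_pairs(query, docs)
            .into_iter()
            .enumerate()
            .collect();
        ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
        ranked
    }
}
```

Under this shape, a new backend only has to implement the scoring trait; the orchestrator and its callers stay unchanged.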
Structs§
- Reranker
- Cross-encoder reranker orchestrator.
Constants§
- DEFAULT_RERANK_CANDIDATES
- Default cap on candidates passed to the reranker.
- DEFAULT_RERANK_MODEL
- Default cross-encoder model.
cross-encoder/ms-marco-MiniLM-L-12-v2 is 33MB, ~10ms per query/doc pair on CPU, NDCG@10 = 74.5 on MS MARCO dev. Picked over the smaller L-6 (22MB, NDCG@10 = 74.3) because the 4-corpus benchmark matrix showed L-12 added meaningful target-hit lift across both prose (Gutenberg) and code (Tokio), and the extra ~5ms/pair is invisible against the indexing budget on any non-trivial corpus.