Module rerank

Cross-encoder reranker for top-K refinement.

§Why this module exists

ripvec’s bi-encoder retrieval (BERT or semble) embeds the query and the documents into a shared vector space and ranks by cosine similarity. That’s cheap to scale, but because each side is encoded independently, the model can’t express cross-token interactions between query and document. On natural-language and prose corpora this caps retrieval quality.
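
To make the bi-encoder ranking step concrete, here is a minimal sketch of ranking by cosine similarity over precomputed embeddings. The function names `cosine` and `top_k_by_cosine` are illustrative assumptions, not ripvec APIs, and the embeddings are plain `f32` vectors rather than real model outputs.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 when either vector has zero norm.
/// (Hypothetical helper, not part of ripvec.)
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Return the indices of the `k` documents most similar to the query,
/// best first. Each document is scored independently of the others,
/// which is what makes this step cheap to scale.
fn top_k_by_cosine(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine(query, d)))
        .collect();
    // Sort by descending similarity.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```

Because the document embeddings do not depend on the query, they can be computed once and reused; only the query embedding is computed per search.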

A cross-encoder concatenates the pair as [CLS] query [SEP] doc [SEP] and runs full attention across both, producing a single relevance score. Quality is meaningfully higher, but every candidate needs its own forward pass, so cost is O(candidates). It’s therefore used only as a reranker on the bi-encoder’s top-K.
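
The retrieve-then-rerank flow described above can be sketched as follows. This is an illustration under stated assumptions: `format_pair` and `rerank` are hypothetical names, the pair is shown as text (a real tokenizer would emit token IDs), and the `score` closure stands in for the cross-encoder forward pass.

```rust
/// Text layout of a cross-encoder input pair.
/// Illustrative only: a real tokenizer produces token IDs, not a string.
fn format_pair(query: &str, doc: &str) -> String {
    format!("[CLS] {} [SEP] {} [SEP]", query, doc)
}

/// Score at most `cap` candidates with `score` and return them best first.
/// `score` is called once per candidate, so cost is linear in the number
/// of candidates actually reranked. (Hypothetical helper, not a ripvec API.)
fn rerank<'a, F>(
    query: &str,
    candidates: &[&'a str],
    cap: usize,
    score: F,
) -> Vec<(&'a str, f32)>
where
    F: Fn(&str) -> f32,
{
    let mut scored: Vec<(&'a str, f32)> = candidates
        .iter()
        .take(cap) // candidates beyond the cap are never scored
        .map(|&doc| (doc, score(&format_pair(query, doc))))
        .collect();
    // Sort by descending relevance score.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Capping the candidate list before scoring is what keeps the more expensive model affordable: the bi-encoder narrows the corpus, and the cross-encoder only reorders what survives.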

§Architecture

This module is a thin orchestrator: it tokenizes (query, doc) pairs and delegates scoring to a RerankBackend. The only implementation today is crate::backend::cpu::CpuRerankBackend, which uses the same BERT trunk as the bi-encoder plus a Linear(hidden -> 1) classifier head followed by a sigmoid.
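
As a rough illustration of the classifier head described above, the sketch below applies Linear(hidden -> 1) and a sigmoid to an already-pooled hidden state. Everything here is an assumption for illustration: `ToyRerankBackend`, `ClassifierHead`, and `score_batch` are hypothetical names, the real RerankBackend trait's signature is not shown in this page, and the BERT trunk that would produce the pooled vectors is omitted.

```rust
/// Logistic sigmoid: maps a raw logit into the open interval (0, 1).
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Hypothetical stand-in for a rerank backend. It is NOT the crate's
/// RerankBackend trait; it takes pooled hidden states directly because
/// the transformer trunk is out of scope for this sketch.
trait ToyRerankBackend {
    fn score_batch(&self, pooled: &[Vec<f32>]) -> Vec<f32>;
}

/// Linear(hidden -> 1) head: one weight per hidden dimension plus a bias.
struct ClassifierHead {
    weight: Vec<f32>,
    bias: f32,
}

impl ToyRerankBackend for ClassifierHead {
    fn score_batch(&self, pooled: &[Vec<f32>]) -> Vec<f32> {
        pooled
            .iter()
            .map(|h| {
                assert_eq!(h.len(), self.weight.len(), "hidden size mismatch");
                // Dot product with the weight row, plus bias, gives the logit.
                let logit: f32 = self
                    .weight
                    .iter()
                    .zip(h)
                    .map(|(w, x)| w * x)
                    .sum::<f32>()
                    + self.bias;
                sigmoid(logit)
            })
            .collect()
    }
}
```

Keeping the scoring step behind a trait is what lets the orchestrator stay device-agnostic: it only needs one score per pair, regardless of where the forward pass runs.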

Only the CPU rerank backend is wired today. Adding GPU rerankers later would require implementing RerankBackend for the target device, mirroring load_reranker_cpu in backend/mod.rs, and routing through Reranker::from_pretrained.

Structs§

Reranker
Cross-encoder reranker orchestrator.

Constants§

DEFAULT_RERANK_CANDIDATES
Default cap on candidates passed to the reranker.
DEFAULT_RERANK_MODEL
Default cross-encoder model. cross-encoder/ms-marco-TinyBERT-L-2-v2 (~5 MB, 2-layer, distilled from BERT-base) replaced the prior MiniLM-L-12-v2 default after a model sweep on the gutenberg prose benchmark (15 NL queries) showed it to be bit-identical on NDCG@10 / recall@10 while running 20x faster on the warm-query path: