Module rerank

Cross-encoder reranker for top-K refinement.

§Why this module exists

ripvec’s bi-encoder retrieval (BERT or semble) embeds the query and the documents into a shared vector space and ranks by cosine similarity. That is cheap to scale, but because each side is encoded independently, the model cannot express cross-token interactions between query and document. On natural-language and prose corpora this caps quality.
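The bi-encoder ranking step described above can be sketched as follows. This is a minimal illustration, assuming embeddings are plain `f32` vectors that were computed ahead of time; the function names `cosine` and `top_k_by_cosine` are hypothetical and are not part of ripvec.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Rank precomputed document embeddings against one query embedding.
/// Returns (document index, score) pairs, best first, truncated to `k`.
/// (Hypothetical helper for illustration only.)
fn top_k_by_cosine(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine(query, d)))
        .collect();
    // Sort descending by score; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

Note that the query and each document only meet in the final dot product, which is why no token-level interaction between the two can influence the score.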

A cross-encoder concatenates the pair as [CLS] query [SEP] doc [SEP] and runs full attention across both, producing a single relevance score. Quality is meaningfully higher, but cost is O(candidates): every candidate needs its own forward pass at query time. It is therefore used only as a reranker on the bi-encoder’s top-K.
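A rough sketch of the pair layout and the top-K rescoring loop, under stated assumptions: token IDs are placeholders passed in by the caller, `score_pair` stands in for one cross-encoder forward pass, and candidates beyond `top_k` are left in their bi-encoder order. The names `build_pair` and `rerank_top_k` are hypothetical, not ripvec's API.

```rust
/// Lay out one pair as [CLS] query [SEP] doc [SEP].
/// `cls` and `sep` are placeholder special-token IDs; a real tokenizer
/// would supply them along with the query and document tokens.
fn build_pair(cls: u32, sep: u32, query: &[u32], doc: &[u32]) -> Vec<u32> {
    let mut ids = Vec::with_capacity(query.len() + doc.len() + 3);
    ids.push(cls);
    ids.extend_from_slice(query);
    ids.push(sep);
    ids.extend_from_slice(doc);
    ids.push(sep);
    ids
}

/// Rescore only the first `top_k` candidates and reorder them by score.
/// `score_pair` is called exactly once per rescored candidate, which is
/// where the O(candidates) cost comes from.
/// Assumption: the tail past `top_k` keeps its original order.
fn rerank_top_k<F>(candidates: &[usize], top_k: usize, mut score_pair: F) -> Vec<usize>
where
    F: FnMut(usize) -> f32,
{
    let k = top_k.min(candidates.len());
    let mut head: Vec<(usize, f32)> = candidates[..k]
        .iter()
        .map(|&c| (c, score_pair(c)))
        .collect();
    // Stable sort, descending by cross-encoder score.
    head.sort_by(|a, b| b.1.total_cmp(&a.1));
    let mut out: Vec<usize> = head.into_iter().map(|(c, _)| c).collect();
    out.extend_from_slice(&candidates[k..]);
    out
}
```

Capping `top_k` bounds the number of forward passes per query regardless of corpus size.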

§Architecture

This module is a thin orchestrator: it tokenizes (query, doc) pairs and delegates scoring to a RerankBackend. The only backend today is [crate::backend::cpu::CpuRerankBackend], which uses the same BERT trunk as the bi-encoder plus a Linear(hidden -> 1) classifier head followed by a sigmoid.
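The classifier head mentioned above amounts to a dot product, a bias, and a sigmoid. The sketch below assumes the trunk has already reduced the pair to a single `hidden`-sized vector (which pooling ripvec uses is not stated here); `relevance_score` and its parameters are hypothetical names.

```rust
/// Logistic sigmoid, mapping any real logit into (0, 1).
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Linear(hidden -> 1) followed by sigmoid.
/// `pooled` is the trunk's output vector for one (query, doc) pair,
/// `weight` is the single row of the linear layer, `bias` its scalar bias.
/// (Illustrative only; not the CpuRerankBackend implementation.)
fn relevance_score(pooled: &[f32], weight: &[f32], bias: f32) -> f32 {
    assert_eq!(pooled.len(), weight.len(), "hidden size mismatch");
    let logit: f32 = pooled.iter().zip(weight).map(|(h, w)| h * w).sum::<f32>() + bias;
    sigmoid(logit)
}
```

Because the sigmoid is monotonic, it does not change the ordering of candidates; it only maps the raw logit into the (0, 1) range.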

Adding GPU rerankers later is mechanical: implement RerankBackend for Metal/CUDA/MLX, mirror load_reranker_cpu in backend/mod.rs, and route through Reranker::from_pretrained.
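To illustrate why a new backend is a mechanical addition, here is a sketch of the orchestrator/backend split. The real RerankBackend trait signature is not shown on this page, so everything below is hypothetical: `SketchRerankBackend`, `score_pairs`, `ToyBackend`, and `SketchReranker` are stand-in names, and the toy scoring rule exists only to make the example run.

```rust
/// Hypothetical stand-in for the real `RerankBackend` trait.
/// Assumed contract: one relevance score per tokenized pair, in order.
trait SketchRerankBackend {
    fn score_pairs(&self, pairs: &[Vec<u32>]) -> Vec<f32>;
}

/// Toy backend that prefers shorter inputs. A real backend would run
/// the cross-encoder forward pass here (on CPU, Metal, CUDA, ...).
struct ToyBackend;

impl SketchRerankBackend for ToyBackend {
    fn score_pairs(&self, pairs: &[Vec<u32>]) -> Vec<f32> {
        pairs.iter().map(|p| 1.0 / (1.0 + p.len() as f32)).collect()
    }
}

/// Orchestrator that only knows the trait, not the concrete backend.
struct SketchReranker {
    backend: Box<dyn SketchRerankBackend>,
}

impl SketchReranker {
    /// Returns pair indices ordered from most to least relevant.
    fn rerank(&self, pairs: &[Vec<u32>]) -> Vec<usize> {
        let scores = self.backend.score_pairs(pairs);
        let mut order: Vec<usize> = (0..pairs.len()).collect();
        order.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
        order
    }
}
```

Under this shape, swapping in another backend means providing a different `Box<dyn SketchRerankBackend>`; the orchestrator code does not change.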

Structs§

Reranker: Cross-encoder reranker orchestrator.

Constants§

DEFAULT_RERANK_CANDIDATES: Default cap on candidates passed to the reranker.
DEFAULT_RERANK_MODEL: Default cross-encoder model. cross-encoder/ms-marco-MiniLM-L-12-v2 is 33MB, takes ~10ms per query/doc pair on CPU, and scores NDCG@10 = 74.5 on MS MARCO dev. It was picked over the smaller L-6 (22MB, NDCG 74.3) because the 4-corpus benchmark matrix showed L-12 added meaningful target-hit lift across both prose (Gutenberg) and code (Tokio), and the ~5ms/pair extra is invisible against the indexing budget on any non-trivial corpus.
