Multi-vector retrieval with WARP algorithm
This module provides ColBERT-style multi-vector retrieval using the WARP (Weighted Approximate Residual Product) algorithm. Unlike single-vector dense retrieval, multi-vector approaches represent each document and query as multiple token embeddings, enabling fine-grained “late interaction” scoring.
§Overview
The WARP algorithm provides memory-efficient multi-vector search by:
- Residual Quantization - Compress token embeddings from 32-bit floats to 2-4 bits per dimension using centroid-based encoding
- IVF Indexing - Organize embeddings by centroid for cache-efficient access
- Deferred Decompression - Score directly from compressed representations
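The first two steps can be illustrated with a minimal sketch. It is not the crate's `ResidualCodec`: the function names, the fixed `cutoffs`, and the `bucket_values` below are all assumptions made for illustration, and codes are stored one per byte rather than bit-packed to keep the example readable. A real codec would typically learn centroids, cutoffs, and bucket values during training.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `v`. In an IVF layout this index
/// doubles as the bucket the embedding is stored under.
fn nearest_centroid(v: &[f32], centroids: &[Vec<f32>]) -> usize {
    let mut best = 0;
    let mut best_d = f32::INFINITY;
    for (i, c) in centroids.iter().enumerate() {
        let d = dist2(v, c);
        if d < best_d {
            best = i;
            best_d = d;
        }
    }
    best
}

/// Encode `v` as (centroid id, one 2-bit code per dimension).
/// Each code counts how many of the three cutoffs the residual exceeds,
/// so it always falls in 0..=3. `cutoffs` must be sorted ascending.
fn encode_sketch(v: &[f32], centroids: &[Vec<f32>], cutoffs: &[f32; 3]) -> (usize, Vec<u8>) {
    let cid = nearest_centroid(v, centroids);
    let codes: Vec<u8> = v
        .iter()
        .zip(&centroids[cid])
        .map(|(x, c)| {
            let residual = x - c;
            cutoffs.iter().filter(|&&t| residual > t).count() as u8
        })
        .collect();
    (cid, codes)
}

/// Approximate reconstruction: centroid plus one representative value per code.
fn decode_sketch(
    cid: usize,
    codes: &[u8],
    centroids: &[Vec<f32>],
    bucket_values: &[f32; 4],
) -> Vec<f32> {
    centroids[cid]
        .iter()
        .zip(codes)
        .map(|(c, &code)| c + bucket_values[code as usize])
        .collect()
}
```

With centroids `[0, 0]` and `[1, 1]`, the vector `[0.9, 1.3]` is assigned to the second centroid, and its residual `[-0.1, 0.3]` is stored as just two small codes plus the centroid id. Deferred decompression would go one step further and score against the codes without calling a decode step for every token.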
§Key Components
- MultiVectorEmbedding - A document/query represented as multiple token embeddings
- WarpIndex - The main index structure with train/insert/build/search methods
- WarpIndexConfig - Configuration for index construction
- WarpSearchConfig - Configuration for search parameters
- ResidualCodec - Compression codec for token embeddings
- MultiVectorEmbedder - Trait for token-level embedding models
- MultiVectorRetriever - High-level retriever combining embedder and index
§Quick Start
use aprender_rag::multivector::{
    WarpIndex, WarpIndexConfig, WarpSearchConfig,
    MockMultiVectorEmbedder, MultiVectorEmbedder,
    MultiVectorRetriever,
};

// Note: `sample_chunks` and `chunks` are not defined in this snippet;
// they are assumed to be provided by the caller.

// Create retriever with mock embedder
let config = WarpIndexConfig::new(2, 256, 128);
let embedder = MockMultiVectorEmbedder::new(128, 512);
let mut retriever = MultiVectorRetriever::new(config, embedder);

// Train on sample documents
retriever.train(&sample_chunks)?;

// Index documents
for chunk in chunks {
    retriever.index(chunk)?;
}
retriever.build()?;

// Search
let results = retriever.retrieve("What is machine learning?", 10)?;
§Theory: MaxSim Scoring
ColBERT uses MaxSim scoring which computes, for query Q with tokens {q₁…qₘ} and document D with tokens {d₁…dₙ}:
MaxSim(Q, D) = Σᵢ maxⱼ(qᵢ · dⱼ)
For each query token, find the maximum similarity with any document token, then sum across query tokens. This captures soft alignment without explicit matching.
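The formula can be written out as a brute-force sketch over uncompressed embeddings. The names `dot` and `maxsim_sketch` are hypothetical and chosen for illustration; this is not the crate's `exact_maxsim`, whose signature is not shown here. Returning 0.0 for an empty document is also an assumption of this sketch.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical helper: MaxSim(Q, D) = sum over query tokens of the
/// maximum dot product with any document token.
fn maxsim_sketch(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    // Assumption of this sketch: an empty document scores 0.0
    // (otherwise the inner maximum would be negative infinity).
    if doc.is_empty() {
        return 0.0;
    }
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

For example, with query tokens `[1, 0]` and `[0, 1]` and document tokens `[0.5, 0.5]` and `[1, 0]`, the first query token's best match scores 1.0 and the second's scores 0.5, so the total is 1.5. The cost is one dot product per query-token and document-token pair, which is the work an approximate index aims to avoid for most documents.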
§Feature Flag
This module is only available with the multivector feature:
[dependencies]
trueno-rag = { version = "0.1", features = ["multivector"] }
§References
- Khattab & Zaharia (2020). “ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT.” SIGIR 2020.
- Santhanam et al. (2022). “ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction.” NAACL 2022.
Re-exports§
pub use codec::ResidualCodec;
pub use codec::ResidualCodecBuilder;
pub use embedder::MockMultiVectorEmbedder;
pub use embedder::MultiVectorEmbedder;
pub use index::WarpIndex;
pub use search::exact_maxsim;
pub use search::CandidateScorer;
pub use search::CentroidSelector;
pub use search::ScoreMerger;
pub use types::MultiVectorEmbedding;
pub use types::WarpIndexConfig;
pub use types::WarpSearchConfig;