Module multivector

Multi-vector retrieval with WARP algorithm

This module provides ColBERT-style multi-vector retrieval using the WARP (Weighted Approximate Residual Product) algorithm. Unlike single-vector dense retrieval, multi-vector approaches represent each document and query as multiple token embeddings, enabling fine-grained “late interaction” scoring.

§Overview

The WARP algorithm achieves memory-efficient multi-vector search through three techniques:

  1. Residual Quantization - Compress token embeddings from 32-bit floats to 2-4 bits per dimension using centroid-based encoding
  2. IVF Indexing - Organize embeddings by centroid for cache-efficient access
  3. Deferred Decompression - Score directly from compressed representations
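The residual quantization step above can be illustrated with a small standalone sketch. It does not use this crate's `ResidualCodec`; the function names `nearest_centroid`, `encode`, and `decode` are hypothetical, and the uniform bucketing over a fixed `range` is a simplifying assumption (a production codec may choose bucket boundaries differently and would pack the codes into bits rather than storing one code per byte).

```rust
/// Index of the centroid closest (squared Euclidean distance) to `v`.
fn nearest_centroid(v: &[f32], centroids: &[Vec<f32>]) -> usize {
    let mut best = 0;
    let mut best_dist = f32::INFINITY;
    for (i, c) in centroids.iter().enumerate() {
        let dist: f32 = v.iter().zip(c).map(|(a, b)| (a - b) * (a - b)).sum();
        if dist < best_dist {
            best = i;
            best_dist = dist;
        }
    }
    best
}

/// Encode `v` as (centroid id, one bucket code per dimension).
/// Assumption: residuals are clamped to [-range, range] and split into
/// 2^nbits uniform buckets. Codes are left unpacked (one per byte) for clarity.
fn encode(v: &[f32], centroids: &[Vec<f32>], nbits: u32, range: f32) -> (usize, Vec<u8>) {
    let id = nearest_centroid(v, centroids);
    let levels = (1u32 << nbits) as f32;
    let codes = v
        .iter()
        .zip(&centroids[id])
        .map(|(x, c)| {
            let r = (x - c).clamp(-range, range);
            let bucket = ((r + range) / (2.0 * range) * levels).floor();
            bucket.min(levels - 1.0) as u8
        })
        .collect();
    (id, codes)
}

/// Approximate reconstruction: centroid plus the midpoint of each bucket.
fn decode(id: usize, codes: &[u8], centroids: &[Vec<f32>], nbits: u32, range: f32) -> Vec<f32> {
    let levels = (1u32 << nbits) as f32;
    let width = 2.0 * range / levels;
    codes
        .iter()
        .zip(&centroids[id])
        .map(|(&code, c)| c + (-range + (code as f32 + 0.5) * width))
        .collect()
}
```

With `nbits = 2`, each dimension is stored as one of four bucket codes plus a shared centroid id per token, and the reconstruction error per dimension is at most half a bucket width for residuals that fall inside `[-range, range]`.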

§Key Components

§Quick Start

use aprender_rag::multivector::{
    WarpIndex, WarpIndexConfig, WarpSearchConfig,
    MockMultiVectorEmbedder, MultiVectorEmbedder,
    MultiVectorRetriever,
};

// Create retriever with mock embedder
let config = WarpIndexConfig::new(2, 256, 128);
let embedder = MockMultiVectorEmbedder::new(128, 512);
let mut retriever = MultiVectorRetriever::new(config, embedder);

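// Train on sample documents (`sample_chunks` is assumed to be prepared by the caller)
retriever.train(&sample_chunks)?;

// Index documents (`chunks` is likewise supplied by the caller)
for chunk in chunks {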
    retriever.index(chunk)?;
}
retriever.build()?;

// Search
let results = retriever.retrieve("What is machine learning?", 10)?;

§Theory: MaxSim Scoring

ColBERT uses MaxSim scoring. For a query Q with token embeddings {q₁…qₘ} and a document D with token embeddings {d₁…dₙ}, the score is:

MaxSim(Q, D) = Σᵢ maxⱼ(qᵢ · dⱼ)

For each query token, find the maximum similarity with any document token, then sum across query tokens. This captures soft alignment without explicit matching.
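As a minimal sketch of the formula above, the following computes exact MaxSim over uncompressed embeddings. The names `dot` and `maxsim` are hypothetical and written for illustration only; this is not the crate's `exact_maxsim`, whose signature is not shown here. Returning 0.0 for an empty document is also an assumption of this sketch.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Exact MaxSim: for each query token, take the best-matching
/// document token, then sum those maxima over all query tokens.
fn maxsim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    // Assumption: a document with no tokens contributes a score of 0.
    if doc.is_empty() {
        return 0.0;
    }
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

For example, with query tokens `[1, 0]` and `[0, 1]` and document tokens `[0.5, 0.5]` and `[1, 0]`, the first query token matches the second document token best (similarity 1.0) and the second query token matches the first (similarity 0.5), so the score is 1.5.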

§Feature Flag

This module is only available with the multivector feature:

[dependencies]
trueno-rag = { version = "0.1", features = ["multivector"] }

§References

  • Khattab & Zaharia (2020). “ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT.” SIGIR 2020.
  • Santhanam et al. (2022). “ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction.” NAACL 2022.

Re-exports§

pub use codec::ResidualCodec;
pub use codec::ResidualCodecBuilder;
pub use embedder::MockMultiVectorEmbedder;
pub use embedder::MultiVectorEmbedder;
pub use index::WarpIndex;
pub use search::exact_maxsim;
pub use search::CandidateScorer;
pub use search::CentroidSelector;
pub use search::ScoreMerger;
pub use types::MultiVectorEmbedding;
pub use types::WarpIndexConfig;
pub use types::WarpSearchConfig;

Modules§

codec
Residual quantization codec for WARP
embedder
Multi-vector embedder trait and implementations
index
WARP index with IVF structure
search
WARP search algorithm components
types
Core data structures for multi-vector retrieval