Multi-Vector Document Storage (ColBERT-style)
This module stores multiple embeddings per document, enabling late-interaction models such as ColBERT.
§Key Concepts
- Token-level embeddings: Each token gets its own embedding
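- MaxSim: Each query token is scored by its maximum similarity over the document's token embeddings; in ColBERT, these per-token maxima are summed to give the relevance score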
- Late interaction: Similarity computed at query time, not indexing time
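To make the MaxSim idea above concrete, here is a minimal standalone sketch of ColBERT-style scoring. It is not this module's implementation: the helper names `dot`, `max_over_doc`, and `max_sim` are hypothetical, the 2-dimensional embeddings are made up, and dot product is assumed as the similarity function.

```rust
/// Dot product of two equal-length vectors
/// (equals cosine similarity when both are unit-normalized).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Best similarity between one query token and any document token.
fn max_over_doc(query_token: &[f32], doc_tokens: &[Vec<f32>]) -> f32 {
    doc_tokens
        .iter()
        .map(|d| dot(query_token, d))
        .fold(f32::NEG_INFINITY, f32::max)
}

/// ColBERT-style MaxSim: sum the per-query-token maxima.
fn max_sim(query_tokens: &[Vec<f32>], doc_tokens: &[Vec<f32>]) -> f32 {
    query_tokens
        .iter()
        .map(|q| max_over_doc(q, doc_tokens))
        .sum()
}

fn main() {
    // Toy embeddings, invented for illustration only.
    let doc = vec![vec![1.0, 0.0], vec![0.0, 1.0]]; // "machine", "learning"
    let query = vec![vec![0.6, 0.8], vec![0.0, 1.0]]; // "deep", "learning"

    // "deep" matches "learning" best (0.8); "learning" matches itself (1.0),
    // so the total is about 1.8.
    let score = max_sim(&query, &doc);
    println!("MaxSim score: {score}");
}
```

Because the document embeddings do not depend on the query, they can be computed once at indexing time; only the cheap max-and-sum step runs per query, which is what "late interaction" refers to.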
§Architecture
Document: "machine learning"
          │
          ▼
┌─────────┬─────────┐
│ machine │ learning│
└────┬────┴────┬────┘
     │         │
  embed()   embed()
     │         │
     ▼         ▼
  [0.1,…]   [0.2,…]
Query: "deep learning"
MaxSim = max(sim(query, machine), sim(query, learning))

§Example
use vecstore::multi_vector::{MultiVectorDoc, MultiVectorIndex};

let mut index = MultiVectorIndex::new(128); // 128-dim embeddings

// Add a document with one embedding per token
let doc = MultiVectorDoc::new(
    "doc1",
    vec![
        vec![0.1; 128], // "machine" embedding
        vec![0.2; 128], // "learning" embedding
    ],
    serde_json::json!({"title": "ML Guide"}),
);
index.add(doc)?;

// Query with MaxSim aggregation (a single query-token embedding here)
let query_tokens = vec![vec![0.15; 128]];
let results = index.search(&query_tokens, 10)?;
println!("Found {} results", results.len());

Modules§
- colbert: ColBERT-specific utilities
Structs§
- MultiVectorDoc: Multi-vector document
- MultiVectorIndex: Multi-vector index
- MultiVectorStats: Index statistics
Enums§
- AggregationMethod: Aggregation method for multi-vector scores