Module multi_vector

Multi-Vector Document Storage (ColBERT-style)

This module stores documents that carry several embeddings each (typically one per token), which enables late-interaction models such as ColBERT.

§Key Concepts

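  • Token-level embeddings: Each token gets its own embedding, so a document is a list of vectors rather than a single vector
  • MaxSim: For each query token, take the maximum similarity over all document tokens; ColBERT sums these per-token maxima to get the relevance score
  • Late interaction: Queries and documents are embedded independently, and token-level similarity is computed at query time, not indexing time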
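
To make the MaxSim definition concrete, here is a minimal standalone sketch. It does not use this module's API: the `dot` and `max_sim` functions are hypothetical helpers written for illustration, dot product is assumed as the similarity measure, and the per-token maxima are summed as in ColBERT. The crate's own aggregation may differ.

```rust
/// Dot product of two equal-length vectors (assumed similarity measure).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// ColBERT-style MaxSim (illustrative helper, not part of the crate):
/// for each query token, find the best-matching document token,
/// then sum those maxima over all query tokens.
/// Note: an empty `doc` yields negative infinity for each query token.
fn max_sim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}

fn main() {
    // Two query tokens and three document tokens, 2 dimensions each.
    let query = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let doc = vec![vec![0.5, 0.0], vec![0.0, 0.25], vec![1.0, 0.0]];

    // First query token matches the third doc token best (1.0);
    // second query token matches the second doc token best (0.25).
    let score = max_sim(&query, &doc);
    println!("MaxSim score: {score}");
}
```

Because each query token independently picks its best document token, a document can score well even when the matching tokens appear in a different order or are spread far apart.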

§Architecture

Document: "machine learning"
    │
    ▼
┌─────────┬─────────┐
│ machine │ learning│
└────┬────┴────┬────┘
     │         │
  embed()   embed()
     │         │
     ▼         ▼
  [0.1,…]  [0.2,…]

Query: "deep learning"
  MaxSim = max(sim(q, machine), sim(q, learning)) for each query token q

§Example

use vecstore::multi_vector::{MultiVectorDoc, MultiVectorIndex};

let mut index = MultiVectorIndex::new(128); // 128-dim embeddings

// Add document with multiple token embeddings
let doc = MultiVectorDoc::new(
    "doc1",
    vec![
        vec![0.1; 128],  // "machine" embedding
        vec![0.2; 128],  // "learning" embedding
    ],
    serde_json::json!({"title": "ML Guide"}),
);

index.add(doc)?;

// Query with MaxSim aggregation
let query_tokens = vec![vec![0.15; 128]];
let results = index.search(&query_tokens, 10)?;

println!("Found {} results", results.len());
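
How `search` ranks documents internally is not shown on this page. As a rough mental model only, a brute-force version would score every stored document against the query tokens and keep the top `k`. The sketch below is self-contained and does not use the crate: the `Doc` struct and the `max_sim` and `search` functions are hypothetical, and dot-product similarity with summed per-token maxima is an assumption.

```rust
/// Hypothetical stand-in for a stored document: an id plus token embeddings.
struct Doc {
    id: String,
    tokens: Vec<Vec<f32>>,
}

/// Sum over query tokens of the best dot product against any document token.
fn max_sim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| q.iter().zip(d).map(|(x, y)| x * y).sum::<f32>())
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}

/// Brute-force top-k: score every document, sort descending, keep `k`.
fn search<'a>(docs: &'a [Doc], query: &[Vec<f32>], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = docs
        .iter()
        .map(|d| (d.id.as_str(), max_sim(query, &d.tokens)))
        .collect();
    // total_cmp gives a total order on f32, so the sort cannot panic on NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    let docs = vec![
        Doc { id: "a".to_string(), tokens: vec![vec![1.0, 0.0]] },
        Doc { id: "b".to_string(), tokens: vec![vec![0.0, 1.0], vec![0.5, 0.5]] },
    ];
    let query = vec![vec![0.0, 2.0]];

    for (id, score) in search(&docs, &query, 10) {
        println!("{id}: {score}");
    }
}
```

A linear scan like this costs one similarity computation per query-token and document-token pair, which is why production late-interaction systems usually add a candidate-generation stage before exact MaxSim scoring.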

Modules§

colbert
ColBERT-specific utilities

Structs§

MultiVectorDoc
Multi-vector document
MultiVectorIndex
Multi-vector index
MultiVectorStats
Index statistics

Enums§

AggregationMethod
Aggregation method for multi-vector scores