Module colbert


ColBERT-style Multi-Vector Search

Late interaction model for dense retrieval with token-level matching.

§Algorithm Overview

ColBERT (Contextualized Late Interaction over BERT) represents documents as collections of token embeddings and uses MaxSim for scoring:

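Each document and each query is encoded as a sequence of token embeddings
Score(q, d) = Σ over query tokens q_i of max over document tokens d_j of sim(q_i, d_j)
“Late interaction”: query and document tokens are compared at scoring time, instead of each side being collapsed into a single vector beforehand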
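
The MaxSim scoring rule can be sketched in a few lines of standalone Rust. This is an illustrative sketch, not this crate's implementation: the helper names `cosine` and `maxsim` are hypothetical, and cosine similarity is assumed for `sim` (the similarity function the crate actually uses is not stated here).

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
/// (Assumption: cosine is used for `sim`; the crate may use another measure.)
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// MaxSim: for each query token, take the similarity of its best-matching
/// document token, then sum those maxima over all query tokens.
fn maxsim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| cosine(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        // An empty document yields -inf for every query token; skip those
        // so the score of an empty document is 0.0.
        .filter(|best| best.is_finite())
        .sum()
}
```

Because each query token contributes at most 1.0 under cosine similarity, the score in this sketch is bounded above by the number of query tokens.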

§Benefits

Fine-grained matching: Matches specific parts of documents
Better accuracy: Token-level embeddings can capture semantic nuance that a single pooled vector averages away
Interpretability: Can identify which tokens matched
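
To illustrate the interpretability point, the sketch below records, for each query token, which document token produced the maximum similarity. It is a standalone illustration under stated assumptions: `dot` and `best_matches` are hypothetical helpers, a plain dot product stands in for the similarity function, and the fields actually exposed by `ColbertSearchResult` are not shown here.

```rust
/// Dot product of two equal-length vectors.
/// (Assumption: dot product stands in for the crate's similarity function.)
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// For each query token, return the index of the best-matching document
/// token together with its similarity, or `None` if the document is empty.
fn best_matches(query: &[Vec<f32>], doc: &[Vec<f32>]) -> Vec<Option<(usize, f32)>> {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .enumerate()
                .map(|(i, d)| (i, dot(q, d)))
                // total_cmp gives a total order on f32, so NaN cannot panic here.
                .max_by(|a, b| a.1.total_cmp(&b.1))
        })
        .collect()
}
```

Summing the similarity component of each `Some` entry gives the MaxSim score, so the alignment and the score come from the same pass over the token pairs.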

§Example

use oxify_vector::colbert::{ColbertIndex, ColbertConfig};
use std::collections::HashMap;

let config = ColbertConfig::default();
let mut index = ColbertIndex::new(config);

// Each document has multiple token embeddings
let mut doc_tokens = HashMap::new();
doc_tokens.insert("doc1".to_string(), vec![
    vec![0.1, 0.2, 0.3],
    vec![0.2, 0.3, 0.4],
    vec![0.3, 0.4, 0.5],
]);

index.build(&doc_tokens)?;

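// The query is also a sequence of token embeddings,
// with the same dimension as the document token embeddings
let query_tokens = vec![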
    vec![0.15, 0.25, 0.35],
    vec![0.25, 0.35, 0.45],
];

let results = index.search(&query_tokens, 10)?;

Structs§

ColbertConfig: ColBERT configuration
ColbertIndex: ColBERT index for multi-vector search
ColbertSearchResult: ColBERT search result with token-level match information
ColbertStats: ColBERT index statistics
MultiVectorDoc: Multi-vector representation of a document
