CPU-based PLAID implementation for multi-vector search using ndarray.
Overview
next-plaid is a pure Rust library implementing the PLAID (Performance-optimized Late Interaction using Approximate nearest neighbor Indexing for Dense retrieval) algorithm. It enables efficient ColBERT-style late interaction retrieval with:
- Memory-mapped index files for low RAM usage
- Product quantization with configurable bit-width (2-bit or 4-bit)
- IVF (Inverted File) for coarse-grained candidate filtering
- ColBERT MaxSim scoring for late interaction ranking
- SQLite-based metadata filtering
- Incremental index updates and document deletion
Installation
Cargo.toml
Add to Cargo.toml:
```toml
[dependencies]
next-plaid = "0.4"
```
Feature Flags
| Feature | Description | Dependencies |
|---|---|---|
| `default` | Pure Rust, no BLAS | None |
| `accelerate` | Apple Accelerate BLAS (macOS only) | `accelerate-src` |
| `openblas` | OpenBLAS (Linux/cross-platform) | `openblas-src` (system OpenBLAS required) |
With Apple Accelerate (macOS):
```toml
[dependencies]
next-plaid = { version = "0.4", features = ["accelerate"] }
```
With OpenBLAS (Linux):
```toml
[dependencies]
next-plaid = { version = "0.4", features = ["openblas"] }
```
Requires system OpenBLAS:
```sh
# Ubuntu/Debian
sudo apt-get install libopenblas-dev
# Fedora
sudo dnf install openblas-devel
# Arch Linux
sudo pacman -S openblas
```
Public API
Re-exports from lib.rs
```rust
// Module grouping below is inferred from the module list that follows.
pub use codec::ResidualCodec;
pub use delete::delete_from_index;
pub use error::{Error, Result};
pub use index::{IndexConfig, Metadata, MmapIndex};
pub use search::{QueryResult, SearchParameters, SearchResult};
pub use update::UpdateConfig;
```
Public Modules
- `codec` - Residual quantization codec
- `delete` - Document deletion
- `embeddings` - Embedding reconstruction
- `error` - Error types
- `filtering` - SQLite metadata filtering
- `index` - Index creation and MmapIndex
- `kmeans` - K-means clustering
- `mmap` - Memory-mapped array types
- `search` - Search functionality
- `update` - Incremental updates
- `utils` - Utility functions
Core Types
MmapIndex
Memory-mapped PLAID index. Primary interface for search operations.
Methods

```rust
// Load an existing index from its directory
let index = MmapIndex::load("index_dir")?;
```

The remaining methods (`search`, `update`, `update_with_metadata`, `delete`, `reconstruct`, `reconstruct_single`) are shown in the usage examples below.
IndexConfig
Configuration for index creation.
Default: `IndexConfig::default()`
SearchParameters
Search configuration.
Default: `SearchParameters::default()`
UpdateConfig
Configuration for index updates.
QueryResult / SearchResult
Search result container.
```rust
pub type SearchResult = QueryResult;
```
Metadata
Index metadata (persisted in metadata.json).
ResidualCodec
Quantization codec for compression/decompression.
Methods
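Conceptually, compression bucketizes each residual value against `bucket_cutoffs` and decompression maps the resulting code back through `bucket_weights`. A minimal pure-Rust sketch of that idea (the `bucketize`/`reconstruct` helpers and all values are illustrative; the real codec packs codes into 2- or 4-bit fields and works per dimension):

```rust
/// Quantize one residual value to a bucket index: the number of
/// cutoffs it meets or exceeds. With 2^n_bits - 1 cutoffs this
/// yields an n-bit code.
fn bucketize(x: f32, cutoffs: &[f32]) -> usize {
    cutoffs.iter().take_while(|&&c| x >= c).count()
}

/// Dequantize a bucket index back to its representative value.
fn reconstruct(code: usize, weights: &[f32]) -> f32 {
    weights[code]
}

fn main() {
    // 2-bit example: 3 cutoffs -> 4 buckets, 4 reconstruction weights.
    let cutoffs = [-0.1_f32, 0.0, 0.1];
    let weights = [-0.15_f32, -0.05, 0.05, 0.15];

    let residual = 0.07_f32;
    let code = bucketize(residual, &cutoffs);
    let approx = reconstruct(code, &weights);
    println!("{code} {approx}"); // code 2 reconstructs to 0.05
}
```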
Error
Error types.
```rust
pub type Result<T> = std::result::Result<T, Error>;
```
Filtering Module
SQLite-based document metadata filtering.
Functions

- `where_condition` - Return the document IDs matching a filter condition (see the filtering example below)
- A check for whether the metadata database exists
K-means Module
Centroid computation functions.
// Compute centroids from flat embeddings
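To illustrate what centroid computation involves, here is a single Lloyd-iteration sketch over flat `[n, dim]` token embeddings in pure Rust. This is illustrative only: the library's k-means (and its function signatures) lives in the `kmeans` module and operates on `ndarray` arrays.

```rust
/// One Lloyd iteration: assign each point to its nearest centroid,
/// then recompute each centroid as the mean of its assigned points.
fn lloyd_step(points: &[Vec<f32>], centroids: &mut Vec<Vec<f32>>) {
    let dim = centroids[0].len();
    let k = centroids.len();
    let mut sums = vec![vec![0.0f32; dim]; k];
    let mut counts = vec![0usize; k];

    for p in points {
        // Nearest centroid by squared Euclidean distance.
        let (best, _) = centroids
            .iter()
            .enumerate()
            .map(|(i, c)| {
                let d: f32 = p.iter().zip(c).map(|(a, b)| (a - b).powi(2)).sum();
                (i, d)
            })
            .min_by(|a, b| a.1.total_cmp(&b.1))
            .unwrap();
        for (s, v) in sums[best].iter_mut().zip(p) {
            *s += v;
        }
        counts[best] += 1;
    }
    // Mean of assigned points; empty clusters keep their old centroid.
    for (i, c) in centroids.iter_mut().enumerate() {
        if counts[i] > 0 {
            for (cv, s) in c.iter_mut().zip(&sums[i]) {
                *cv = s / counts[i] as f32;
            }
        }
    }
}

fn main() {
    let points = vec![vec![0.0, 0.0], vec![0.0, 2.0], vec![10.0, 0.0], vec![10.0, 2.0]];
    let mut centroids = vec![vec![1.0, 1.0], vec![9.0, 1.0]];
    lloyd_step(&points, &mut centroids);
    println!("{centroids:?}"); // [[0.0, 1.0], [10.0, 1.0]]
}
```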
Standalone Functions
Index Creation
// Create index files with pre-computed centroids
Deletion

`delete_from_index` - Delete documents from an index by ID (see also `MmapIndex::delete` in the usage examples)
Index File Structure
```text
index_directory/
  metadata.json          # Index metadata
  centroids.npy          # Centroid embeddings [K, dim]
  bucket_cutoffs.npy     # Quantization boundaries
  bucket_weights.npy     # Reconstruction values
  avg_residual.npy       # Average residual per dimension
  cluster_threshold.npy  # Outlier detection threshold
  ivf.npy                # Inverted file (doc IDs per centroid)
  ivf_lengths.npy        # Length of each IVF posting list
  plan.json              # Indexing plan
  merged_codes.npy       # Memory-mapped codes (auto-generated)
  merged_residuals.npy   # Memory-mapped residuals (auto-generated)
  metadata.db            # SQLite metadata (optional)

  # Per-chunk files:
  0.codes.npy            # Centroid assignments for chunk 0
  0.residuals.npy        # Quantized residuals for chunk 0
  0.metadata.json        # Chunk metadata
  doclens.0.json         # Document lengths for chunk 0
```
Usage Examples
Create Index

```rust
use next_plaid::{IndexConfig, UpdateConfig, update_or_create};
use ndarray::Array2;

// Document embeddings: Vec of [num_tokens, dim] arrays
// (load_embeddings is a placeholder for your own loading code)
let embeddings: Vec<Array2<f32>> = load_embeddings();

let index_config = IndexConfig::default();
let update_config = UpdateConfig::default();

// Creates the index if it doesn't exist, updates it otherwise
// (argument list shown here is illustrative)
let index = update_or_create(&embeddings, "index_dir", &index_config, &update_config)?;
```
Load and Search

```rust
use next_plaid::{MmapIndex, SearchParameters};
use ndarray::Array2;

let index = MmapIndex::load("index_dir")?;

// Query embedding: [num_tokens, dim]
// (encode_query is a placeholder for your own encoder)
let query: Array2<f32> = encode_query();

let params = SearchParameters::default();

// Argument list is illustrative
let results = index.search(&query, &params)?;

// Field names besides passage_ids are assumed here
for (doc_id, score) in results.passage_ids.iter().zip(results.scores.iter()) {
    println!("{doc_id}: {score}");
}
```
Search with Filtering

```rust
use next_plaid::{MmapIndex, SearchParameters, filtering};

let index = MmapIndex::load("index_dir")?;

// Get the document IDs matching a filter condition
// (argument list is illustrative)
let subset = filtering::where_condition("index_dir", "year >= 2020")?;

// Search within the subset (how the subset is supplied is assumed here)
let results = index.search(&query, &params, Some(&subset))?;
```
Incremental Update

```rust
use next_plaid::{MmapIndex, UpdateConfig};
use ndarray::Array2;

let mut index = MmapIndex::load("index_dir")?;

// (load_new_documents is a placeholder)
let new_embeddings: Vec<Array2<f32>> = load_new_documents();
let config = UpdateConfig::default();

// Returns the document IDs assigned to the new documents
// (argument list is illustrative)
let doc_ids = index.update(&new_embeddings, &config)?;
```
Update with Metadata

```rust
use next_plaid::{MmapIndex, UpdateConfig};
use ndarray::Array2;
use serde_json::json;

let mut index = MmapIndex::load("index_dir")?;
let new_embeddings: Vec<Array2<f32>> = load_new_documents();

// One metadata object per new document (example values)
let metadata = vec![json!({"title": "doc a"}), json!({"title": "doc b"})];
let config = UpdateConfig::default();

// Argument list is illustrative
let doc_ids = index.update_with_metadata(&new_embeddings, &metadata, &config)?;
```
Delete Documents

```rust
use next_plaid::MmapIndex;

let mut index = MmapIndex::load("index_dir")?;

// Document IDs to remove (example values)
let docs_to_delete = vec![3, 17, 42];
let deleted_count = index.delete(&docs_to_delete)?;
```
Reconstruct Embeddings

```rust
use next_plaid::MmapIndex;

let index = MmapIndex::load("index_dir")?;

// Reconstruct multiple documents (argument list is illustrative)
let embeddings = index.reconstruct(&[0, 1, 2])?;

// Reconstruct a single document
let doc_emb = index.reconstruct_single(0)?;
```
Update or Create

```rust
use next_plaid::{IndexConfig, UpdateConfig, update_or_create};
use ndarray::Array2;

let embeddings: Vec<Array2<f32>> = load_embeddings();
let index_config = IndexConfig::default();
let update_config = UpdateConfig::default();

// Creates the index if it doesn't exist, updates it otherwise
// (argument list is illustrative)
let index = update_or_create(&embeddings, "index_dir", &index_config, &update_config)?;
```
Update Behavior
The update system has three modes, chosen by index size:

1. **Start-from-scratch** (`num_documents <= start_from_scratch`, default 999):
   - Loads the existing embeddings from `embeddings.npy`
   - Combines them with the new embeddings
   - Rebuilds the entire index with fresh K-means
2. **Buffer mode** (`total_new < buffer_size`, default 100):
   - Adds documents without centroid expansion
   - Saves them to a buffer for later expansion
3. **Centroid expansion** (`total_new >= buffer_size`):
   - Deletes the previously buffered documents
   - Finds outliers beyond `cluster_threshold`
   - Expands the centroids via K-means on the outliers
   - Re-indexes all buffered + new documents
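The three-way mode selection can be sketched as follows (the `select_mode` helper is hypothetical; the field names mirror the defaults given above):

```rust
/// Which update path is taken, per the three modes described above.
#[derive(Debug, PartialEq)]
enum UpdateMode {
    StartFromScratch,
    Buffer,
    CentroidExpansion,
}

fn select_mode(
    num_documents: usize,      // documents already in the index
    total_new: usize,          // buffered + incoming documents
    start_from_scratch: usize, // default 999
    buffer_size: usize,        // default 100
) -> UpdateMode {
    if num_documents <= start_from_scratch {
        UpdateMode::StartFromScratch
    } else if total_new < buffer_size {
        UpdateMode::Buffer
    } else {
        UpdateMode::CentroidExpansion
    }
}

fn main() {
    // Small indexes are always rebuilt from scratch.
    assert_eq!(select_mode(500, 10, 999, 100), UpdateMode::StartFromScratch);
    // Large index, few new docs: buffered without expansion.
    assert_eq!(select_mode(5000, 10, 999, 100), UpdateMode::Buffer);
    // Large index, enough new docs: centroid expansion kicks in.
    assert_eq!(select_mode(5000, 250, 999, 100), UpdateMode::CentroidExpansion);
    println!("ok");
}
```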
Search Algorithm
1. **IVF probing**: Compute query-centroid scores and select the top `n_ivf_probe` centroids per query token
2. **Candidate retrieval**: Gather document IDs from the selected IVF posting lists
3. **Approximate scoring**: Score candidates with the centroid approximation (MaxSim against centroids)
4. **Re-ranking**: Keep the top `n_full_scores` candidates
5. **Exact scoring**: Decompress their embeddings and compute exact ColBERT MaxSim
6. **Return**: The top `top_k` results with scores
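The exact-scoring step sums, over query tokens, the maximum similarity against any document token. A pure-Rust sketch over row-major token embeddings (illustrative; the library computes this over `ndarray` arrays):

```rust
/// ColBERT MaxSim: for each query token, take the maximum dot product
/// against all document tokens, then sum those maxima over query tokens.
fn maxsim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| q.iter().zip(d).map(|(a, b)| a * b).sum::<f32>())
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}

fn main() {
    // Two query tokens, three doc tokens, dim = 2 (toy values).
    let query = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let doc = vec![vec![0.9, 0.1], vec![0.2, 0.8], vec![0.5, 0.5]];
    let score = maxsim(&query, &doc);
    println!("{score}"); // best matches per query token: 0.9 + 0.8
}
```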
Dependencies
```toml
ndarray = "0.16"        # N-dimensional arrays
rayon = "1.10"          # Parallelism
serde = "1.0"           # Serialization
serde_json = "1.0"      # JSON
thiserror = "2.0"       # Error handling
ndarray-npy = "0.9"     # NPY file format
# <k-means crate> = "0.1"  # K-means clustering (crate name elided in source)
memmap2 = "0.9"         # Memory mapping
half = "2.4"            # Float16 support
rusqlite = "0.38"       # SQLite
regex = "1.11"          # Regex for filtering
```
License
Apache-2.0