next-plaid
A CPU-based Rust implementation of the PLAID algorithm for efficient multi-vector search (late interaction retrieval).
Overview
next-plaid is a pure Rust, CPU-optimized implementation of FastPlaid. It provides the same functionality for multi-vector search using the PLAID algorithm, but runs entirely on CPU using ndarray instead of GPU tensors.
Key Features
- Pure Rust: No Python or GPU dependencies required
- CPU Optimized: Uses ndarray with rayon for parallel processing
- BLAS Acceleration: Optional Accelerate (macOS) or OpenBLAS backends for faster matrix operations
- Memory Efficient: Significantly lower memory usage compared to GPU-based solutions
- K-means Integration: Uses fastkmeans-rs for centroid computation
- Metadata Filtering: Optional SQLite-based metadata storage for filtered search
Installation
Add to your Cargo.toml:
[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid" }
For NPY file support (required for index persistence):
[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid", features = ["npy"] }
BLAS Acceleration (Recommended)
For optimal performance, enable BLAS acceleration:
macOS (Apple Accelerate framework):
[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid", features = ["npy", "accelerate"] }
Linux (OpenBLAS):
[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid", features = ["npy", "openblas"] }
Note: OpenBLAS requires the system library to be installed (apt install libopenblas-dev on Ubuntu).
Metadata Filtering (Optional)
For SQLite-based metadata filtering:
[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid", features = ["npy", "filtering"] }
Quick Start
Creating an Index
use ;
use ndarray::Array2;
// Your document embeddings (list of [num_tokens, dim] arrays)
let embeddings: Vec<Array2<f32>> = load_embeddings();
// Create index with automatic centroid computation
let config = IndexConfig::default(); // nbits=4, kmeans_niters=4, etc.
let index = create_with_kmeans?;
Searching
use ;
// Load the index
let index = load?;
// Search parameters
let params = SearchParameters ;
// Single query
let query: Array2<f32> = get_query_embeddings();
let result = index.search?;
println!;
println!;
// Batch search
let queries: Vec<Array2<f32>> = get_multiple_queries();
let results = index.search_batch?;
Update or Create Index
For convenience, use update_or_create to automatically create a new index if it doesn't exist, or update an existing one:
use ;
let embeddings: Vec<Array2<f32>> = load_embeddings();
let index_config = IndexConfig::default();
let update_config = UpdateConfig::default();
// Creates index if it doesn't exist, otherwise updates it
let index = update_or_create?;
Filtered Search with Metadata
The filtering feature provides SQLite-based metadata storage for efficient filtered search:
use ;
use serde_json::json;
// Create index
let embeddings: Vec<Array2<f32>> = load_embeddings();
let config = IndexConfig::default();
let index = create_with_kmeans?;
// Create metadata database with document attributes
let metadata = vec!;
create?;
// Query metadata to get document subset
let subset = where_condition?;
// Returns: [0, 2] (documents matching the filter)
// Search only within the filtered subset
let query: Array2<f32> = get_query_embeddings();
let params = SearchParameters::default();
let result = index.search?;
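
As a rough illustration of what restricting a search to a subset means, the sketch below filters an in-memory metadata list and then drops candidates that fall outside the resulting subset. It uses only the standard library. `DocMeta`, `matching_docs`, `restrict_to_subset`, and `sample_meta` are hypothetical names, the sample records are invented, and nothing here reflects how the crate's SQLite-backed filtering is actually implemented.

```rust
use std::collections::HashSet;

/// Hypothetical per-document attributes; the position in the slice is the document ID.
struct DocMeta {
    category: &'static str,
    year: u32,
}

/// Document IDs whose metadata satisfies `pred`, in ascending order.
fn matching_docs(meta: &[DocMeta], pred: impl Fn(&DocMeta) -> bool) -> Vec<usize> {
    meta.iter()
        .enumerate()
        .filter(|&(_, m)| pred(m))
        .map(|(id, _)| id)
        .collect()
}

/// Keep only the candidates that belong to `subset`, preserving their order.
fn restrict_to_subset(candidates: &[usize], subset: &[usize]) -> Vec<usize> {
    let allowed: HashSet<usize> = subset.iter().copied().collect();
    candidates
        .iter()
        .copied()
        .filter(|id| allowed.contains(id))
        .collect()
}

/// Invented sample records, used only to exercise the functions above.
fn sample_meta() -> Vec<DocMeta> {
    vec![
        DocMeta { category: "science", year: 2023 },
        DocMeta { category: "history", year: 2021 },
        DocMeta { category: "science", year: 2022 },
    ]
}
```

With these sample records, `matching_docs(&sample_meta(), |m| m.category == "science" && m.year > 2020)` selects documents 0 and 2, and only those IDs survive `restrict_to_subset`.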
Configuration
IndexConfig
| Parameter | Default | Description |
|---|---|---|
| nbits | 4 | Quantization bits (2 or 4). Lower = faster but less accurate |
| batch_size | 50000 | Tokens per batch during indexing |
| seed | None | Random seed for reproducibility |
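
To make the nbits trade-off concrete, the sketch below computes how many bytes one token's quantized residual would occupy and packs small codes into bytes. It assumes each embedding dimension is stored in exactly nbits bits, packed most significant bits first. That layout is an assumption for illustration, not a description of the crate's on-disk format, and both function names are hypothetical.

```rust
/// Bytes needed for one token's quantized residual, assuming each of the
/// `dim` dimensions takes `nbits` bits (rounded up to whole bytes).
fn residual_bytes_per_token(dim: usize, nbits: usize) -> usize {
    (dim * nbits + 7) / 8
}

/// Pack codes (each smaller than 2^nbits) into bytes, most significant bits first.
/// This sketch handles only the two widths listed in the table above.
fn pack_codes(codes: &[u8], nbits: usize) -> Vec<u8> {
    assert!(nbits == 2 || nbits == 4, "sketch supports 2 or 4 bits only");
    let per_byte = 8 / nbits;
    let mut out = vec![0u8; (codes.len() + per_byte - 1) / per_byte];
    for (i, &code) in codes.iter().enumerate() {
        debug_assert!((code as usize) < (1 << nbits));
        // The first code in each byte lands in the highest bits.
        let shift = 8 - nbits * (i % per_byte + 1);
        out[i / per_byte] |= code << shift;
    }
    out
}
```

Under this assumption, a 128-dimensional token (a common ColBERT size, used here only as an example) takes 64 bytes at nbits=4 and 32 bytes at nbits=2, compared with 512 bytes as raw f32 values.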
SearchParameters
| Parameter | Default | Description |
|---|---|---|
| batch_size | 128 | Batch size for processing queries |
| n_full_scores | 4096 | Candidates for full scoring |
| top_k | 10 | Number of results to return |
| n_ivf_probe | 8 | Cluster probes per query |
UpdateConfig
| Parameter | Default | Description |
|---|---|---|
| buffer_size | 100 | Documents to accumulate before centroid expansion |
| start_from_scratch | 999 | Rebuild threshold for small indices |
| kmeans_niters | 4 | K-means iterations for centroid expansion |
| max_points_per_centroid | 256 | Maximum points per centroid |
Algorithm
The PLAID (Performance-optimized Late Interaction Driver) algorithm works in two phases:
Index Creation
- Compute K-means centroids on all token embeddings
- For each document, assign tokens to nearest centroids (codes)
- Compute and quantize residuals (difference from centroids)
- Build an inverted file (IVF) mapping centroids to documents
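
The indexing steps above can be sketched in plain Rust with the standard library only. This is a toy illustration under stated assumptions: the centroids are taken as given (the crate computes them with fastkmeans-rs), residuals are quantized into uniform buckets over a fixed range `max_abs` (a simplification chosen for brevity), and `ToyIndex`, `build_toy_index`, `nearest_centroid`, and `quantize` are hypothetical names rather than the crate's API.

```rust
use std::collections::BTreeMap;

/// Squared Euclidean distance between two vectors of equal length.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// ID of the centroid closest to `token` (assumes `centroids` is non-empty).
fn nearest_centroid(token: &[f32], centroids: &[Vec<f32>]) -> usize {
    let mut best = 0;
    for i in 1..centroids.len() {
        if sq_dist(token, &centroids[i]) < sq_dist(token, &centroids[best]) {
            best = i;
        }
    }
    best
}

/// Map one residual value to one of 2^nbits uniform buckets over [-max_abs, max_abs].
fn quantize(value: f32, max_abs: f32, nbits: u32) -> u8 {
    let levels = 1u32 << nbits;
    let unit = (value.clamp(-max_abs, max_abs) + max_abs) / (2.0 * max_abs); // in [0, 1]
    ((unit * levels as f32) as u32).min(levels - 1) as u8
}

/// Toy index: centroid codes, quantized residuals, and an inverted file.
struct ToyIndex {
    codes: Vec<Vec<usize>>,           // per document, per token: centroid ID
    residuals: Vec<Vec<Vec<u8>>>,     // per document, per token, per dimension
    ivf: BTreeMap<usize, Vec<usize>>, // centroid ID -> ascending document IDs
}

fn build_toy_index(
    docs: &[Vec<Vec<f32>>],
    centroids: &[Vec<f32>],
    nbits: u32,
    max_abs: f32,
) -> ToyIndex {
    let mut index = ToyIndex {
        codes: Vec::new(),
        residuals: Vec::new(),
        ivf: BTreeMap::new(),
    };
    for (doc_id, tokens) in docs.iter().enumerate() {
        let mut doc_codes = Vec::new();
        let mut doc_residuals = Vec::new();
        for token in tokens {
            // Assign the token to its nearest centroid.
            let code = nearest_centroid(token, centroids);
            // Quantize the per-dimension difference from that centroid.
            let residual: Vec<u8> = token
                .iter()
                .zip(&centroids[code])
                .map(|(t, c)| quantize(t - c, max_abs, nbits))
                .collect();
            // Record the document under this centroid, once per document.
            let postings = index.ivf.entry(code).or_default();
            if postings.last() != Some(&doc_id) {
                postings.push(doc_id);
            }
            doc_codes.push(code);
            doc_residuals.push(residual);
        }
        index.codes.push(doc_codes);
        index.residuals.push(doc_residuals);
    }
    index
}
```

Documents are processed in ID order, so each posting list stays sorted and checking only the last entry is enough to avoid duplicates.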
Search
- Compute query-centroid similarity scores
- Probe the top n_ivf_probe IVF cells to collect candidate documents
- Compute approximate scores using centroid codes
- Re-rank the top n_full_scores candidates using embeddings decompressed from centroids and quantized residuals
- Return top-k documents by ColBERT MaxSim score
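
The search phase can be sketched the same way, again with the standard library only. In this toy version the re-ranking step scores the original document embeddings instead of decompressing residuals, and the probe is applied per query token. Both are simplifying assumptions, and `ToySearchIndex`, `toy_search`, `maxsim`, and `top_n` are hypothetical names, not the crate's API.

```rust
use std::collections::{BTreeMap, BTreeSet};

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// ColBERT MaxSim: for each query token take its best-matching document token, then sum.
fn maxsim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| doc.iter().map(|d| dot(q, d)).fold(f32::NEG_INFINITY, f32::max))
        .sum()
}

/// Positions of the `n` largest scores, best first.
fn top_n(scores: &[f32], n: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    idx.truncate(n);
    idx
}

struct ToySearchIndex {
    centroids: Vec<Vec<f32>>,
    ivf: BTreeMap<usize, Vec<usize>>, // centroid ID -> document IDs
    doc_codes: Vec<Vec<usize>>,       // per document: centroid ID of each token
    doc_embeddings: Vec<Vec<Vec<f32>>>, // stand-in for decompressed embeddings
}

fn toy_search(
    index: &ToySearchIndex,
    query: &[Vec<f32>],
    n_ivf_probe: usize,
    n_full_scores: usize,
    top_k: usize,
) -> Vec<(usize, f32)> {
    // 1. Query-centroid similarity, one row per query token.
    let qc: Vec<Vec<f32>> = query
        .iter()
        .map(|q| index.centroids.iter().map(|c| dot(q, c)).collect())
        .collect();
    // 2. Probe the best cells of each query token and gather candidate documents.
    let mut candidates = BTreeSet::new();
    for row in &qc {
        for cell in top_n(row, n_ivf_probe) {
            if let Some(docs) = index.ivf.get(&cell) {
                candidates.extend(docs.iter().copied());
            }
        }
    }
    let candidates: Vec<usize> = candidates.into_iter().collect();
    // 3. Approximate MaxSim: each document token is replaced by its centroid.
    let approx: Vec<f32> = candidates
        .iter()
        .map(|&d| {
            qc.iter()
                .map(|row| {
                    index.doc_codes[d]
                        .iter()
                        .map(|&c| row[c])
                        .fold(f32::NEG_INFINITY, f32::max)
                })
                .sum::<f32>()
        })
        .collect();
    // 4. Re-rank the best approximate candidates with full MaxSim.
    let mut scored: Vec<(usize, f32)> = top_n(&approx, n_full_scores)
        .into_iter()
        .map(|i| (candidates[i], maxsim(query, &index.doc_embeddings[candidates[i]])))
        .collect();
    // 5. Return the top_k documents by score.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);
    scored
}
```

The approximate pass only needs the small query-centroid matrix and integer codes, which is why it can afford to score many more candidates than the final re-ranking step.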
Feature Flags
| Feature | Description |
|---|---|
| npy | Index persistence with ndarray-npy |
| filtering | SQLite-based metadata support |
| accelerate | macOS BLAS acceleration |
| openblas | Linux OpenBLAS acceleration |
License
Apache-2.0