next-plaid 0.2.0
A CPU-based Rust implementation of the PLAID algorithm for efficient multi-vector search (late interaction retrieval).

Overview

next-plaid is a pure Rust, CPU-optimized implementation of FastPlaid. It provides the same functionality for multi-vector search using the PLAID algorithm, but runs entirely on CPU using ndarray instead of GPU tensors.

Key Features

  • Pure Rust: No Python or GPU dependencies required
  • CPU Optimized: Uses ndarray with rayon for parallel processing
  • BLAS Acceleration: Optional Accelerate (macOS) or OpenBLAS backends for faster matrix operations
  • Memory Efficient: Significantly lower memory usage compared to GPU-based solutions
  • K-means Integration: Uses fastkmeans-rs for centroid computation
  • Metadata Filtering: Optional SQLite-based metadata storage for filtered search

Installation

Add to your Cargo.toml:

[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid" }

For NPY file support (required for index persistence):

[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid", features = ["npy"] }

BLAS Acceleration (Recommended)

For optimal performance, enable BLAS acceleration:

macOS (Apple Accelerate framework):

[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid", features = ["npy", "accelerate"] }

Linux (OpenBLAS):

[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid", features = ["npy", "openblas"] }

Note: OpenBLAS requires the system library to be installed (apt install libopenblas-dev on Ubuntu).

Metadata Filtering (Optional)

For SQLite-based metadata filtering:

[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid", features = ["npy", "filtering"] }

Quick Start

Creating an Index

use next_plaid::{Index, IndexConfig};
use ndarray::Array2;

// Your document embeddings (list of [num_tokens, dim] arrays)
let embeddings: Vec<Array2<f32>> = load_embeddings();

// Create index with automatic centroid computation
let config = IndexConfig::default();  // nbits=4, kmeans_niters=4, etc.
let index = Index::create_with_kmeans(&embeddings, "path/to/index", &config)?;

Searching

use next_plaid::{Index, SearchParameters};

// Load the index
let index = Index::load("path/to/index")?;

// Search parameters
let params = SearchParameters {
    batch_size: 128,
    n_full_scores: 1024,
    top_k: 10,
    n_ivf_probe: 32,
};

// Single query
let query: Array2<f32> = get_query_embeddings();
let result = index.search(&query, &params, None)?;

println!("Top results: {:?}", result.passage_ids);
println!("Scores: {:?}", result.scores);

// Batch search
let queries: Vec<Array2<f32>> = get_multiple_queries();
let results = index.search_batch(&queries, &params, true, None)?;

Update or Create Index

For convenience, use update_or_create to automatically create a new index if it doesn't exist, or update an existing one:

use next_plaid::{Index, IndexConfig, UpdateConfig};
use ndarray::Array2;

let embeddings: Vec<Array2<f32>> = load_embeddings();
let index_config = IndexConfig::default();
let update_config = UpdateConfig::default();

// Creates index if it doesn't exist, otherwise updates it
let index = Index::update_or_create(
    &embeddings,
    "path/to/index",
    &index_config,
    &update_config,
)?;

Filtered Search with Metadata

The filtering feature provides SQLite-based metadata storage for efficient filtered search:

use next_plaid::{Index, IndexConfig, SearchParameters, filtering};
use ndarray::Array2;
use serde_json::json;

// Create index
let embeddings: Vec<Array2<f32>> = load_embeddings();
let config = IndexConfig::default();
let index = Index::create_with_kmeans(&embeddings, "path/to/index", &config)?;

// Create metadata database with document attributes
let metadata = vec![
    json!({"title": "Document 1", "category": "science", "year": 2023}),
    json!({"title": "Document 2", "category": "history", "year": 2022}),
    json!({"title": "Document 3", "category": "science", "year": 2024}),
];
filtering::create("path/to/index", &metadata)?;

// Query metadata to get document subset
let subset = filtering::where_condition(
    "path/to/index",
    "category = ? AND year >= ?",
    &[json!("science"), json!(2023)],
)?;
// Returns: [0, 2] (documents matching the filter)

// Search only within the filtered subset
let query: Array2<f32> = get_query_embeddings();
let params = SearchParameters::default();
let result = index.search(&query, &params, Some(&subset))?;

Configuration

IndexConfig

Parameter    Default   Description
nbits        4         Quantization bits (2 or 4); lower is faster but less accurate
batch_size   50000     Tokens per batch during indexing
seed         None      Random seed for reproducibility
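Assuming IndexConfig is a plain struct exposing the fields in the table above (check the crate's docs for the full field set), defaults can be overridden with struct-update syntax:

```rust
use next_plaid::IndexConfig;

// Hypothetical override sketch — field names are taken from the table above.
let config = IndexConfig {
    nbits: 2,       // coarser quantization: faster, less accurate
    seed: Some(42), // reproducible k-means
    ..IndexConfig::default()
};
```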

SearchParameters

Parameter      Default   Description
batch_size     128       Batch size for processing queries
n_full_scores  4096      Candidates for full scoring
top_k          10        Number of results to return
n_ivf_probe    8         Cluster probes per query

UpdateConfig

Parameter                Default   Description
buffer_size              100       Documents to accumulate before centroid expansion
start_from_scratch       999       Rebuild threshold for small indices
kmeans_niters            4         K-means iterations for centroid expansion
max_points_per_centroid  256       Maximum points per centroid
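As with IndexConfig, individual fields can be overridden while keeping the remaining defaults. This is a hypothetical sketch; the field names come from the table above:

```rust
use next_plaid::UpdateConfig;

// Accumulate more documents before triggering centroid expansion.
let update_config = UpdateConfig {
    buffer_size: 500,
    ..UpdateConfig::default()
};
```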

Algorithm

The PLAID (Performance-optimized Late Interaction Driver) algorithm works in two phases:

Index Creation

  1. Compute K-means centroids on all token embeddings
  2. For each document, assign tokens to nearest centroids (codes)
  3. Compute and quantize residuals (difference from centroids)
  4. Build an inverted file (IVF) mapping centroids to documents
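Steps 2 and 3 can be sketched in plain Rust (using Vec&lt;f32&gt; in place of the crate's ndarray types): each token gets the index of its nearest centroid as its code, and the residual is the element-wise difference from that centroid.

```rust
// Squared Euclidean distance between two embeddings.
fn squared_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Returns (code, residual) for one token embedding.
fn encode(token: &[f32], centroids: &[Vec<f32>]) -> (usize, Vec<f32>) {
    // Code = index of the nearest centroid.
    let code = (0..centroids.len())
        .min_by(|&i, &j| {
            squared_dist(token, &centroids[i])
                .total_cmp(&squared_dist(token, &centroids[j]))
        })
        .expect("at least one centroid");
    // Residual = token minus its centroid; the real index would now
    // quantize this to nbits per dimension.
    let residual = token
        .iter()
        .zip(&centroids[code])
        .map(|(t, c)| t - c)
        .collect();
    (code, residual)
}

fn main() {
    let centroids = vec![vec![0.0, 0.0], vec![1.0, 1.0]];
    let (code, residual) = encode(&[0.9, 1.2], &centroids);
    assert_eq!(code, 1); // nearest to [1.0, 1.0]
    println!("code={code}, residual={residual:?}");
}
```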

Search

  1. Compute query-centroid similarity scores
  2. Probe the top n_ivf_probe IVF cells to gather candidate documents
  3. Compute approximate scores using centroid codes
  4. Re-rank the top n_full_scores candidates using embeddings decompressed from the quantized residuals
  5. Return the top_k documents by ColBERT MaxSim score
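The MaxSim score in the final step can be sketched in plain Rust (again using Vec&lt;f32&gt; instead of the crate's ndarray types): for each query token, take the maximum similarity over all document tokens, then sum these maxima.

```rust
// Dot-product similarity between two token embeddings
// (equivalent to cosine similarity when embeddings are normalized).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// ColBERT MaxSim: sum over query tokens of the best-matching
/// document token's similarity.
fn maxsim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}

fn main() {
    // Two query tokens, three document tokens (dim = 2, unit vectors).
    let query = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let doc = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.6, 0.8]];
    // Each query token matches one document token exactly: 1.0 + 1.0.
    assert_eq!(maxsim(&query, &doc), 2.0);
    println!("MaxSim = {}", maxsim(&query, &doc));
}
```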

Feature Flags

Feature     Description
npy         Index persistence with ndarray-npy
filtering   SQLite-based metadata support
accelerate  macOS BLAS acceleration (Apple Accelerate framework)
openblas    Linux OpenBLAS acceleration

License

Apache-2.0