next-plaid 0.2.0
A CPU-based Rust implementation of the PLAID algorithm for efficient multi-vector search (late interaction retrieval).

Overview

next-plaid is a pure Rust, CPU-optimized implementation of FastPlaid. It provides the same functionality for multi-vector search using the PLAID algorithm, but runs entirely on CPU using ndarray instead of GPU tensors.

Key Features

  • Pure Rust: No Python or GPU dependencies required
  • CPU Optimized: Uses ndarray with rayon for parallel processing
  • BLAS Acceleration: Optional Accelerate (macOS) or OpenBLAS backends for faster matrix operations
  • Memory Efficient: Significantly lower memory usage compared to GPU-based solutions
  • K-means Integration: Uses fastkmeans-rs for centroid computation
  • Metadata Filtering: Optional SQLite-based metadata storage for filtered search

Installation

Add to your Cargo.toml:

[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid" }

For NPY file support (required for index persistence):

[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid", features = ["npy"] }

BLAS Acceleration (Recommended)

For optimal performance, enable BLAS acceleration:

macOS (Apple Accelerate framework):

[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid", features = ["npy", "accelerate"] }

Linux (OpenBLAS):

[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid", features = ["npy", "openblas"] }

Note: OpenBLAS requires the system library to be installed (apt install libopenblas-dev on Ubuntu).

Metadata Filtering (Optional)

For SQLite-based metadata filtering:

[dependencies]
next-plaid = { git = "https://github.com/lightonai/next-plaid", features = ["npy", "filtering"] }

Quick Start

Creating an Index

use next_plaid::{Index, IndexConfig};
use ndarray::Array2;

// Your document embeddings (list of [num_tokens, dim] arrays)
let embeddings: Vec<Array2<f32>> = load_embeddings();

// Create index with automatic centroid computation
let config = IndexConfig::default();  // nbits=4, kmeans_niters=4, etc.
let index = Index::create_with_kmeans(&embeddings, "path/to/index", &config)?;

Searching

use next_plaid::{Index, SearchParameters};

// Load the index
let index = Index::load("path/to/index")?;

// Search parameters
let params = SearchParameters {
    batch_size: 128,
    n_full_scores: 1024,
    top_k: 10,
    n_ivf_probe: 32,
};

// Single query
let query: Array2<f32> = get_query_embeddings();
let result = index.search(&query, &params, None)?;

println!("Top results: {:?}", result.passage_ids);
println!("Scores: {:?}", result.scores);

// Batch search
let queries: Vec<Array2<f32>> = get_multiple_queries();
let results = index.search_batch(&queries, &params, true, None)?;

Update or Create Index

For convenience, use update_or_create to automatically create a new index if it doesn't exist, or update an existing one:

use next_plaid::{Index, IndexConfig, UpdateConfig};
use ndarray::Array2;

let embeddings: Vec<Array2<f32>> = load_embeddings();
let index_config = IndexConfig::default();
let update_config = UpdateConfig::default();

// Creates index if it doesn't exist, otherwise updates it
let index = Index::update_or_create(
    &embeddings,
    "path/to/index",
    &index_config,
    &update_config,
)?;

Filtered Search with Metadata

The filtering feature provides SQLite-based metadata storage for efficient filtered search:

use next_plaid::{Index, IndexConfig, SearchParameters, filtering};
use ndarray::Array2;
use serde_json::json;

// Create index
let embeddings: Vec<Array2<f32>> = load_embeddings();
let config = IndexConfig::default();
let index = Index::create_with_kmeans(&embeddings, "path/to/index", &config)?;

// Create metadata database with document attributes
let metadata = vec![
    json!({"title": "Document 1", "category": "science", "year": 2023}),
    json!({"title": "Document 2", "category": "history", "year": 2022}),
    json!({"title": "Document 3", "category": "science", "year": 2024}),
];
filtering::create("path/to/index", &metadata)?;

// Query metadata to get document subset
let subset = filtering::where_condition(
    "path/to/index",
    "category = ? AND year >= ?",
    &[json!("science"), json!(2023)],
)?;
// Returns: [0, 2] (documents matching the filter)

// Search only within the filtered subset
let query: Array2<f32> = get_query_embeddings();
let params = SearchParameters::default();
let result = index.search(&query, &params, Some(&subset))?;

Configuration

IndexConfig

Parameter    Default   Description
nbits        4         Quantization bits (2 or 4); lower is faster but less accurate
batch_size   50000     Tokens per batch during indexing
seed         None      Random seed for reproducibility
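Assuming IndexConfig is a plain struct exposing the fields in the table above (check the crate's docs for the full field set), defaults can be overridden with struct-update syntax:

```rust
use next_plaid::IndexConfig;

// Hypothetical override sketch — field names are taken from the table above.
let config = IndexConfig {
    nbits: 2,       // coarser quantization: faster, less accurate
    seed: Some(42), // reproducible k-means
    ..IndexConfig::default()
};
```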

SearchParameters

Parameter      Default   Description
batch_size     128       Batch size for processing queries
n_full_scores  4096      Candidates for full scoring
top_k          10        Number of results to return
n_ivf_probe    8         Cluster probes per query

UpdateConfig

Parameter                Default   Description
buffer_size              100       Documents to accumulate before centroid expansion
start_from_scratch       999       Rebuild threshold for small indices
kmeans_niters            4         K-means iterations for centroid expansion
max_points_per_centroid  256       Maximum points per centroid
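As with IndexConfig, individual fields can be overridden while keeping the remaining defaults. This is a hypothetical sketch; the field names come from the table above:

```rust
use next_plaid::UpdateConfig;

// Accumulate more documents before triggering centroid expansion.
let update_config = UpdateConfig {
    buffer_size: 500,
    ..UpdateConfig::default()
};
```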

Algorithm

The PLAID (Performance-optimized Late Interaction Driver) algorithm works in two phases:

Index Creation

  1. Compute K-means centroids on all token embeddings
  2. For each document, assign tokens to nearest centroids (codes)
  3. Compute and quantize residuals (difference from centroids)
  4. Build an inverted file (IVF) mapping centroids to documents
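Steps 2 and 3 can be sketched in plain Rust (using Vec&lt;f32&gt; in place of the crate's ndarray types): each token gets the index of its nearest centroid as its code, and the residual is the element-wise difference from that centroid.

```rust
// Squared Euclidean distance between two embeddings.
fn squared_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Returns (code, residual) for one token embedding.
fn encode(token: &[f32], centroids: &[Vec<f32>]) -> (usize, Vec<f32>) {
    // Code = index of the nearest centroid.
    let code = (0..centroids.len())
        .min_by(|&i, &j| {
            squared_dist(token, &centroids[i])
                .total_cmp(&squared_dist(token, &centroids[j]))
        })
        .expect("at least one centroid");
    // Residual = token minus its centroid; the real index would now
    // quantize this to nbits per dimension.
    let residual = token
        .iter()
        .zip(&centroids[code])
        .map(|(t, c)| t - c)
        .collect();
    (code, residual)
}

fn main() {
    let centroids = vec![vec![0.0, 0.0], vec![1.0, 1.0]];
    let (code, residual) = encode(&[0.9, 1.2], &centroids);
    assert_eq!(code, 1); // nearest to [1.0, 1.0]
    println!("code={code}, residual={residual:?}");
}
```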

Search

  1. Compute query-centroid similarity scores
  2. Probe the top n_ivf_probe IVF cells to gather candidate documents
  3. Compute approximate scores using centroid codes
  4. Re-rank the top n_full_scores candidates using embeddings decompressed from the quantized residuals
  5. Return the top_k documents by ColBERT MaxSim score
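The MaxSim score in the final step can be sketched in plain Rust (again using Vec&lt;f32&gt; instead of the crate's ndarray types): for each query token, take the maximum similarity over all document tokens, then sum these maxima.

```rust
// Dot-product similarity between two token embeddings
// (equivalent to cosine similarity when embeddings are normalized).
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// ColBERT MaxSim: sum over query tokens of the best-matching
/// document token's similarity.
fn maxsim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}

fn main() {
    // Two query tokens, three document tokens (dim = 2, unit vectors).
    let query = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let doc = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.6, 0.8]];
    // Each query token matches one document token exactly: 1.0 + 1.0.
    assert_eq!(maxsim(&query, &doc), 2.0);
    println!("MaxSim = {}", maxsim(&query, &doc));
}
```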

Feature Flags

Feature     Description
npy         Index persistence with ndarray-npy
filtering   SQLite-based metadata support
accelerate  macOS BLAS acceleration (Apple Accelerate framework)
openblas    Linux OpenBLAS acceleration

License

Apache-2.0