next-plaid 0.8.7


CPU-based PLAID implementation for multi-vector search using ndarray.

Overview

next-plaid is a pure Rust library implementing the PLAID (Performance-optimized Late Interaction Driver) algorithm. It enables efficient ColBERT-style late interaction retrieval with:

  • Memory-mapped index files for low RAM usage
  • Residual quantization with configurable bit-width (2-bit or 4-bit)
  • IVF (Inverted File) for coarse-grained candidate filtering
  • ColBERT MaxSim scoring for late interaction ranking
  • SQLite-based metadata filtering
  • Incremental index updates and document deletion

Installation

Cargo.toml

Add to Cargo.toml:

[dependencies]
next-plaid = "0.8"

Feature Flags

Feature      Description                          Dependencies
default      Pure Rust, no BLAS                   None
accelerate   Apple Accelerate BLAS (macOS only)   accelerate-src
openblas     OpenBLAS (Linux/cross-platform)      openblas-src (system OpenBLAS required)

With Apple Accelerate (macOS):

[dependencies]
next-plaid = { version = "0.8", features = ["accelerate"] }

With OpenBLAS (Linux):

[dependencies]
next-plaid = { version = "0.8", features = ["openblas"] }

Requires system OpenBLAS:

# Ubuntu/Debian
sudo apt-get install libopenblas-dev

# Fedora
sudo dnf install openblas-devel

# Arch Linux
sudo pacman -S openblas

Public API

Re-exports from lib.rs

pub use codec::ResidualCodec;
pub use delete::delete_from_index;
pub use error::{Error, Result};
pub use index::MmapIndex;
pub use index::{IndexConfig, Metadata};
pub use kmeans::{
    compute_centroids, compute_centroids_from_documents, compute_kmeans,
    estimate_num_partitions, ComputeKmeansConfig, FastKMeans, KMeansConfig,
};
pub use search::{QueryResult, SearchParameters};
pub use update::UpdateConfig;

Public Modules

  • codec - Residual quantization codec
  • delete - Document deletion
  • embeddings - Embedding reconstruction
  • error - Error types
  • filtering - SQLite metadata filtering
  • index - Index creation and MmapIndex
  • kmeans - K-means clustering
  • mmap - Memory-mapped array types
  • search - Search functionality
  • update - Incremental updates
  • utils - Utility functions

Core Types

MmapIndex

Memory-mapped PLAID index. Primary interface for search operations.

pub struct MmapIndex {
    pub path: String,
    pub metadata: Metadata,
    pub codec: ResidualCodec,
    pub ivf: Array1<i64>,
    pub ivf_lengths: Array1<i32>,
    pub ivf_offsets: Array1<i64>,
    pub doc_lengths: Array1<i64>,
    pub doc_offsets: Array1<usize>,
    pub mmap_codes: MmapNpyArray1I64,
    pub mmap_residuals: MmapNpyArray2U8,
}

Methods

// Load existing index
fn load(index_path: &str) -> Result<Self>

// Create or update index (primary way to create/update indices)
fn update_or_create(
    embeddings: &[Array2<f32>],
    index_path: &str,
    index_config: &IndexConfig,
    update_config: &UpdateConfig,
) -> Result<(Self, Vec<i64>)>

// Search single query
fn search(
    &self,
    query: &Array2<f32>,
    params: &SearchParameters,
    subset: Option<&[i64]>,
) -> Result<SearchResult>

// Search multiple queries
fn search_batch(
    &self,
    queries: &[Array2<f32>],
    params: &SearchParameters,
    parallel: bool,
    subset: Option<&[i64]>,
) -> Result<Vec<SearchResult>>

// Add documents
fn update(
    &mut self,
    embeddings: &[Array2<f32>],
    config: &UpdateConfig,
) -> Result<Vec<i64>>

// Add documents with metadata
fn update_with_metadata(
    &mut self,
    embeddings: &[Array2<f32>],
    config: &UpdateConfig,
    metadata: Option<&[serde_json::Value]>,
) -> Result<Vec<i64>>

// Delete documents
fn delete(&mut self, doc_ids: &[i64]) -> Result<usize>

// Reconstruct embeddings
fn reconstruct(&self, doc_ids: &[i64]) -> Result<Vec<Array2<f32>>>
fn reconstruct_single(&self, doc_id: i64) -> Result<Array2<f32>>

// Accessors
fn num_documents(&self) -> usize
fn num_embeddings(&self) -> usize
fn num_partitions(&self) -> usize
fn avg_doclen(&self) -> f64
fn embedding_dim(&self) -> usize
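
A minimal lifecycle sketch tying these methods together; load_embeddings and encode_query are placeholder helpers, and the index path is illustrative:

use ndarray::Array2;
use next_plaid::{IndexConfig, MmapIndex, SearchParameters, UpdateConfig};

// Placeholder helpers: supply your own ColBERT-style token embeddings.
let docs: Vec<Array2<f32>> = load_embeddings();
let query: Array2<f32> = encode_query("example query");

// Create the index if missing, otherwise append the documents.
let (mut index, doc_ids) = MmapIndex::update_or_create(
    &docs,
    "/path/to/index",
    &IndexConfig::default(),
    &UpdateConfig::default(),
)?;

// Search, then delete the first document that was just added.
let hits = index.search(&query, &SearchParameters::default(), None)?;
println!("top hit: {:?}", hits.passage_ids.first());
index.delete(&doc_ids[..1])?;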

IndexConfig

Configuration for index creation.

pub struct IndexConfig {
    pub nbits: usize,                    // Quantization bits (2 or 4), default: 4
    pub batch_size: usize,               // Documents per chunk, default: 50_000
    pub seed: Option<u64>,               // Random seed, default: Some(42)
    pub kmeans_niters: usize,            // K-means iterations, default: 4
    pub max_points_per_centroid: usize,  // K-means parameter, default: 256
    pub n_samples_kmeans: Option<usize>, // K-means samples, default: auto
    pub start_from_scratch: usize,       // Rebuild threshold, default: 999
}

Default:

IndexConfig {
    nbits: 4,
    batch_size: 50_000,
    seed: Some(42),
    kmeans_niters: 4,
    max_points_per_centroid: 256,
    n_samples_kmeans: None,
    start_from_scratch: 999,
}
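
Individual fields can be overridden with struct update syntax. A small sketch using 2-bit quantization, which roughly halves residual storage relative to the 4-bit default (the trade-off framing is an assumption, not a benchmark):

use next_plaid::IndexConfig;

// 2-bit residuals: smaller index, typically some retrieval-quality cost.
let config = IndexConfig {
    nbits: 2,
    ..Default::default()
};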

SearchParameters

Search configuration.

pub struct SearchParameters {
    pub batch_size: usize,                        // Queries per batch, default: 2000
    pub n_full_scores: usize,                     // Candidates to re-rank, default: 4096
    pub top_k: usize,                             // Results to return, default: 10
    pub n_ivf_probe: usize,                       // IVF cells to probe, default: 8
    pub centroid_batch_size: usize,               // Centroid scoring batch, default: 100_000
    pub centroid_score_threshold: Option<f32>,    // Pruning threshold, default: Some(0.4)
}

Default:

SearchParameters {
    batch_size: 2000,
    n_full_scores: 4096,
    top_k: 10,
    n_ivf_probe: 8,
    centroid_batch_size: 100_000,
    centroid_score_threshold: Some(0.4),
}
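
A hedged tuning sketch: probing more IVF cells and re-ranking more candidates generally favors recall over speed. The specific values are illustrative, not recommendations:

use next_plaid::SearchParameters;

let params = SearchParameters {
    top_k: 100,
    n_ivf_probe: 32,      // probe more IVF cells per query token
    n_full_scores: 8192,  // re-rank more candidates with exact MaxSim
    ..Default::default()
};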

UpdateConfig

Configuration for index updates.

pub struct UpdateConfig {
    pub batch_size: usize,               // Documents per chunk, default: 50_000
    pub kmeans_niters: usize,            // K-means iterations, default: 4
    pub max_points_per_centroid: usize,  // K-means parameter, default: 256
    pub n_samples_kmeans: Option<usize>, // K-means samples, default: auto
    pub seed: u64,                       // Random seed, default: 42
    pub start_from_scratch: usize,       // Rebuild threshold, default: 999
    pub buffer_size: usize,              // Buffer before expansion, default: 100
}

QueryResult / SearchResult

Search result container.

pub struct QueryResult {
    pub query_id: usize,
    pub passage_ids: Vec<i64>,
    pub scores: Vec<f32>,
}

pub type SearchResult = QueryResult;
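
A short sketch of consuming batched results from search_batch; encode_queries is a placeholder helper producing one [num_tokens, dim] array per query:

use ndarray::Array2;
use next_plaid::{MmapIndex, SearchParameters};

let index = MmapIndex::load("/path/to/index")?;
let queries: Vec<Array2<f32>> = encode_queries(&["first query", "second query"]);

// parallel = true scores the queries in parallel.
let results = index.search_batch(&queries, &SearchParameters::default(), true, None)?;
for r in &results {
    println!("query {} -> top doc {:?}", r.query_id, r.passage_ids.first());
}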

Metadata

Index metadata (persisted in metadata.json).

pub struct Metadata {
    pub num_chunks: usize,
    pub nbits: usize,
    pub num_partitions: usize,
    pub num_embeddings: usize,
    pub avg_doclen: f64,
    pub num_documents: usize,
    pub next_plaid_compatible: bool,
}

ResidualCodec

Quantization codec for compression/decompression.

pub struct ResidualCodec {
    pub nbits: usize,
    pub centroids: CentroidStore,
    pub avg_residual: Array1<f32>,
    pub bucket_cutoffs: Option<Array1<f32>>,
    pub bucket_weights: Option<Array1<f32>>,
    // ... internal lookup tables
}

Methods

fn new(
    nbits: usize,
    centroids: Array2<f32>,
    avg_residual: Array1<f32>,
    bucket_cutoffs: Option<Array1<f32>>,
    bucket_weights: Option<Array1<f32>>,
) -> Result<Self>

fn load_from_dir(index_path: &Path) -> Result<Self>
fn load_mmap_from_dir(index_path: &Path) -> Result<Self>

fn compress_into_codes(&self, embeddings: &Array2<f32>) -> Array1<usize>
fn quantize_residuals(&self, residuals: &Array2<f32>) -> Result<Array2<u8>>
fn decompress(&self, packed: &Array2<u8>, codes: &ArrayView1<usize>) -> Result<Array2<f32>>

fn embedding_dim(&self) -> usize
fn num_centroids(&self) -> usize
fn centroids_view(&self) -> ArrayView2<'_, f32>
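
A rough round-trip sketch, assuming residuals are each embedding minus its assigned centroid (ColBERTv2-style residual compression); load_embeddings_for_one_doc is a placeholder helper:

use ndarray::{Array2, Axis};
use next_plaid::ResidualCodec;
use std::path::Path;

// Load the codec from an existing index directory.
let codec = ResidualCodec::load_from_dir(Path::new("/path/to/index"))?;

// Token embeddings for one document: [num_tokens, dim].
let embeddings: Array2<f32> = load_embeddings_for_one_doc();

// 1. Assign each token embedding to its nearest centroid.
let codes = codec.compress_into_codes(&embeddings);

// 2. Form residuals (embedding minus assigned centroid) and quantize them.
let centroids = codec.centroids_view();
let mut residuals = embeddings.clone();
for (mut row, &code) in residuals.axis_iter_mut(Axis(0)).zip(codes.iter()) {
    row -= &centroids.row(code);
}
let packed = codec.quantize_residuals(&residuals)?;

// 3. Reconstruct approximate embeddings from codes + packed residuals.
let approx = codec.decompress(&packed, &codes.view())?;
println!("reconstructed shape: {:?}", approx.dim());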

Error

Error types.

pub enum Error {
    IndexCreation(String),
    Search(String),
    Io(std::io::Error),
    Json(serde_json::Error),
    Shape(String),
    IndexLoad(String),
    Codec(String),
    Config(String),
    Update(String),
    Delete(String),
    Filtering(String),
    Sqlite(rusqlite::Error),
    NpyRead(ndarray_npy::ReadNpyError),
    NpyWrite(ndarray_npy::WriteNpyError),
}

pub type Result<T> = std::result::Result<T, Error>;
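
A small error-handling sketch distinguishing a failed index load from other failures (the surrounding function is assumed to return next_plaid::Result):

use next_plaid::{Error, MmapIndex};

match MmapIndex::load("/path/to/index") {
    Ok(index) => println!("loaded {} documents", index.num_documents()),
    Err(Error::IndexLoad(msg)) => eprintln!("could not load index: {}", msg),
    Err(e) => return Err(e),
}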

Filtering Module

SQLite-based document metadata filtering.

Functions

// Check if metadata database exists
pub fn exists(index_path: &str) -> bool

// Create new metadata database
pub fn create(
    index_path: &str,
    metadata: &[serde_json::Value],
    doc_ids: &[i64],
) -> Result<usize>

// Append metadata rows
pub fn update(
    index_path: &str,
    metadata: &[serde_json::Value],
    doc_ids: &[i64],
) -> Result<usize>

// Delete and re-index
pub fn delete(index_path: &str, subset: &[i64]) -> Result<usize>

// Query document IDs by SQL condition
pub fn where_condition(
    index_path: &str,
    condition: &str,          // SQL WHERE clause with ? placeholders
    parameters: &[Value],
) -> Result<Vec<i64>>

// Query with REGEXP support
pub fn where_condition_regexp(
    index_path: &str,
    condition: &str,
    parameters: &[Value],
) -> Result<Vec<i64>>

// Get full metadata rows
pub fn get(
    index_path: &str,
    condition: Option<&str>,
    parameters: &[Value],
    subset: Option<&[i64]>,
) -> Result<Vec<Value>>

// Count documents
pub fn count(index_path: &str) -> Result<usize>
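
A sketch of the non-search helpers, assuming the index was created or updated with per-document JSON metadata containing a title field (as in the metadata example later in this document):

use next_plaid::filtering;
use serde_json::json;

let index_path = "/path/to/index";

if filtering::exists(index_path) {
    // Documents whose title matches a regular expression.
    let ids = filtering::where_condition_regexp(
        index_path,
        "title REGEXP ?",
        &[json!("^Intro")],
    )?;

    // Full metadata rows for those documents, plus the total row count.
    let rows = filtering::get(index_path, None, &[], Some(&ids[..]))?;
    let total = filtering::count(index_path)?;
    println!("{} of {} documents matched", rows.len(), total);
}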

K-means Module

Centroid computation functions.

pub struct ComputeKmeansConfig {
    pub kmeans_niters: usize,            // default: 4
    pub max_points_per_centroid: usize,  // default: 256
    pub seed: u64,                       // default: 42
    pub n_samples_kmeans: Option<usize>, // default: auto
    pub num_partitions: Option<usize>,   // default: auto
}

// Compute centroids from flat embeddings
pub fn compute_centroids(
    embeddings: &ArrayView2<f32>,
    num_centroids: usize,
    config: Option<KMeansConfig>,
) -> Result<Array2<f32>>

// Compute centroids from document embeddings
pub fn compute_centroids_from_documents(
    documents: &[Array2<f32>],
    num_centroids: usize,
    config: Option<KMeansConfig>,
) -> Result<Array2<f32>>

// Full K-means pipeline (samples, clusters, normalizes)
pub fn compute_kmeans(
    documents_embeddings: &[Array2<f32>],
    config: &ComputeKmeansConfig,
) -> Result<Array2<f32>>

// Estimate number of partitions
pub fn estimate_num_partitions(documents: &[Array2<f32>]) -> usize

Standalone Functions

Index Creation

// Create index files with pre-computed centroids
pub fn create_index_files(
    embeddings: &[Array2<f32>],
    centroids: Array2<f32>,
    index_path: &str,
    config: &IndexConfig,
) -> Result<Metadata>
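
For the two-step path, a sketch that computes centroids with the K-means module and then writes index files with create_index_files. The module path next_plaid::index::create_index_files is an assumption (the function is not among the crate-root re-exports), and load_embeddings is a placeholder helper:

use ndarray::Array2;
use next_plaid::{compute_kmeans, ComputeKmeansConfig, IndexConfig};
// Module path assumed; adjust if the function is exported elsewhere.
use next_plaid::index::create_index_files;

let embeddings: Vec<Array2<f32>> = load_embeddings();

// Explicit config mirroring the documented defaults.
let kmeans_config = ComputeKmeansConfig {
    kmeans_niters: 4,
    max_points_per_centroid: 256,
    seed: 42,
    n_samples_kmeans: None,
    num_partitions: None, // auto-estimated
};
let centroids = compute_kmeans(&embeddings, &kmeans_config)?;

// Write the index files using the pre-computed centroids.
let metadata = create_index_files(
    &embeddings,
    centroids,
    "/path/to/index",
    &IndexConfig::default(),
)?;
println!("indexed {} documents", metadata.num_documents);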

Deletion

// Delete documents from index
pub fn delete_from_index(doc_ids: &[i64], index_path: &str) -> Result<usize>

Index File Structure

index_directory/
  metadata.json           # Index metadata
  centroids.npy           # Centroid embeddings [K, dim]
  bucket_cutoffs.npy      # Quantization boundaries
  bucket_weights.npy      # Reconstruction values
  avg_residual.npy        # Average residual per dimension
  cluster_threshold.npy   # Outlier detection threshold
  ivf.npy                 # Inverted file (doc IDs per centroid)
  ivf_lengths.npy         # Length of each IVF posting list
  plan.json               # Indexing plan
  merged_codes.npy        # Memory-mapped codes (auto-generated)
  merged_residuals.npy    # Memory-mapped residuals (auto-generated)
  metadata.db             # SQLite metadata (optional)

  # Per-chunk files:
  0.codes.npy             # Centroid assignments for chunk 0
  0.residuals.npy         # Quantized residuals for chunk 0
  0.metadata.json         # Chunk metadata
  doclens.0.json          # Document lengths for chunk 0

Usage Examples

Create Index

use next_plaid::{MmapIndex, IndexConfig, UpdateConfig};
use ndarray::Array2;

// Document embeddings: Vec of [num_tokens, dim] arrays
let embeddings: Vec<Array2<f32>> = load_embeddings();

let index_config = IndexConfig {
    nbits: 4,
    ..Default::default()
};
let update_config = UpdateConfig::default();

// Creates if doesn't exist, updates otherwise
let (index, doc_ids) = MmapIndex::update_or_create(
    &embeddings,
    "/path/to/index",
    &index_config,
    &update_config,
)?;

Load and Search

use next_plaid::{MmapIndex, SearchParameters};
use ndarray::Array2;

let index = MmapIndex::load("/path/to/index")?;

// Query embedding: [num_tokens, dim]
let query: Array2<f32> = encode_query("search text");

let params = SearchParameters {
    top_k: 10,
    n_ivf_probe: 16,
    ..Default::default()
};

let results = index.search(&query, &params, None)?;

for (doc_id, score) in results.passage_ids.iter().zip(results.scores.iter()) {
    println!("Doc {}: {:.4}", doc_id, score);
}

Search with Filtering

use next_plaid::{MmapIndex, SearchParameters, filtering};
use ndarray::Array2;
use serde_json::json;

let index = MmapIndex::load("/path/to/index")?;

// Query embedding and search parameters, as in the previous example
let query: Array2<f32> = encode_query("search text");
let params = SearchParameters::default();

// Get document IDs matching filter
let subset = filtering::where_condition(
    "/path/to/index",
    "category = ? AND score > ?",
    &[json!("tech"), json!(0.5)],
)?;

// Search within subset
let results = index.search(&query, &params, Some(&subset))?;

Incremental Update

use next_plaid::{MmapIndex, UpdateConfig};
use ndarray::Array2;

let mut index = MmapIndex::load("/path/to/index")?;

let new_embeddings: Vec<Array2<f32>> = load_new_documents();
let config = UpdateConfig::default();

// Returns assigned document IDs
let doc_ids = index.update(&new_embeddings, &config)?;

Update with Metadata

use next_plaid::{MmapIndex, UpdateConfig};
use ndarray::Array2;
use serde_json::json;

let mut index = MmapIndex::load("/path/to/index")?;

let new_embeddings: Vec<Array2<f32>> = load_new_documents();
let metadata = vec![
    json!({"title": "Doc A", "category": "tech"}),
    json!({"title": "Doc B", "category": "science"}),
];

let config = UpdateConfig::default();
let doc_ids = index.update_with_metadata(&new_embeddings, &config, Some(&metadata))?;

Delete Documents

use next_plaid::MmapIndex;

let mut index = MmapIndex::load("/path/to/index")?;

let docs_to_delete = vec![5, 10, 15];
let deleted_count = index.delete(&docs_to_delete)?;

Reconstruct Embeddings

use next_plaid::MmapIndex;

let index = MmapIndex::load("/path/to/index")?;

// Reconstruct multiple documents
let embeddings = index.reconstruct(&[0, 1, 2])?;

// Reconstruct single document
let doc_emb = index.reconstruct_single(0)?;

Update or Create

use next_plaid::{MmapIndex, IndexConfig, UpdateConfig};
use ndarray::Array2;

let embeddings: Vec<Array2<f32>> = load_embeddings();

let index_config = IndexConfig::default();
let update_config = UpdateConfig::default();

// Creates if doesn't exist, updates otherwise
let (index, doc_ids) = MmapIndex::update_or_create(
    &embeddings,
    "/path/to/index",
    &index_config,
    &update_config,
)?;

Update Behavior

The update system chooses one of three modes, based on the current index size and the number of new documents (a configuration sketch follows the list):

  1. Start-from-scratch (num_documents <= start_from_scratch, default 999):

    • Loads existing embeddings from embeddings.npy
    • Combines with new embeddings
    • Rebuilds entire index with fresh K-means
  2. Buffer mode (total_new < buffer_size, default 100):

    • Adds documents without centroid expansion
    • Saves to buffer for later expansion
  3. Centroid expansion (total_new >= buffer_size):

    • Deletes previously buffered documents
    • Finds outliers beyond cluster_threshold
    • Expands centroids via K-means on outliers
    • Re-indexes all buffered + new documents
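
A configuration sketch showing how UpdateConfig fields map onto these modes; the values are illustrative, not recommendations:

use next_plaid::UpdateConfig;

let config = UpdateConfig {
    // Mode 1: indexes with at most this many documents are fully rebuilt.
    start_from_scratch: 2000,
    // Modes 2/3: batches below this size are buffered without centroid
    // expansion; larger batches trigger centroid expansion.
    buffer_size: 500,
    ..Default::default()
};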

Search Algorithm

  1. IVF Probing: Compute query-centroid scores, select top n_ivf_probe centroids per query token
  2. Candidate Retrieval: Get document IDs from selected IVF posting lists
  3. Approximate Scoring: Score candidates using centroid approximation (MaxSim with centroids)
  4. Re-ranking: Take top n_full_scores candidates
  5. Exact Scoring: Decompress embeddings, compute exact ColBERT MaxSim
  6. Return: Top top_k results with scores
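
Step 5's exact scoring is standard ColBERT MaxSim. A self-contained sketch, assuming L2-normalized embeddings so dot products are cosine similarities:

use ndarray::{Array2, Axis};

// Exact ColBERT MaxSim between one query [q_tokens, dim] and one document [d_tokens, dim].
fn maxsim(query: &Array2<f32>, doc: &Array2<f32>) -> f32 {
    // Token-level similarity matrix [q_tokens, d_tokens].
    let sim = query.dot(&doc.t());
    // For each query token, keep its best-matching document token, then sum.
    sim.axis_iter(Axis(0))
        .map(|row| row.fold(f32::NEG_INFINITY, |acc, &s| acc.max(s)))
        .sum()
}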

Dependencies

ndarray = "0.16"          # N-dimensional arrays
rayon = "1.10"            # Parallelism
serde = "1.0"             # Serialization
serde_json = "1.0"        # JSON
thiserror = "2.0"         # Error handling
ndarray-npy = "0.9"       # NPY file format
fastkmeans-rs = "0.1"     # K-means clustering
memmap2 = "0.9"           # Memory mapping
half = "2.4"              # Float16 support
rusqlite = "0.38"         # SQLite
regex = "1.11"            # Regex for filtering

License

Apache-2.0