RaBitQ-rs
Pure Rust implementation of the RaBitQ vector quantization algorithm with IVF (Inverted File) search capabilities.
Up to 32× memory compression with significantly higher accuracy than traditional Product Quantization (PQ) or Scalar Quantization (SQ).
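The 32× figure follows from simple arithmetic. A raw `f32` costs 32 bits per dimension, so a 1-bit code is 32× smaller. The sketch below computes the code size for a given bit width. It assumes that the per-vector correction factors stored alongside each code are negligible; they are not counted here, so real ratios are somewhat lower. The helper names are illustrative and are not part of the crate's API.

```rust
/// Bytes needed to store one `dim`-dimensional vector as raw f32.
fn raw_bytes(dim: usize) -> usize {
    dim * std::mem::size_of::<f32>()
}

/// Bytes for the quantization code alone at `bits` bits per dimension,
/// rounded up to whole bytes. Per-vector correction factors are ignored.
fn code_bytes(dim: usize, bits: usize) -> usize {
    (dim * bits + 7) / 8
}

/// Idealized compression ratio versus raw f32 (overhead not included).
fn compression_ratio(dim: usize, bits: usize) -> f64 {
    raw_bytes(dim) as f64 / code_bytes(dim, bits) as f64
}

fn main() {
    let dim = 960; // dimensionality of GIST-1M vectors
    for &bits in &[1, 3, 7] {
        println!(
            "bits={}: {} B raw -> {} B code ({:.1}x)",
            bits,
            raw_bytes(dim),
            code_bytes(dim, bits),
            compression_ratio(dim, bits)
        );
    }
}
```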
Why RaBitQ?
RaBitQ delivers exceptional compression-accuracy trade-offs for approximate nearest neighbor search:
- vs. Raw Vectors: Achieve up to 32× memory reduction with configurable compression ratios
- vs. PQ/SQ: Substantially higher accuracy, especially at high recall targets (90%+)
- How it Works: Combines 1-bit binary codes with multi-bit refinement codes for precise distance estimation
The quantization applies to residual vectors (data - centroid), enabling accurate reconstruction even at extreme compression rates.
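The following is a heavily simplified sketch of 1-bit RaBitQ-style coding that makes the residual idea concrete. It subtracts the centroid, normalizes the result to a unit vector `o`, and keeps one sign bit per dimension. It then estimates ⟨o, q⟩ as ⟨ō, q⟩ / ⟨ō, o⟩, where ō is the decoded ±1/√D vector. The sketch omits the random rotation, query quantization, multi-bit refinement and error bounds that the real algorithm uses. None of these function names come from the crate.

```rust
/// Residual of a data vector with respect to its cluster centroid.
fn residual(data: &[f32], centroid: &[f32]) -> Vec<f32> {
    data.iter().zip(centroid).map(|(d, c)| d - c).collect()
}

/// Scale to unit L2 norm (zero vectors are returned unchanged).
fn normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

/// 1-bit code: one sign bit per dimension (true = non-negative).
fn sign_code(unit: &[f32]) -> Vec<bool> {
    unit.iter().map(|&x| x >= 0.0).collect()
}

/// Decode sign bits into the unit-norm vector with entries +-1/sqrt(D).
fn decode(code: &[bool]) -> Vec<f32> {
    let s = 1.0 / (code.len() as f32).sqrt();
    code.iter().map(|&b| if b { s } else { -s }).collect()
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Estimate <o, q> for unit vectors as <o_bar, q> / <o_bar, o>.
/// A real index would store <o_bar, o> per vector at build time.
fn estimate_ip(code: &[bool], o: &[f32], q: &[f32]) -> f32 {
    let o_bar = decode(code);
    dot(&o_bar, q) / dot(&o_bar, o)
}

fn main() {
    let centroid = [1.0, 1.0, 1.0, 1.0];
    let data = [1.5, 0.2, 1.1, 2.0];
    let o = normalize(&residual(&data, &centroid));
    let code = sign_code(&o);
    let q = normalize(&[0.4, -0.9, 0.3, 0.8]);
    println!("exact {:.3}, estimated {:.3}", dot(&o, &q), estimate_ip(&code, &o, &q));
}
```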
Why This Rust Implementation?
This library provides a feature-complete RaBitQ + IVF search engine with all optimizations from the original C++ implementation:
- ✅ Faster Config Mode: 100-500× faster index building with <1% accuracy loss
- ✅ FHT Rotation: Fast Hadamard Transform for efficient orthonormal rotations
- ✅ SIMD Accelerated: Optimized distance computations on modern CPUs
- ✅ Memory Safe: No segfaults, no unsafe code in dependencies
- ✅ Complete IVF Pipeline: Training (built-in k-means or FAISS-compatible), search, persistence
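A Fast Hadamard Transform applies an orthonormal rotation in O(D log D) time, compared with O(D²) for a dense random matrix. A common recipe is to flip signs with a random ±1 vector and then apply the normalized Walsh-Hadamard transform. That recipe is assumed here; it is not taken from the crate. The sketch requires D to be a power of two, and it does not cover how the library handles other dimensions (for example, by padding).

```rust
/// In-place fast Walsh-Hadamard transform, scaled by 1/sqrt(n) so the
/// transform is orthonormal. `v.len()` must be a power of two.
fn fht(v: &mut [f32]) {
    let n = v.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        // Butterfly: combine pairs that are `h` apart within blocks of 2h.
        for i in (0..n).step_by(h * 2) {
            for j in i..i + h {
                let (a, b) = (v[j], v[j + h]);
                v[j] = a + b;
                v[j + h] = a - b;
            }
        }
        h *= 2;
    }
    let scale = 1.0 / (n as f32).sqrt();
    for x in v.iter_mut() {
        *x *= scale;
    }
}

/// Randomized rotation: multiply by a +-1 sign vector, then apply the FHT.
/// Both steps are orthogonal, so vector norms are preserved.
fn random_rotate(v: &mut [f32], signs: &[f32]) {
    for (x, s) in v.iter_mut().zip(signs) {
        *x *= s;
    }
    fht(v);
}

fn main() {
    let mut v = vec![1.0f32, 2.0, 3.0, 4.0];
    let before: f32 = v.iter().map(|x| x * x).sum();
    random_rotate(&mut v, &[1.0, -1.0, 1.0, -1.0]);
    let after: f32 = v.iter().map(|x| x * x).sum();
    println!("rotated {:?}, squared norm {} -> {}", v, before, after);
}
```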
Quick Start
Add to your Cargo.toml:
[dependencies]
= "0.3"
Build an index and search in ~10 lines:
use ;
use *;
Usage
Training with Built-in K-Means
use ;
let index = train?;
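The following is a minimal Lloyd's k-means sketch that produces centroids and per-vector assignments. It gives a rough idea of the clustering step that precedes quantization. It is not the crate's implementation. Initialization simply takes the first k points, and there is no k-means++ seeding, sampling or parallelism.

```rust
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid nearest to `v` (squared L2).
fn nearest(v: &[f32], centroids: &[Vec<f32>]) -> usize {
    let mut best = 0;
    let mut best_d = f32::INFINITY;
    for (i, c) in centroids.iter().enumerate() {
        let d = sq_dist(v, c);
        if d < best_d {
            best_d = d;
            best = i;
        }
    }
    best
}

/// Plain Lloyd's k-means, initialized with the first `k` points for
/// determinism. Returns (centroids, assignment per point).
fn kmeans(data: &[Vec<f32>], k: usize, iters: usize) -> (Vec<Vec<f32>>, Vec<usize>) {
    assert!(k > 0 && k <= data.len());
    let dim = data[0].len();
    let mut centroids: Vec<Vec<f32>> = data[..k].to_vec();
    let mut assign = vec![0; data.len()];
    for _ in 0..iters {
        // Assignment step.
        for (i, v) in data.iter().enumerate() {
            assign[i] = nearest(v, &centroids);
        }
        // Update step: each centroid becomes the mean of its members.
        let mut sums = vec![vec![0.0f32; dim]; k];
        let mut counts = vec![0usize; k];
        for (v, &c) in data.iter().zip(&assign) {
            counts[c] += 1;
            for (s, x) in sums[c].iter_mut().zip(v) {
                *s += x;
            }
        }
        for c in 0..k {
            if counts[c] > 0 {
                // Empty clusters keep their previous centroid.
                centroids[c] = sums[c].iter().map(|s| s / counts[c] as f32).collect();
            }
        }
    }
    // Final assignment against the last centroids.
    for (i, v) in data.iter().enumerate() {
        assign[i] = nearest(v, &centroids);
    }
    (centroids, assign)
}

fn main() {
    let data = vec![
        vec![0.0, 0.0], vec![10.0, 10.0], vec![0.0, 1.0],
        vec![10.0, 11.0], vec![1.0, 0.0], vec![11.0, 10.0],
    ];
    let (centroids, assign) = kmeans(&data, 2, 10);
    println!("centroids {:?}, assignments {:?}", centroids, assign);
}
```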
Training with Pre-computed Clusters (FAISS-Compatible)
If you already have centroids and cluster assignments from FAISS or another tool:
use ;
let index = train_with_clusters?;
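Conceptually, the centroids and assignments together define the IVF structure: each cluster owns an inverted list of the ids of the vectors assigned to it. The sketch below shows that grouping step using a hypothetical `build_inverted_lists` helper. The crate's internal layout may differ.

```rust
/// Group vector ids into one inverted list per cluster.
/// `assignments[id]` is the cluster that vector `id` belongs to.
fn build_inverted_lists(assignments: &[usize], num_clusters: usize) -> Vec<Vec<usize>> {
    let mut lists = vec![Vec::new(); num_clusters];
    for (id, &c) in assignments.iter().enumerate() {
        assert!(c < num_clusters, "assignment {} out of range", c);
        lists[c].push(id);
    }
    lists
}

fn main() {
    // e.g. assignments exported from FAISS for 4 vectors and 3 clusters
    let lists = build_inverted_lists(&[1, 0, 1, 2], 3);
    println!("{:?}", lists);
}
```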
Using Faster Config for Large Datasets
For datasets with more than 100K vectors, enable faster_config to speed up training by 100-500× with minimal accuracy loss (<1%):
let index = train?;
Searching
use SearchParams;
let params = new;
let results = index.search?;
for result in results.iter.take
Parameter Tuning:
- nprobe: Higher = better recall, slower search (typical: 10-1024)
- top_k: Number of neighbors to return
- total_bits: More bits = higher accuracy, larger index (typical: 3-7)
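A plain IVF search loop shows the roles of `nprobe` and `top_k`. It ranks clusters by centroid distance, scans only the `nprobe` nearest lists, and keeps the `top_k` best candidates. This sketch uses exact squared L2 distances, whereas the real index would use RaBitQ distance estimates. `ivf_search` is a hypothetical name, not the crate's method.

```rust
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Return up to `top_k` (id, squared distance) pairs by probing the
/// `nprobe` clusters whose centroids are closest to `query`.
fn ivf_search(
    query: &[f32],
    centroids: &[Vec<f32>],
    lists: &[Vec<usize>],
    data: &[Vec<f32>],
    nprobe: usize,
    top_k: usize,
) -> Vec<(usize, f32)> {
    // Rank clusters by centroid distance.
    let mut order: Vec<(usize, f32)> = centroids
        .iter()
        .enumerate()
        .map(|(c, cen)| (c, sq_dist(query, cen)))
        .collect();
    order.sort_by(|a, b| a.1.total_cmp(&b.1));
    // Scan only the nprobe nearest lists.
    let mut candidates = Vec::new();
    for &(c, _) in order.iter().take(nprobe) {
        for &id in &lists[c] {
            candidates.push((id, sq_dist(query, &data[id])));
        }
    }
    // Keep the top_k closest candidates.
    candidates.sort_by(|a, b| a.1.total_cmp(&b.1));
    candidates.truncate(top_k);
    candidates
}

fn main() {
    let centroids = vec![vec![0.0f32, 0.0], vec![10.0, 10.0]];
    let data = vec![vec![0.0f32, 0.0], vec![1.0, 0.0], vec![10.0, 10.0], vec![9.0, 10.0]];
    let lists = vec![vec![0usize, 1], vec![2, 3]];
    let results = ivf_search(&[9.0, 9.0], &centroids, &lists, &data, 1, 2);
    println!("{:?}", results);
}
```

Raising `nprobe` scans more lists and so finds more true neighbors at the cost of more distance computations. This is the recall/QPS trade-off that the benchmark sweep measures.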
Saving and Loading
// Save index to disk (with CRC32 checksum)
index.save?;
// Load index from disk
let loaded_index = load?;
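The on-disk format is not documented here, but the CRC32 check itself is easy to illustrate: compute a checksum over the payload when saving, and recompute it when loading. The sketch assumes the common IEEE CRC-32 variant (reflected polynomial 0xEDB88320). It also assumes a hypothetical layout in which the checksum is appended as four little-endian bytes. The crate may use a different variant or layout.

```rust
/// Bitwise CRC-32 (IEEE variant, reflected polynomial 0xEDB88320).
fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in bytes {
        crc ^= b as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical layout: payload followed by its CRC32 (little-endian).
fn seal(payload: &[u8]) -> Vec<u8> {
    let mut out = payload.to_vec();
    out.extend_from_slice(&crc32(payload).to_le_bytes());
    out
}

/// Return the payload if the trailing checksum matches, else None.
fn verify(file: &[u8]) -> Option<&[u8]> {
    if file.len() < 4 {
        return None;
    }
    let (payload, tail) = file.split_at(file.len() - 4);
    let stored = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    if crc32(payload) == stored {
        Some(payload)
    } else {
        None
    }
}

fn main() {
    let sealed = seal(b"index bytes");
    println!("valid: {}", verify(&sealed).is_some());
}
```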
CLI Tool: Benchmark on GIST-1M
The ivf_rabitq binary lets you evaluate RaBitQ on standard datasets like GIST-1M:
1. Download GIST-1M Dataset
2. Build Index
3. Run Benchmark
This performs an automatic nprobe sweep and reports:
nprobe | QPS | Recall@100
-------|---------|------------
5 | 12500 | 45.2%
10 | 8200 | 62.8%
20 | 5100 | 75.4%
...
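Under the usual definition, which is assumed here, Recall@100 is the fraction of each query's 100 true nearest neighbors that appear among the 100 returned results, averaged over all queries. A minimal sketch with illustrative helper names:

```rust
use std::collections::HashSet;

/// Recall@k: fraction of the true k nearest neighbors that appear
/// among the first k results returned by the approximate search.
fn recall_at_k(approx: &[usize], truth: &[usize], k: usize) -> f64 {
    assert!(k > 0, "k must be positive");
    let truth_set: HashSet<usize> = truth.iter().take(k).copied().collect();
    let hits = approx
        .iter()
        .take(k)
        .filter(|&&id| truth_set.contains(&id))
        .count();
    hits as f64 / k as f64
}

/// Mean recall@k over a batch of queries.
fn mean_recall(results: &[Vec<usize>], truths: &[Vec<usize>], k: usize) -> f64 {
    let total: f64 = results
        .iter()
        .zip(truths)
        .map(|(r, t)| recall_at_k(r, t, k))
        .sum();
    total / results.len() as f64
}

fn main() {
    let results = vec![vec![1, 2, 3, 4], vec![5, 6, 7, 8]];
    let truths = vec![vec![4, 3, 9, 8], vec![5, 6, 7, 8]];
    println!("Recall@4 = {:.2}", mean_recall(&results, &truths, 4));
}
```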
Alternative: Pre-cluster with FAISS
For C++ library compatibility, generate clusters using the provided Python helper:
Testing
Run the test suite before committing changes:
All tests use seeded RNGs for reproducible results.
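This README does not say which RNG the test suite uses. The sketch below uses SplitMix64 purely to illustrate the principle: a generator built from a fixed seed always yields the same sequence, so tests that draw random vectors from it are reproducible.

```rust
/// SplitMix64: a tiny seeded PRNG (illustrative, not the crate's RNG).
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f32 in [0, 1) built from the top 24 bits.
    fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u64 << 24) as f32
    }
}

fn main() {
    // Same seed, same "random" test vector on every run.
    let mut rng = SplitMix64::new(42);
    let v: Vec<f32> = (0..4).map(|_| rng.next_f32()).collect();
    println!("{:?}", v);
}
```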
Citation
If you use RaBitQ in your research, please cite the original paper:
Original C++ Implementation: VectorDB-NTU/RaBitQ-Library
License
Licensed under the Apache License, Version 2.0. See LICENSE for details.