RaBitQ Rust Library
This crate provides a pure-Rust implementation of the RaBitQ quantization scheme and an IVF + RaBitQ searcher that mirrors the
behavior of the C++ RaBitQ Library. The library focuses on efficient approximate nearest-neighbor search for high-dimensional vectors and now ships with tooling to reproduce the GIST benchmark pipeline described in example.sh.
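The core RaBitQ idea fits in a few lines. The example below is a simplified illustration, not the code in quantizer.rs. It covers only the 1-bit case and assumes the data vector o and the query q have already been centred on the same centroid, normalised to unit length, and passed through the same random rotation. It skips the extended codes, query quantization, and the fastscan layout. Each coordinate is stored as a sign bit, the quantized vector is x̄ = sign(o)/√D, and ⟨o, q⟩ is estimated as ⟨x̄, q⟩ / ⟨x̄, o⟩. The names OneBitCode, quantize, and estimate_inner_product are hypothetical.

```rust
/// Hypothetical 1-bit RaBitQ-style code for a unit vector `o`
/// (assumed already centred on its centroid, normalised, and rotated).
pub struct OneBitCode {
    /// One sign bit per dimension (true = non-negative coordinate).
    pub bits: Vec<bool>,
    /// <x_bar, o>, stored at indexing time to correct the estimate.
    pub x_dot_o: f32,
}

/// Quantize a unit vector to sign bits, where x_bar = sign(o) / sqrt(D).
pub fn quantize(o: &[f32]) -> OneBitCode {
    let scale = 1.0 / (o.len() as f32).sqrt();
    let bits: Vec<bool> = o.iter().map(|&v| v >= 0.0).collect();
    // sign(o_i) * o_i = |o_i|, so <x_bar, o> = sum(|o_i|) / sqrt(D).
    let x_dot_o = o.iter().map(|v| v.abs()).sum::<f32>() * scale;
    OneBitCode { bits, x_dot_o }
}

/// Estimate <o, q> for a unit query `q` (same rotation) as <x_bar, q> / <x_bar, o>.
pub fn estimate_inner_product(code: &OneBitCode, q: &[f32]) -> f32 {
    let scale = 1.0 / (q.len() as f32).sqrt();
    let x_dot_q = code
        .bits
        .iter()
        .zip(q)
        .map(|(&b, &qi)| if b { qi } else { -qi })
        .sum::<f32>()
        * scale;
    x_dot_q / code.x_dot_o
}
```

Because ⟨x̄, o⟩ is stored once per vector at indexing time, a query needs only the sign bits and one scalar per candidate to produce an estimate.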
Highlights
- Full IVF + RaBitQ searcher: IvfRabitqIndex supports both L2 and inner-product metrics, fastscan-style pruning, and optional extended codes.
- Pre-clustered training support: IvfRabitqIndex::train_with_clusters lets you reuse centroids and cluster assignments generated by external tooling (e.g. the python/ivf.py helper that wraps FAISS), matching the workflow used by the upstream C++ library.
- Dataset utilities: the new rabitq_rs::io module parses .fvecs and .ivecs files, including convenience helpers for cluster-id lists and ground-truth tables.
- Command-line evaluation: cargo run --bin gist builds an IVF + RaBitQ index from the GIST dataset and reports recall and throughput for a configurable nprobe/top-k budget.
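Here is a minimal IVF search with exact L2 distances that shows what the nprobe/top-k budget controls. It is a generic sketch, not the IvfRabitqIndex internals. The real searcher scores candidates from RaBitQ codes and prunes them. This version instead computes full-precision distances for every vector in the probed lists. The function ivf_search and its parameters are hypothetical.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical IVF search with exact distances (no RaBitQ codes).
/// `centroids[c]` is the centroid of list `c`, and `lists[c]` holds the ids
/// of the base vectors assigned to that list.
pub fn ivf_search(
    base: &[Vec<f32>],
    centroids: &[Vec<f32>],
    lists: &[Vec<usize>],
    query: &[f32],
    nprobe: usize,
    top_k: usize,
) -> Vec<(usize, f32)> {
    // 1. Rank lists by the distance from the query to their centroid.
    let mut order: Vec<usize> = (0..centroids.len()).collect();
    order.sort_by(|&a, &b| {
        l2_sq(query, &centroids[a]).total_cmp(&l2_sq(query, &centroids[b]))
    });
    // 2. Scan every vector in the `nprobe` closest lists.
    let mut candidates: Vec<(usize, f32)> = Vec::new();
    for &c in order.iter().take(nprobe) {
        for &id in &lists[c] {
            candidates.push((id, l2_sq(query, &base[id])));
        }
    }
    // 3. Keep the `top_k` smallest distances, in ascending order.
    candidates.sort_by(|a, b| a.1.total_cmp(&b.1));
    candidates.truncate(top_k);
    candidates
}
```

Raising nprobe scans more lists, which usually improves recall at the cost of throughput. top_k only controls how many results are returned.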
Quick start
Add the crate to your project by adding rabitq-rs from crates.io, by pointing Cargo.toml at this repository, or by linking to a
local checkout. The snippet below constructs an IVF index from randomly generated vectors, queries it, and prints the nearest
neighbour id.
use ;
use Metric;
use *;
Training with pre-computed clusters
When you already have k-means centroids and assignments (for example produced by FAISS), call train_with_clusters:
use IvfRabitqIndex;
use Metric;
let index = train_with_clusters?;
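train_with_clusters consumes centroids together with one cluster id per base vector. The hypothetical helper below is not part of the crate's API. It shows how such an assignment list maps onto IVF inverted lists and how an out-of-range id can be rejected up front. The u32 id type is an assumption.

```rust
/// Hypothetical helper: turn per-vector cluster assignments
/// (`assignments[i]` = cluster id of base vector `i`) into inverted lists.
/// Returns an error if an id is out of range for `nlist` centroids.
pub fn build_inverted_lists(assignments: &[u32], nlist: usize) -> Result<Vec<Vec<usize>>, String> {
    let mut lists: Vec<Vec<usize>> = vec![Vec::new(); nlist];
    for (vector_id, &cluster) in assignments.iter().enumerate() {
        let cluster = cluster as usize;
        if cluster >= nlist {
            return Err(format!(
                "vector {vector_id} assigned to cluster {cluster}, but only {nlist} centroids were given"
            ));
        }
        lists[cluster].push(vector_id);
    }
    Ok(lists)
}
```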
Reproducing the GIST IVF + RaBitQ benchmark
Follow the same data preparation steps shown in example.sh:
- Download and unpack the dataset. If FTP is blocked in your environment, fetch the files from an alternative mirror and place them under
data/gist/ with the same filenames (gist_base.fvecs, gist_query.fvecs, gist_groundtruth.ivecs).
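.fvecs and .ivecs files, the TEXMEX format used by GIST, store each vector as a little-endian 32-bit dimension d followed by d 32-bit values. .fvecs files hold f32 values and .ivecs files hold i32 values. The crate's rabitq_rs::io module already handles these files. The std-only reader below is an independent sketch of the format, and read_fvecs is a hypothetical name.

```rust
use std::io::{self, Read};

/// Read every record from an .fvecs stream. Each record is a little-endian
/// i32 dimension `d` followed by `d` little-endian f32 values.
pub fn read_fvecs<R: Read>(mut reader: R) -> io::Result<Vec<Vec<f32>>> {
    let mut vectors = Vec::new();
    let mut dim_buf = [0u8; 4];
    loop {
        // Hitting EOF while reading a header ends the file. A truncated
        // trailing header (1-3 bytes) is not detected in this sketch.
        match reader.read_exact(&mut dim_buf) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let dim = i32::from_le_bytes(dim_buf);
        if dim < 0 {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "negative dimension"));
        }
        // A truncated payload is reported as an error.
        let mut payload = vec![0u8; dim as usize * 4];
        reader.read_exact(&mut payload)?;
        let vector: Vec<f32> = payload
            .chunks_exact(4)
            .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
            .collect();
        vectors.push(vector);
    }
    Ok(vectors)
}
```

To read a dataset file, wrap it in a buffer, e.g. read_fvecs(BufReader::new(File::open("data/gist/gist_query.fvecs")?)). Swapping f32::from_le_bytes for i32::from_le_bytes gives the matching reader for gist_groundtruth.ivecs.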
After the dataset is in place you can choose between two training workflows:
Option 1: Use pre-computed clusters (FAISS-compatible)
- Cluster the base vectors: the helper script mirrors the FAISS call used by the C++ sample:
  (Swap l2 for ip if you plan to evaluate inner-product similarity.)
- Build and evaluate the Rust index: the CLI can limit the number of base vectors and queries, so you can run a smoke test without loading the full 1M-vector dataset:
  The command prints the construction time, the evaluated recall@top-k, and the observed queries per second. Remove the --max-base/--max-queries limits to run the full benchmark once you are comfortable with the workflow.
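The recall and throughput figures can be computed along these lines. This sketch follows the usual definitions and is not the exact code in bin/gist.rs. recall@k counts how many of the first k ground-truth ids for each query appear among the first k returned ids, averaged over all queries. QPS divides the number of queries by the wall-clock search time. Both function names are hypothetical.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// recall@k: fraction of the true k nearest neighbours (the first k entries
/// of each ground-truth row) that appear among the first k returned ids.
/// Assumes at least one query and k > 0.
pub fn recall_at_k(results: &[Vec<usize>], ground_truth: &[Vec<usize>], k: usize) -> f64 {
    assert_eq!(results.len(), ground_truth.len(), "one result list per query");
    let mut hits = 0usize;
    for (found, truth) in results.iter().zip(ground_truth) {
        let truth: HashSet<usize> = truth.iter().take(k).copied().collect();
        hits += found.iter().take(k).filter(|id| truth.contains(*id)).count();
    }
    hits as f64 / (results.len() * k) as f64
}

/// Queries per second over a measured wall-clock interval.
pub fn queries_per_second(num_queries: usize, elapsed: Duration) -> f64 {
    num_queries as f64 / elapsed.as_secs_f64()
}
```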
Option 2: Train everything in Rust (no pre-computed centroids)
Skip the Python/FAISS clustering step and let the crate execute k-means internally. Provide the desired IVF list count via
--nlist:
The command mirrors the pre-computed flow but performs clustering in-process with the Rust IvfRabitqIndex::train helper. Expect
the build phase to take longer than in the pre-clustered path, since k-means now runs inside the binary. Once the index is trained,
the evaluation output has the same format as in the pre-computed mode.
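The clustering step is k-means over the base vectors. The sketch below is plain Lloyd's algorithm and is not taken from kmeans.rs. Its first-k initialisation, fixed iteration count, and empty-cluster handling are simplifying assumptions. Production code typically uses k-means++ seeding or sampling and a convergence check. kmeans and its helpers are hypothetical names.

```rust
/// Squared Euclidean distance.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `v`.
fn nearest(v: &[f32], centroids: &[Vec<f32>]) -> usize {
    let mut best = 0;
    let mut best_d = f32::INFINITY;
    for (c, centroid) in centroids.iter().enumerate() {
        let d = dist2(v, centroid);
        if d < best_d {
            best = c;
            best_d = d;
        }
    }
    best
}

/// Plain Lloyd's k-means: seeds centroids with the first `k` points
/// (deterministic, but weaker than k-means++), then alternates the
/// assignment and update steps for `iters` rounds.
pub fn kmeans(data: &[Vec<f32>], k: usize, iters: usize) -> (Vec<Vec<f32>>, Vec<usize>) {
    assert!(k > 0 && k <= data.len());
    let dim = data[0].len();
    let mut centroids: Vec<Vec<f32>> = data[..k].to_vec();
    let mut assign = vec![0usize; data.len()];
    for _ in 0..iters {
        // Assignment step: each point goes to its nearest centroid.
        for (i, v) in data.iter().enumerate() {
            assign[i] = nearest(v, &centroids);
        }
        // Update step: each centroid becomes the mean of its points.
        // Empty clusters keep their previous centroid.
        let mut sums = vec![vec![0.0f32; dim]; k];
        let mut counts = vec![0usize; k];
        for (v, &c) in data.iter().zip(&assign) {
            counts[c] += 1;
            for (s, x) in sums[c].iter_mut().zip(v) {
                *s += *x;
            }
        }
        for c in 0..k {
            if counts[c] > 0 {
                centroids[c] = sums[c].iter().map(|s| s / counts[c] as f32).collect();
            }
        }
    }
    // Final assignment against the last centroids.
    for (i, v) in data.iter().enumerate() {
        assign[i] = nearest(v, &centroids);
    }
    (centroids, assign)
}
```

The returned assignments have the same shape as the cluster-id lists used by the pre-computed workflow: one cluster id per base vector.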
Run cargo run --bin gist -- --help to list all CLI options.
Persisting trained indexes
Use the persistence hooks to avoid retraining between benchmarking runs:
The first command trains with the Rust pipeline, writes the persisted index, and records the benchmark. The second command reuses the saved index for subsequent recall sweeps or profiling runs.
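The sketch below shows the train-once-then-reuse pattern behind these hooks. It persists only a centroid matrix, using a made-up binary layout: a u32 row count, a u32 dimension, then little-endian f32 values. The crate's actual index file format and the CLI flags that drive it are not reproduced here. write_matrix, read_matrix, and load_or_train are hypothetical names.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

/// Hypothetical layout: u32 row count, u32 dimension, then row-major
/// little-endian f32 values. Not the crate's persisted index format.
pub fn write_matrix<W: Write>(mut w: W, rows: &[Vec<f32>]) -> io::Result<()> {
    let dim = rows.first().map_or(0, |r| r.len());
    w.write_all(&(rows.len() as u32).to_le_bytes())?;
    w.write_all(&(dim as u32).to_le_bytes())?;
    for row in rows {
        assert_eq!(row.len(), dim, "all rows must share one dimension");
        for x in row {
            w.write_all(&x.to_le_bytes())?;
        }
    }
    w.flush()
}

/// Inverse of `write_matrix`.
pub fn read_matrix<R: Read>(mut r: R) -> io::Result<Vec<Vec<f32>>> {
    let mut word = [0u8; 4];
    r.read_exact(&mut word)?;
    let n = u32::from_le_bytes(word) as usize;
    r.read_exact(&mut word)?;
    let dim = u32::from_le_bytes(word) as usize;
    let mut rows = Vec::with_capacity(n);
    for _ in 0..n {
        let mut row = Vec::with_capacity(dim);
        for _ in 0..dim {
            r.read_exact(&mut word)?;
            row.push(f32::from_le_bytes(word));
        }
        rows.push(row);
    }
    Ok(rows)
}

/// Reuse the saved result if `path` exists. Otherwise run `train` once and persist it.
pub fn load_or_train<F>(path: &Path, train: F) -> io::Result<Vec<Vec<f32>>>
where
    F: FnOnce() -> Vec<Vec<f32>>,
{
    if path.exists() {
        return read_matrix(BufReader::new(File::open(path)?));
    }
    let trained = train();
    write_matrix(BufWriter::new(File::create(path)?), &trained)?;
    Ok(trained)
}
```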
Testing and linting
The test suite now includes regression checks for the dataset readers and the pre-clustered IVF flow. Run the full suite along with the standard linters before submitting changes:
For dataset-backed evaluation, invoke the gist binary as described above.
Publishing to crates.io
The crate is configured for publication on crates.io. Before publishing a new release:
- Update the version: bump the version field in Cargo.toml following semantic versioning.
- Log in to crates.io: authenticate once per workstation:
- Validate the package: ensure the crate builds cleanly and packages without missing files:
  Inspect the generated .crate archive under target/package/ if you need to double-check the bundle contents.
- Publish: when you are ready, push the package live:
If you need to yank a release, run cargo yank --vers <version> (optionally with --undo). Remember that published versions
are immutable, so double-check the README and API docs before releasing.
Project structure
src/
  bin/gist.rs    # CLI for building & evaluating IVF + RaBitQ on GIST
  io.rs          # .fvecs/.ivecs readers and helpers
  ivf.rs         # IVF + RaBitQ searcher and training routines
  kmeans.rs      # Lightweight k-means used for in-crate training
  math.rs        # Vector math helpers
  quantizer.rs   # Core RaBitQ quantisation logic
  rotation.rs    # Random orthonormal rotator
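rotation.rs provides a random orthonormal rotator. The sketch below shows one standard way to build such a rotation, as an assumption rather than the crate's method: draw a D × D matrix of Gaussian entries, then orthonormalise its rows with modified Gram-Schmidt. The xorshift generator keeps the sketch free of external crates. random_orthonormal and rotate are hypothetical names.

```rust
/// Small xorshift64* PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        self.0.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }
    /// Uniform sample in (0, 1].
    fn next_f64(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }
    /// Standard normal sample via the Box-Muller transform.
    fn gaussian(&mut self) -> f64 {
        let (u1, u2) = (self.next_f64(), self.next_f64());
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

/// Random D x D matrix with orthonormal rows: Gaussian entries followed by
/// modified Gram-Schmidt.
pub fn random_orthonormal(dim: usize, seed: u64) -> Vec<Vec<f64>> {
    // xorshift must not start from zero.
    let mut rng = XorShift(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed });
    let mut rows: Vec<Vec<f64>> = Vec::with_capacity(dim);
    while rows.len() < dim {
        let mut v: Vec<f64> = (0..dim).map(|_| rng.gaussian()).collect();
        // Remove the components along every row accepted so far.
        for r in &rows {
            let dot: f64 = v.iter().zip(r).map(|(a, b)| a * b).sum();
            for (vi, ri) in v.iter_mut().zip(r) {
                *vi -= dot * ri;
            }
        }
        let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
        // Retry on a (vanishingly unlikely) near-degenerate draw.
        if norm > 1e-9 {
            rows.push(v.into_iter().map(|x| x / norm).collect());
        }
    }
    rows
}

/// Apply the rotation: y = P x.
pub fn rotate(p: &[Vec<f64>], x: &[f64]) -> Vec<f64> {
    p.iter()
        .map(|row| row.iter().zip(x).map(|(a, b)| a * b).sum::<f64>())
        .collect()
}
```

Because P is orthonormal, rotating both data and queries leaves L2 distances and inner products unchanged. The quantizer can therefore work entirely in the rotated space.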
Refer to README.origin.md for the original upstream documentation.