RaBitQ Rust Library
This crate provides a pure-Rust implementation of the RaBitQ quantisation scheme and an IVF + RaBitQ searcher that mirrors the
behaviour of the reference C++ pipeline. The library focuses on efficient approximate nearest-neighbour search for high-
dimensional vectors and now ships with tooling to reproduce the GIST benchmark pipeline described in example.sh. The crates.io
package is distributed as rabitq-rs because the original rabitq name was claimed before the 2025 publishing push for this
Rust port.
Highlights
- Full IVF + RaBitQ searcher – the IvfRabitqIndex supports both L2 and inner-product metrics, fastscan-style pruning, and optional extended codes.
- Pre-clustered training support – IvfRabitqIndex::train_with_clusters lets you reuse centroids and cluster assignments generated by external tooling (e.g. the python/ivf.py helper that wraps FAISS), matching the workflow used by the C++ binaries in this repository.
- Dataset utilities – the new rabitq_rs::io module parses .fvecs and .ivecs files, including convenience helpers for cluster-id lists and ground-truth tables.
- Command-line evaluation – cargo run --bin gist builds an IVF + RaBitQ index from the GIST dataset and reports recall and throughput for a configurable nprobe/top-k budget.
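The .fvecs and .ivecs files use the TEXMEX layout: each record is a little-endian i32 dimension d followed by d four-byte little-endian components (f32 for .fvecs, i32 for .ivecs). The standalone sketch below parses such a buffer from memory to make that layout concrete. parse_vecs is a hypothetical helper written for illustration; it is not the signature of the readers in rabitq_rs::io.

```rust
use std::convert::TryInto;

/// Hypothetical helper (not the rabitq_rs::io API): parse an in-memory
/// TEXMEX-style buffer. Each record is a little-endian `i32` dimension `d`
/// followed by `d` four-byte components, each decoded by `decode`.
fn parse_vecs<T, F>(bytes: &[u8], decode: F) -> Result<Vec<Vec<T>>, String>
where
    F: Fn([u8; 4]) -> T,
{
    let mut records: Vec<Vec<T>> = Vec::new();
    let mut pos = 0usize;
    while pos < bytes.len() {
        // 4-byte dimension header.
        let header: [u8; 4] = bytes
            .get(pos..pos + 4)
            .ok_or("truncated dimension header")?
            .try_into()
            .unwrap();
        let dim = i32::from_le_bytes(header);
        if dim <= 0 {
            return Err(format!("invalid dimension {} at byte {}", dim, pos));
        }
        pos += 4;
        let len = 4 * dim as usize;
        // `dim` components of 4 bytes each.
        let body = bytes.get(pos..pos + len).ok_or("truncated record body")?;
        let record: Vec<T> = body
            .chunks_exact(4)
            .map(|chunk| {
                let raw: [u8; 4] = chunk.try_into().unwrap();
                decode(raw)
            })
            .collect();
        records.push(record);
        pos += len;
    }
    Ok(records)
}
```

Passing f32::from_le_bytes reads .fvecs data and i32::from_le_bytes reads .ivecs data (for example ground-truth ids), so one loop covers both formats.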
Quick start
Add the crate to your project by depending on rabitq-rs from crates.io, by pointing Cargo.toml at this repository, or by linking to a
local checkout. The snippet below constructs an IVF index from randomly generated vectors, queries it, and prints the nearest
neighbour id.
use ;
use Metric;
use *;
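At query time an IVF index scans only the inverted lists whose centroids are closest to the query; nprobe sets how many lists are visited, trading recall for speed. The standalone sketch below shows that probing step using exact L2 distances on raw vectors. It deliberately leaves out RaBitQ codes, and IvfSketch, build, and search are hypothetical names rather than the IvfRabitqIndex API.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `v`.
fn nearest(centroids: &[Vec<f32>], v: &[f32]) -> usize {
    let mut best = 0;
    let mut best_d = f32::INFINITY;
    for (c, centroid) in centroids.iter().enumerate() {
        let d = l2_sq(v, centroid);
        if d < best_d {
            best = c;
            best_d = d;
        }
    }
    best
}

/// Hypothetical IVF layout: centroids plus one inverted list of vector ids per centroid.
struct IvfSketch {
    centroids: Vec<Vec<f32>>,
    lists: Vec<Vec<usize>>,
    data: Vec<Vec<f32>>,
}

impl IvfSketch {
    /// Put every vector into the inverted list of its nearest centroid.
    fn build(data: Vec<Vec<f32>>, centroids: Vec<Vec<f32>>) -> Self {
        let mut lists: Vec<Vec<usize>> = vec![Vec::new(); centroids.len()];
        for (id, v) in data.iter().enumerate() {
            lists[nearest(&centroids, v)].push(id);
        }
        IvfSketch { centroids, lists, data }
    }

    /// Return up to `k` (id, squared distance) pairs from the `nprobe` closest lists, best first.
    fn search(&self, query: &[f32], nprobe: usize, k: usize) -> Vec<(usize, f32)> {
        // Rank clusters by centroid distance (panics on NaN distances).
        let mut order: Vec<usize> = (0..self.centroids.len()).collect();
        order.sort_by(|&a, &b| {
            l2_sq(query, &self.centroids[a])
                .partial_cmp(&l2_sq(query, &self.centroids[b]))
                .unwrap()
        });
        // Scan only the members of the first `nprobe` lists.
        let mut hits: Vec<(usize, f32)> = order
            .iter()
            .take(nprobe)
            .flat_map(|&c| self.lists[c].iter())
            .map(|&id| (id, l2_sq(query, &self.data[id])))
            .collect();
        hits.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
        hits.truncate(k);
        hits
    }
}
```

Raising nprobe to the number of lists turns the search into an exhaustive scan; a real IVF + RaBitQ searcher would replace the exact distances inside the probed lists with estimates computed from the quantised codes.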
Training with pre-computed clusters
When you already have k-means centroids and assignments (for example produced by FAISS), call train_with_clusters:
use IvfRabitqIndex;
use Metric;
let index = train_with_clusters?;
The new unit tests (preclustered_training_matches_naive_l2 and preclustered_training_matches_naive_ip) verify that the pre-clustered
fastscan path matches the naïve reconstruction baseline for both distance metrics.
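Pre-computed clusters typically arrive as a centroid table plus one cluster id per base vector, and a mismatch between the two is easy to introduce when they come from external tooling. The sketch below is a hypothetical standalone check (group_by_cluster is not part of rabitq_rs, and the u32 id type is an assumption) that rejects a wrong number of ids or an out-of-range cluster id and groups vector ids into per-cluster lists.

```rust
/// Hypothetical helper: validate external cluster ids and group vector ids by cluster.
/// `cluster_ids[i]` is assumed to be the cluster of base vector `i`.
fn group_by_cluster(
    num_vectors: usize,
    num_clusters: usize,
    cluster_ids: &[u32],
) -> Result<Vec<Vec<usize>>, String> {
    // Exactly one assignment per base vector.
    if cluster_ids.len() != num_vectors {
        return Err(format!(
            "expected {} cluster ids, got {}",
            num_vectors,
            cluster_ids.len()
        ));
    }
    let mut lists: Vec<Vec<usize>> = vec![Vec::new(); num_clusters];
    for (vector_id, &cluster) in cluster_ids.iter().enumerate() {
        let cluster = cluster as usize;
        // Every id must refer to an existing centroid.
        if cluster >= num_clusters {
            return Err(format!(
                "vector {} assigned to cluster {}, but only {} clusters exist",
                vector_id, cluster, num_clusters
            ));
        }
        lists[cluster].push(vector_id);
    }
    Ok(lists)
}
```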
Reproducing the GIST IVF + RaBitQ benchmark
Follow the same data preparation steps shown in example.sh:
- Download and unpack the dataset
If FTP is blocked in your environment, fetch the files from an alternative mirror and place them under data/gist/ with the same filenames (gist_base.fvecs, gist_query.fvecs, gist_groundtruth.ivecs).
After the dataset is in place you can choose between two training workflows:
Option 1: Use pre-computed clusters (FAISS-compatible)
- Cluster the base vectors – the helper script mirrors the FAISS call used by the C++ sample:
  (Swap l2 for ip if you plan to evaluate inner-product similarity.)
- Build and evaluate the Rust index – the CLI supports limiting the number of base vectors and queries so you can run a smoke test without loading the full 1M-vector dataset:
  The command prints the construction time, the evaluated recall@top-k, and the observed queries per second. Remove the --max-base/--max-queries limits to run the full benchmark once you are comfortable with the workflow.
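Recall@top-k compares the ids returned for each query with that query's ground-truth neighbours, such as those stored in gist_groundtruth.ivecs. One common definition, sketched below as the hypothetical helper recall_at_k, counts how many of the first k ground-truth ids appear among the first k returned ids and divides by k times the number of queries; the gist binary may compute it differently.

```rust
use std::collections::HashSet;

/// Hypothetical helper: average recall@k over all queries.
/// `results[q]` and `ground_truth[q]` hold neighbour ids for query `q`, best first.
fn recall_at_k(results: &[Vec<usize>], ground_truth: &[Vec<usize>], k: usize) -> f64 {
    assert_eq!(results.len(), ground_truth.len(), "one result list per query");
    if results.is_empty() || k == 0 {
        return 0.0;
    }
    let mut hits = 0usize;
    for (found, truth) in results.iter().zip(ground_truth) {
        // Only the first k ground-truth ids count as true neighbours.
        let truth: HashSet<usize> = truth.iter().take(k).cloned().collect();
        hits += found.iter().take(k).filter(|id| truth.contains(*id)).count();
    }
    hits as f64 / (k * results.len()) as f64
}
```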
Option 2: Train everything in Rust (no pre-computed centroids)
Skip the Python/FAISS clustering step and let the crate execute k-means internally. Provide the desired IVF list count via
--nlist:
The command mirrors the pre-computed flow but clusters in-process with the Rust IvfRabitqIndex::train helper, so expect the build
phase to take longer than in the pre-clustered path. Once the index is trained, the evaluation output has the same format as in the
pre-computed mode.
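In-process IVF training boils down to clustering the base vectors into nlist groups. The sketch below is plain Lloyd's k-means with a deterministic initialisation from the first nlist vectors; it is a hypothetical illustration of the technique, not the code in src/kmeans.rs, which may use a different initialisation, iteration count, or sampling strategy.

```rust
/// Squared Euclidean distance.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Assignment step: index of the nearest centroid for every vector.
fn assign(data: &[Vec<f32>], centroids: &[Vec<f32>]) -> Vec<usize> {
    data.iter()
        .map(|v| {
            let mut best = 0;
            let mut best_d = f32::INFINITY;
            for (c, centroid) in centroids.iter().enumerate() {
                let d = l2_sq(v, centroid);
                if d < best_d {
                    best = c;
                    best_d = d;
                }
            }
            best
        })
        .collect()
}

/// Hypothetical Lloyd's k-means returning (centroids, per-vector cluster ids).
fn kmeans(data: &[Vec<f32>], nlist: usize, iterations: usize) -> (Vec<Vec<f32>>, Vec<usize>) {
    assert!(nlist > 0 && nlist <= data.len(), "need between 1 and data.len() clusters");
    let dim = data[0].len();
    // Deterministic initialisation: the first `nlist` vectors seed the centroids.
    let mut centroids: Vec<Vec<f32>> = data[..nlist].to_vec();
    for _ in 0..iterations {
        let assignment = assign(data, &centroids);
        // Update step: move each centroid to the mean of its members.
        let mut sums = vec![vec![0.0f32; dim]; nlist];
        let mut counts = vec![0usize; nlist];
        for (v, &c) in data.iter().zip(&assignment) {
            counts[c] += 1;
            for (s, x) in sums[c].iter_mut().zip(v) {
                *s += *x;
            }
        }
        for c in 0..nlist {
            // An empty cluster keeps its previous centroid.
            if counts[c] > 0 {
                centroids[c] = sums[c].iter().map(|s| *s / counts[c] as f32).collect();
            }
        }
    }
    // Final assignment against the trained centroids.
    let assignment = assign(data, &centroids);
    (centroids, assignment)
}
```

Each iteration costs one distance per (vector, centroid) pair, which is why the build phase grows with nlist and dataset size compared with loading pre-computed clusters.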
All CLI options are listed by cargo run --bin gist -- --help.
Testing and linting
The test suite now includes regression checks for the dataset readers and the pre-clustered IVF flow. Run the full suite along with the standard linters before submitting changes:
For dataset-backed evaluation, invoke the gist binary as described above (optionally with reduced limits for quicker runs).
Publishing to crates.io
The crate is configured for publication on crates.io. Before publishing a new release:
- Update the version – bump the version field in Cargo.toml following semantic versioning.
- Log in to crates.io – authenticate once per workstation:
- Validate the package – ensure the crate builds cleanly and packages without missing files:
  Inspect the generated .crate archive under target/package/ if you need to double-check the bundle contents.
- Publish – when you are ready, push the package live:
If you need to yank a release, run cargo yank --vers <version> (optionally with --undo). Remember that published versions
are immutable, so double-check the README and API docs before releasing.
Project structure
src/
bin/gist.rs # CLI for building & evaluating IVF + RaBitQ on GIST
io.rs # .fvecs/.ivecs readers and helpers
ivf.rs # IVF + RaBitQ searcher and training routines
kmeans.rs # Lightweight k-means used for in-crate training
math.rs # Vector math helpers
quantizer.rs # Core RaBitQ quantisation logic
rotation.rs # Random orthonormal rotator
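quantizer.rs and rotation.rs hold the core RaBitQ idea: normalise a vector's residual relative to its cluster centroid, rotate it randomly, keep one sign bit per dimension, and correct inner-product estimates with a stored per-vector factor. The sketch below covers only the 1-bit case, assumes the residual is already normalised and rotated (so it effectively uses the identity rotation), and ignores extended codes and fastscan packing. Code, encode, estimate_ip, and estimate_l2_sq are hypothetical names, not the crate's API.

```rust
/// Hypothetical 1-bit RaBitQ code for one unit-norm, already-rotated residual `o`.
struct Code {
    /// Sign of each coordinate (true = non-negative).
    bits: Vec<bool>,
    /// <ō, o>, the correction factor used by the estimator.
    ip_quant_orig: f32,
}

/// Quantise `o` (assumed unit norm) to ō with entries ±1/√D matching the signs of `o`.
fn encode(o: &[f32]) -> Code {
    let scale = 1.0 / (o.len() as f32).sqrt();
    let bits: Vec<bool> = o.iter().map(|&x| x >= 0.0).collect();
    // Each ō_i has the sign of o_i, so <ō, o> = sum(|o_i|) / √D.
    let ip_quant_orig = o.iter().map(|x| x.abs()).sum::<f32>() * scale;
    Code { bits, ip_quant_orig }
}

/// Estimate <o, q> for unit vectors `o` (encoded) and `q` as <ō, q> / <ō, o>.
fn estimate_ip(code: &Code, q: &[f32]) -> f32 {
    let scale = 1.0 / (q.len() as f32).sqrt();
    // <ō, q>: add q_i where the bit is set, subtract it otherwise, then scale by 1/√D.
    let ip_quant_query = code
        .bits
        .iter()
        .zip(q)
        .map(|(&b, &x)| if b { x } else { -x })
        .sum::<f32>()
        * scale;
    ip_quant_query / code.ip_quant_orig
}

/// Expand ||o_r - q_r||^2 around the centroid c:
/// ||o_r - c||^2 + ||q_r - c||^2 - 2 ||o_r - c|| ||q_r - c|| <o, q>.
fn estimate_l2_sq(data_to_centroid: f32, query_to_centroid: f32, ip_unit: f32) -> f32 {
    data_to_centroid * data_to_centroid + query_to_centroid * query_to_centroid
        - 2.0 * data_to_centroid * query_to_centroid * ip_unit
}
```

The estimate is exact when the query equals the encoded vector; for other queries it is only approximate, which is why a searcher typically re-ranks the best candidates.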
Refer to README.origin.md for the original upstream documentation.