Skip to main content

Crate ruvector_rabitq

Crate ruvector_rabitq 

Source
Expand description

RaBitQ: Rotation-Based 1-bit Quantization for Approximate Nearest-Neighbor Search

Motivated by the SIGMOD 2024 algorithm by Jianyang Gao & Cheng Long: “RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search”. This crate ships two estimators:

  1. Symmetric (Charikar-style) — both query and database are 1-bit. est_cos = cos(π · (1 − B/D)) where B = padding-safe XNOR-popcount. Cheapest per candidate (O(D/64) popcount + 1 cos) — use when memory dominates throughput.

  2. Asymmetric (RaBitQ-2024-style) — query is f32, database is 1-bit. est_ip = ‖q‖ · ‖x‖ · (1/√D) · Σᵢ sign(x_rot,i) · q_rot,i. Unbiased on Haar-uniform rotations; tighter variance than (1). O(D) per candidate.

§Indexes

VariantStorageEstimatorRerank
FlatF32Indexf32 originalsexact L2N/A
RabitqIndexrotation + codessymmetricno
RabitqPlusIndexrotation + codes + originalssymmetric + exact rerankyes
RabitqAsymIndexrotation + codes [+ originals]asymmetric + optional rerankopt

All satisfy index::AnnIndex. Search is top-k via a bounded max-heap (O(n log k)), and scoring uses f32::total_cmp so NaN never panics.

§Guarantees

  • Padding-safe popcount at any D (handles D % 64 != 0 correctly).
  • Deterministic: (dim, seed, data) triple → bit-identical rotation + index build + search output across runs.
  • No unsafe, no external BLAS/LAPACK, no C/C++ deps.

§Benchmarks

See benches/rabitq_bench.rs for distance-kernel micro-benchmarks and src/main.rs (rabitq-demo) for end-to-end recall + throughput across n ∈ {1 k, 5 k, 50 k, 100 k}.

Re-exports§

pub use error::RabitqError;
pub use index::AnnIndex;
pub use index::FlatF32Index;
pub use index::RabitqAsymIndex;
pub use index::RabitqIndex;
pub use index::RabitqPlusIndex;
pub use index::SearchResult;
pub use kernel::CpuKernel;
pub use kernel::KernelCaps;
pub use kernel::ScanRequest;
pub use kernel::ScanResponse;
pub use kernel::VectorKernel;
pub use quantize::pack_bits;
pub use quantize::unpack_bits;
pub use quantize::BinaryCode;
pub use rotation::RandomRotation;
pub use rotation::RandomRotationKind;

Modules§

error
index
RaBitQ flat indexes — four search backends behind one trait:
kernel
VectorKernel trait — the pluggable execution backend for RaBitQ scan + rerank. Defined here (ADR-157 §“Where each piece lives”) because kernels are RaBitQ primitives; the cache is a consumer.
persist
Persistence for RabitqPlusIndex — seed-based reconstruction format.
quantize
Bit-packing, XNOR-popcount, and the two RaBitQ distance estimators:
rotation
Random orthogonal rotation.
scan
SIMD-accelerated symmetric-scan agreement-count kernel.