Expand description
RaBitQ — 1-bit quantization with O(1/√D) MSE error bound (SIGMOD 2024).
§Algorithm outline
- Calibration: compute centroid
cover training vectors. - Rotation: apply a randomised Walsh-Hadamard transform (WHT) with a
seed-derived signed-diagonal matrix
Dto the residualv - c.- WHT requires a power-of-2 length; dimensions that are not pow2 are zero-padded before the transform then truncated after.
Dis a vector of±1scalars derived deterministically fromrotation_seedusing an xorshift64 generator.
- Encoding:
code = sign(R·(v-c)), packed 1-bit-per-dimension. - Distance estimation (L2):
‖q-v‖² ≈ ‖v-c‖² + ‖q-c‖² − 2‖v-c‖‖q-c‖·(1 − 2·hamming/D)withO(1/√D)MSE — see SIGMOD 2024 Theorem 4.
§IP / raw inner-product bias
The angular estimator above is MSE-optimal for L2 and cosine. For raw IP
it carries a systematic bias. When bias_correct = true the codec
subtracts dot_quantized (stored in the QuantHeader) from the
asymmetric distance to partially compensate, following the TurboQuant-style
QJL residual correction. This is off by default; consumers that use raw IP
(RAG, attention scores) should enable it.
[rand_xoshiro] is not a dependency of nodedb-codec. The signed-
diagonal flip vector is derived from rotation_seed via an inline
xorshift64 generator — no external crate required.
Structs§
- RaBitQ
Codec - RaBitQ codec: 1-bit quantization with O(1/√D) MSE error bound.
- RaBitQ
Quantized - Packed 1-bit quantized vector produced by
RaBitQCodec::encode. - RaBitQ
Query - Prepared query for
RaBitQCodecdistance computations.