Skip to main content

Module quantize

Module quantize 

Source
Expand description

Bit-packing, XNOR-popcount, and the two RaBitQ distance estimators:

  1. Symmetric Charikar-style angular estimator — both query and database are 1-bit. Derived from hyperplane-LSH collision probability: E[B/D] = 1 − θ/π This is what the shipped crate had at commit f2dbb6efb.

  2. Asymmetric RaBitQ-2024 inner-product estimator — query stays in f32, database is 1-bit b_i ∈ {−1/√D, +1/√D}. Inner product is reconstructed by summing the rotated query’s components with signs, then rescaled by a precomputed factor derived from the stored unit-sphere inner-product bias. Unbiased for Haar-uniform rotations with O(1/√D) variance.

The asymmetric path closes the gap between this crate’s estimator and the SIGMOD 2024 paper (Gao & Long) — the symmetric path remains for apples-to-apples comparison against naive 1-bit codes.

§Bit packing

Each dimension is one bit: 1 if the rotated value ≥ 0, else 0. Bits are packed MSB-first into u64 words. When D % 64 != 0 the last word carries 64·n_words − D padding bits that are zero in every code; XNOR-popcount must mask those bits off before counting, otherwise padding bits always agree and the estimator is biased. masked_xnor_popcount handles this correctly; xnor_popcount is retained for the aligned case.

Structs§

BinaryCode
A packed binary code representing one vector (D bits).

Functions§

pack_bits
Pack bits from a boolean slice into u64 words (for testing/utilities).
unpack_bits
Unpack u64 words back into a bool slice of length dim.