Module turbo_quant

TurboQuant: high-throughput compressed vector search.

PolarQuant at 4 bits: each pair of rotated embedding coordinates is encoded as (radius, angle_index). The scan uses a pre-computed table of centroid-query dot products (24 KB, fits in L1) and streams sequentially through the packed radii and indices, so every cache line it fetches is fully consumed.
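
As an illustration of the (radius, angle_index) encoding, here is a minimal sketch. It assumes 16 angle centroids (4 bits) spaced uniformly around the circle; the names `encode_pair` and `decode_pair` are hypothetical and are not part of this module's API, and the codec's actual centroid placement may differ.

```rust
use std::f32::consts::TAU;

/// Number of angle centroids at 4 bits per pair (2^4).
const NUM_ANGLES: usize = 16;

/// Encode one already-rotated coordinate pair (x, y) as (radius, angle_index).
/// Assumption: centroid k sits at angle k * TAU / 16.
fn encode_pair(x: f32, y: f32) -> (f32, u8) {
    let radius = x.hypot(y);
    // atan2 returns a value in [-PI, PI]; rem_euclid maps it into [0, TAU].
    let angle = y.atan2(x).rem_euclid(TAU);
    let step = TAU / NUM_ANGLES as f32;
    // Round to the nearest centroid; the modulo wraps index 16 back to 0.
    let index = ((angle / step).round() as usize % NUM_ANGLES) as u8;
    (radius, index)
}

/// Reconstruct the approximate pair from its code.
fn decode_pair(radius: f32, index: u8) -> (f32, f32) {
    let theta = index as f32 * TAU / NUM_ANGLES as f32;
    (radius * theta.cos(), radius * theta.sin())
}
```

In this sketch the radius is kept at full f32 precision, which matches the f32 `radii` array in the layout below; only the angle is quantized.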

§Memory layout (SoA, not AoS)

CompressedCorpus:
  radii:   [n × pairs] f32, contiguous — sequential streaming reads
  indices: [n × pairs] u8,  contiguous — sequential streaming reads
  (future: 4-bit packed indices → [n × pairs / 2] u8 for 2× index bandwidth)
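
The layout above can be sketched as a struct-of-arrays container. `CorpusSketch` is a hypothetical stand-in, not the actual `CompressedCorpus` definition. The nibble order in `pack_nibbles` (low nibble first) is also an assumption, since the packed format is only described as future work.

```rust
/// Hypothetical struct-of-arrays corpus: two flat buffers, one per field.
struct CorpusSketch {
    pairs: usize,     // coordinate pairs per vector
    radii: Vec<f32>,  // n * pairs entries, vector after vector
    indices: Vec<u8>, // n * pairs entries, same order as `radii`
}

impl CorpusSketch {
    fn new(pairs: usize) -> Self {
        assert!(pairs > 0, "a vector needs at least one pair");
        Self { pairs, radii: Vec::new(), indices: Vec::new() }
    }

    /// Append one encoded vector to the end of both buffers.
    fn push(&mut self, radii: &[f32], indices: &[u8]) {
        assert_eq!(radii.len(), self.pairs);
        assert_eq!(indices.len(), self.pairs);
        self.radii.extend_from_slice(radii);
        self.indices.extend_from_slice(indices);
    }

    /// Number of stored vectors.
    fn len(&self) -> usize {
        self.indices.len() / self.pairs
    }

    /// Borrow the radii and indices of vector `i` as contiguous slices.
    fn vector(&self, i: usize) -> (&[f32], &[u8]) {
        let start = i * self.pairs;
        let end = start + self.pairs;
        (&self.radii[start..end], &self.indices[start..end])
    }
}

/// Pack two 4-bit indices into each byte, low nibble first.
/// An odd trailing index is padded with a zero high nibble.
fn pack_nibbles(indices: &[u8]) -> Vec<u8> {
    indices
        .chunks(2)
        .map(|c| {
            let lo = c[0] & 0x0F;
            let hi = c.get(1).copied().unwrap_or(0) & 0x0F;
            lo | (hi << 4)
        })
        .collect()
}

/// Inverse of `pack_nibbles`; always yields an even number of indices.
fn unpack_nibbles(packed: &[u8]) -> Vec<u8> {
    packed.iter().flat_map(|&b| [b & 0x0F, b >> 4]).collect()
}
```

Keeping radii and indices in separate buffers means a scan reads each buffer front to back with no stride, which is what makes the sequential streaming reads possible.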

This layout enables:

GPU: one thread per vector, coalesced reads across threads
CPU NEON: process 4 pairs per SIMD iteration, amortize centroid loads
Cache: centroid table (24 KB) stays in L1 throughout the scan
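
A scalar sketch of the table-driven scan follows. It assumes 16 uniformly spaced angle centroids, a query that has already been rotated the same way as the corpus, and adjacent coordinates (2p, 2p+1) forming pair p. `build_query_table` and `scan` are hypothetical names, not the `PolarCodec` or `QueryState` API. Under these assumptions, a 24 KB table of f32 entries corresponds to 384 pairs (a 768-dimensional embedding); that dimension is inferred from the arithmetic and is not stated by the source.

```rust
use std::f32::consts::TAU;

/// Number of angle centroids at 4 bits per pair.
const NUM_ANGLES: usize = 16;

/// Build the centroid-query dot product table:
/// table[p * 16 + k] = q_x[p] * cos(theta_k) + q_y[p] * sin(theta_k).
fn build_query_table(query: &[f32]) -> Vec<f32> {
    assert!(query.len() % 2 == 0, "query must hold whole pairs");
    let mut table = Vec::with_capacity(query.len() / 2 * NUM_ANGLES);
    for pair in query.chunks_exact(2) {
        for k in 0..NUM_ANGLES {
            let theta = k as f32 * TAU / NUM_ANGLES as f32;
            table.push(pair[0] * theta.cos() + pair[1] * theta.sin());
        }
    }
    table
}

/// Approximate dot product of the query with every corpus vector.
/// Each pair contributes radius * table[pair][angle_index].
fn scan(radii: &[f32], indices: &[u8], pairs: usize, table: &[f32]) -> Vec<f32> {
    assert!(pairs > 0);
    assert_eq!(radii.len(), indices.len());
    assert_eq!(table.len(), pairs * NUM_ANGLES);
    radii
        .chunks_exact(pairs)
        .zip(indices.chunks_exact(pairs))
        .map(|(r, idx)| {
            r.iter()
                .zip(idx)
                .enumerate()
                .map(|(p, (&radius, &k))| radius * table[p * NUM_ANGLES + k as usize])
                .sum::<f32>()
        })
        .collect()
}
```

The inner loop performs one table lookup and one multiply-add per pair, and the table is the only data reused across vectors, which is why keeping it resident in L1 matters for throughput.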

Structs§

CompressedCode: Compressed representation of a single vector (for the old API).
CompressedCorpus: Flat, contiguous compressed embeddings for maximum scan throughput.
PolarCodec: PolarQuant codec: batch encode, query preparation, and high-throughput scan.
QueryState: Pre-computed query state for fast scanning.
