Module turbo_quant

TurboQuant: high-throughput compressed vector search.

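PolarQuant at 4 bits: after rotation, each pair of embedding coordinates is encoded as (radius, angle_index), where the 4-bit angle_index selects one of 16 angle centroids. The scan uses a table of centroid-query dot products that is pre-computed once per query (24 KB, fits in L1) and streams sequentially through the packed radii and indices, so every cache line fetched from the corpus is fully used.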
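The encode and table-lookup steps can be sketched as follows. This is a minimal illustration, not the crate's implementation: `encode_pair`, `build_table`, `score`, and `LEVELS` are hypothetical names, the rotation step is omitted, and the centroids are assumed to be 16 evenly spaced unit vectors on the circle. Under those assumptions a query with 384 pairs would yield a 384 × 16 × 4 B = 24 KB table, which is consistent with the figure above, although the embedding dimension is not stated here.

```rust
use std::f32::consts::TAU;

/// Number of angle centroids for a 4-bit index (assumed evenly spaced).
const LEVELS: usize = 16;

/// Encode one coordinate pair (x, y) as (radius, angle_index).
fn encode_pair(x: f32, y: f32) -> (f32, u8) {
    let radius = (x * x + y * y).sqrt();
    // Map the angle into [0, TAU) and snap it to the nearest centroid.
    let angle = y.atan2(x).rem_euclid(TAU);
    let idx = (angle / TAU * LEVELS as f32).round() as usize % LEVELS;
    (radius, idx as u8)
}

/// Build the per-query table:
/// table[p * LEVELS + k] = dot(query pair p, unit centroid k).
fn build_table(query: &[f32]) -> Vec<f32> {
    let mut table = Vec::with_capacity(query.len() / 2 * LEVELS);
    for pair in query.chunks_exact(2) {
        for k in 0..LEVELS {
            let theta = k as f32 * TAU / LEVELS as f32;
            table.push(pair[0] * theta.cos() + pair[1] * theta.sin());
        }
    }
    table
}

/// Approximate dot product of the query with one compressed vector:
/// one table lookup and one multiply-add per pair.
fn score(radii: &[f32], indices: &[u8], table: &[f32]) -> f32 {
    radii
        .iter()
        .zip(indices)
        .enumerate()
        .map(|(p, (&r, &k))| r * table[p * LEVELS + k as usize])
        .sum()
}
```

Because the table depends only on the query, the trigonometry is paid once per query, and the per-vector work reduces to lookups and multiply-adds over the two streamed arrays.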

§Memory layout (SoA, not AoS)

CompressedCorpus:
  radii:   [n × pairs] f32, contiguous — sequential streaming reads
  indices: [n × pairs] u8,  contiguous — sequential streaming reads
  (future: 4-bit packed indices → [n × pairs / 2] u8 for 2× index bandwidth)
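A struct-of-arrays container with this shape might look like the sketch below. The type `Corpus` and the helpers `pack_nibbles` and `unpack_nibble` are hypothetical and are not the crate's `CompressedCorpus` API; the nibble order (low nibble first) is also an assumption, since the future packed format is not specified here.

```rust
/// Hypothetical SoA container: vector i occupies
/// radii[i * pairs .. (i + 1) * pairs] and the same range in indices.
struct Corpus {
    pairs: usize,
    radii: Vec<f32>,
    indices: Vec<u8>,
}

impl Corpus {
    fn new(pairs: usize) -> Self {
        assert!(pairs > 0, "pairs must be non-zero");
        Corpus { pairs, radii: Vec::new(), indices: Vec::new() }
    }

    /// Append one encoded vector to both arrays.
    fn push(&mut self, radii: &[f32], indices: &[u8]) {
        assert_eq!(radii.len(), self.pairs);
        assert_eq!(indices.len(), self.pairs);
        self.radii.extend_from_slice(radii);
        self.indices.extend_from_slice(indices);
    }

    /// Number of stored vectors.
    fn len(&self) -> usize {
        self.indices.len() / self.pairs
    }

    /// Borrow the radii and indices of vector i.
    fn vector(&self, i: usize) -> (&[f32], &[u8]) {
        let range = i * self.pairs..(i + 1) * self.pairs;
        (&self.radii[range.clone()], &self.indices[range])
    }
}

/// Pack two 4-bit indices per byte, low nibble first (assumed order).
/// An odd trailing index is padded with a zero high nibble.
fn pack_nibbles(indices: &[u8]) -> Vec<u8> {
    indices
        .chunks(2)
        .map(|c| (c[0] & 0x0F) | ((c.get(1).copied().unwrap_or(0) & 0x0F) << 4))
        .collect()
}

/// Read back the i-th 4-bit index from a packed buffer.
fn unpack_nibble(packed: &[u8], i: usize) -> u8 {
    (packed[i / 2] >> ((i & 1) * 4)) & 0x0F
}
```

Keeping radii and indices in separate arrays means a scan touches two dense streams with no per-vector padding or pointer chasing, and packing the indices halves the bytes read from the index stream.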

This layout enables:

  • GPU: one thread per vector, coalesced reads across threads
  • CPU NEON: process 4 pairs per SIMD iteration, amortize centroid loads
  • Cache: centroid table (24 KB) stays in L1 throughout the scan
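The "4 pairs per iteration" pattern can be illustrated with a portable scalar loop that keeps four independent accumulators, which is the shape a NEON kernel would vectorize. This is only a sketch under assumptions: `scan` and `LEVELS` are hypothetical names, the table layout is assumed to be `table[pair * LEVELS + index]`, and no actual SIMD intrinsics are used.

```rust
/// Number of angle centroids for a 4-bit index.
const LEVELS: usize = 16;

/// Score every vector in a flat SoA corpus against a pre-computed table.
/// Returns one approximate dot product per vector.
fn scan(radii: &[f32], indices: &[u8], pairs: usize, table: &[f32]) -> Vec<f32> {
    assert!(pairs > 0);
    assert_eq!(radii.len(), indices.len());
    assert_eq!(table.len(), pairs * LEVELS);

    radii
        .chunks_exact(pairs)
        .zip(indices.chunks_exact(pairs))
        .map(|(r, k)| {
            // Four independent accumulators stand in for the four SIMD lanes.
            let mut acc = [0.0f32; 4];
            let mut p = 0;
            while p + 4 <= pairs {
                for lane in 0..4 {
                    let q = p + lane;
                    acc[lane] += r[q] * table[q * LEVELS + k[q] as usize];
                }
                p += 4;
            }
            let mut total: f32 = acc.iter().sum();
            // Scalar tail for pair counts that are not a multiple of 4.
            while p < pairs {
                total += r[p] * table[p * LEVELS + k[p] as usize];
                p += 1;
            }
            total
        })
        .collect()
}
```

Both input slices are read front to back exactly once, while the table is the only randomly accessed data, which is why keeping it resident in L1 matters for throughput.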

Structs§

CompressedCode
Compressed representation of a single vector (for the old API).
CompressedCorpus
Flat, contiguous compressed embeddings for maximum scan throughput.
PolarCodec
PolarQuant codec: batch encode, query preparation, and high-throughput scan.
QueryState
Pre-computed query state for fast scanning.