TurboQuant: high-throughput compressed vector search.
PolarQuant at 4 bits: each pair of rotated embedding coordinates is encoded as (radius, angle_index). The scan uses a pre-computed table of centroid-query dot products (24 KB, fits in L1) and streams sequentially through the packed radii and indices, so every fetched cache line is fully consumed.
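To make the encoding concrete, here is a minimal sketch of the two steps described above: quantizing one pair to (radius, angle_index) and building the centroid-query table. It assumes 16 uniformly spaced angle centroids (4 bits) and omits the rotation step; `encode_pair` and `query_table` are hypothetical names for illustration, not this crate's API.

```rust
use std::f32::consts::TAU;

/// Number of angle centroids at 4 bits per pair.
const ANGLES: usize = 16;

/// Encode one (already rotated) pair as (radius, angle_index).
/// Assumption: centroids sit at uniformly spaced angles k * TAU / 16.
fn encode_pair(x: f32, y: f32) -> (f32, u8) {
    let radius = (x * x + y * y).sqrt();
    // atan2 returns a value in (-PI, PI]; rem_euclid maps it into [0, TAU).
    let angle = y.atan2(x).rem_euclid(TAU);
    // Round to the nearest centroid; the modulo wraps bin 16 back to bin 0.
    let index = ((angle / TAU * ANGLES as f32).round() as usize) % ANGLES;
    (radius, index as u8)
}

/// Build the query table: table[p * ANGLES + k] is the dot product of
/// query pair p with the unit vector of angle centroid k.
fn query_table(query: &[f32]) -> Vec<f32> {
    let mut table = Vec::with_capacity(query.len() / 2 * ANGLES);
    for pair in query.chunks_exact(2) {
        for k in 0..ANGLES {
            let theta = k as f32 * TAU / ANGLES as f32;
            table.push(pair[0] * theta.cos() + pair[1] * theta.sin());
        }
    }
    table
}
```

Under these assumptions, a 768-dimensional query (384 pairs) would yield a table of 384 × 16 × 4 bytes = 24 KB, which is consistent with the size quoted above, although the page does not state the embedding dimension.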
§Memory layout (SoA, not AoS)
CompressedCorpus:
radii: [n × pairs] f32, contiguous — sequential streaming reads
indices: [n × pairs] u8, contiguous — sequential streaming reads
(future: 4-bit packed indices → [n × pairs / 2] u8 for 2× index bandwidth)

This layout enables:
- GPU: one thread per vector, coalesced reads across threads
- CPU NEON: process 4 pairs per SIMD iteration, amortize centroid loads
- Cache: centroid table (24 KB) stays in L1 throughout the scan
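The structure-of-arrays layout and the streaming scan it enables can be sketched as follows. This is a scalar illustration under stated assumptions, not the crate's implementation: `SketchCorpus` and `scan` are hypothetical names, the table is assumed to have shape [pairs × 16], and no SIMD or GPU code is shown.

```rust
/// Hypothetical SoA corpus mirroring the layout described above
/// (not the crate's actual `CompressedCorpus` type).
struct SketchCorpus {
    pairs: usize,     // pairs per vector, must be > 0
    radii: Vec<f32>,  // [n × pairs], row-major
    indices: Vec<u8>, // [n × pairs], row-major, values in 0..16
}

/// Score every vector against a pre-computed table of shape [pairs × 16].
/// Both arrays are read front to back exactly once.
fn scan(corpus: &SketchCorpus, table: &[f32]) -> Vec<f32> {
    assert!(corpus.pairs > 0);
    assert_eq!(table.len(), corpus.pairs * 16);
    assert_eq!(corpus.radii.len(), corpus.indices.len());
    corpus
        .radii
        .chunks_exact(corpus.pairs)
        .zip(corpus.indices.chunks_exact(corpus.pairs))
        .map(|(radii, indices)| {
            radii
                .iter()
                .zip(indices)
                .enumerate()
                // score += radius * (query pair p · centroid k)
                .map(|(p, (&radius, &k))| radius * table[p * 16 + k as usize])
                .sum::<f32>()
        })
        .collect()
}
```

Keeping radii and indices in separate contiguous arrays means the inner loop touches only two sequential streams plus the small table, which is the property the GPU, NEON, and cache points above rely on.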
Structs§
- CompressedCode - Compressed representation of a single vector (for the old API).
- CompressedCorpus - Flat, contiguous compressed embeddings for maximum scan throughput.
- PolarCodec - PolarQuant codec: batch encode, query preparation, and high-throughput scan.
- QueryState - Pre-computed query state for fast scanning.