int8 scalar quantization + SIMD-friendly scoring for embedding vectors.
Adapted from the TurboQuant approach (RyanCodrai/turbovec, ICLR 2026): a data-oblivious, training-free, single-pass quantizer. At lean-ctx’s scale (hundreds of facts × 384-dim MiniLM vectors) the win is twofold:
- 4× smaller on-disk knowledge index (i8 codes vs f32).
- Faster scoring: the query is rotated once into the codebook domain (the per-vector scale) and accumulated directly over i8 codes, so we never reconstruct the full f32 document vector (turbovec’s core idea).
No heavy SIMD crate is pulled in: the chunked-lane accumulators below are shaped so the autovectorizer emits NEON/AVX automatically, with a scalar tail that is always correct on every target.
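A minimal sketch of what such a chunked-lane accumulator might look like, assuming an f32 query scored against i8 codes. The function name `dot_i8` and the lane width `LANES` are hypothetical and are not taken from the crate; the point is only the shape: fixed-size chunks with independent per-lane sums, a scalar tail for the leftover elements, and a single multiply by the per-vector scale at the end.

```rust
/// Hypothetical lane width: 8 f32 lanes match one 256-bit register.
const LANES: usize = 8;

/// Dot product of an f32 query with i8 codes, multiplied by the
/// per-vector scale once at the end (the document vector is never
/// expanded back to f32).
pub fn dot_i8(query: &[f32], codes: &[i8], scale: f32) -> f32 {
    assert_eq!(query.len(), codes.len());

    // Independent per-lane accumulators: no dependency between lanes,
    // which gives the autovectorizer room to use SIMD adds/multiplies.
    let mut acc = [0.0f32; LANES];
    let mut q_chunks = query.chunks_exact(LANES);
    let mut c_chunks = codes.chunks_exact(LANES);
    for (q, c) in (&mut q_chunks).zip(&mut c_chunks) {
        for lane in 0..LANES {
            acc[lane] += q[lane] * f32::from(c[lane]);
        }
    }

    // Scalar tail: handles the elements that did not fill a whole chunk.
    let mut sum: f32 = acc.iter().sum();
    for (q, c) in q_chunks.remainder().iter().zip(c_chunks.remainder()) {
        sum += q * f32::from(*c);
    }

    sum * scale
}
```

Whether the compiler actually emits NEON or AVX for this loop depends on the target and optimization level; the scalar tail keeps the result correct either way.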
Structs
- QuantizedVector - A vector stored as int8 codes plus the per-vector scale needed to reconstruct approximate values: value[i] ≈ code[i] · scale.
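To illustrate the relation value[i] ≈ code[i] · scale, here is a rough sketch of a per-vector quantizer. The type `QuantizedSketch`, its field names, and the choice scale = max|v| / 127 are assumptions made for this example; the page does not show how the real QuantizedVector is laid out or how its scale is chosen.

```rust
/// Illustrative stand-in, not the crate's actual struct definition.
pub struct QuantizedSketch {
    pub codes: Vec<i8>,
    pub scale: f32,
}

impl QuantizedSketch {
    /// Assumed rule: symmetric quantization with scale = max|v| / 127,
    /// computed in a single pass with no training data.
    pub fn quantize(v: &[f32]) -> Self {
        let max_abs = v.iter().fold(0.0f32, |m, x| m.max(x.abs()));
        // Guard against an all-zero vector to avoid dividing by zero.
        let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
        let codes: Vec<i8> = v
            .iter()
            .map(|x| (x / scale).round().clamp(-127.0, 127.0) as i8)
            .collect();
        Self { codes, scale }
    }

    /// Approximate reconstruction: value[i] ≈ code[i] · scale.
    pub fn dequantize(&self) -> Vec<f32> {
        self.codes
            .iter()
            .map(|&c| f32::from(c) * self.scale)
            .collect()
    }
}
```

Under this layout a 384-dimensional vector takes 384 bytes of codes plus a 4-byte scale, compared with 1536 bytes as f32, and each reconstructed value differs from the original by at most about half of scale.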