
Module embedding_quant

int8 scalar quantization + SIMD-friendly scoring for embedding vectors.

Adapted from the TurboQuant approach (RyanCodrai/turbovec, ICLR 2026): a data-oblivious, training-free, single-pass quantizer. At lean-ctx’s scale (hundreds of facts × 384-dim MiniLM vectors) the win is twofold:

  1. 4× smaller on-disk knowledge index (i8 codes vs f32).
  2. Faster scoring — the query is rotated once into the codebook domain (the per-vector scale) and accumulated directly over i8 codes, so we never reconstruct the full f32 document vector (turbovec’s core idea).

No heavy SIMD crate is pulled in: the chunked-lane accumulators below are shaped so the autovectorizer emits NEON/AVX automatically, with a scalar tail that is always correct on every target.
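As a rough illustration of what "chunked-lane accumulators with a scalar tail" can look like, here is a minimal sketch. The name `dot_f32_sketch` and the lane count of 8 are assumptions made for this example, not the module's actual code. The loop over fixed-size chunks gives the compiler independent accumulators it may vectorize, and the remainder is handled by plain scalar code.

```rust
/// Number of independent accumulators. 8 is an assumed value for this
/// sketch (one AVX2 register of f32), not a constant taken from the crate.
const LANES: usize = 8;

/// Hypothetical chunked-lane f32 dot product.
/// If the slices differ in length, only the common prefix is used.
pub fn dot_f32_sketch(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len().min(b.len());
    let (a, b) = (&a[..n], &b[..n]);

    // One partial sum per lane, so lanes have no dependency on each other.
    let mut acc = [0.0f32; LANES];
    let a_chunks = a.chunks_exact(LANES);
    let b_chunks = b.chunks_exact(LANES);
    // Elements left over once the slices are split into full chunks.
    let a_tail = a_chunks.remainder();
    let b_tail = b_chunks.remainder();

    for (ca, cb) in a_chunks.zip(b_chunks) {
        for lane in 0..LANES {
            acc[lane] += ca[lane] * cb[lane];
        }
    }

    // Horizontal reduction of the lanes, then the scalar tail.
    let mut sum: f32 = acc.iter().sum();
    for (x, y) in a_tail.iter().zip(b_tail) {
        sum += x * y;
    }
    sum
}
```

Whether the inner loop actually compiles to NEON or AVX instructions depends on the target and optimization level. The scalar tail keeps the result correct either way.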

Structs§

QuantizedVector
A vector stored as int8 codes plus the per-vector scale needed to reconstruct approximate values: value[i] ≈ code[i] · scale.
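The reconstruction rule above can be sketched as follows. The struct name `QuantizedVectorSketch`, its field names, and both methods are hypothetical and chosen only for illustration. The real `QuantizedVector` may be laid out differently.

```rust
/// Hypothetical layout: field names are assumptions, not the crate's definition.
#[derive(Debug, Clone, PartialEq)]
pub struct QuantizedVectorSketch {
    /// One signed byte per dimension.
    pub codes: Vec<i8>,
    /// Per-vector scale shared by all codes.
    pub scale: f32,
}

impl QuantizedVectorSketch {
    /// Approximate reconstruction: value[i] ≈ code[i] · scale.
    pub fn dequantize(&self) -> Vec<f32> {
        self.codes
            .iter()
            .map(|&c| f32::from(c) * self.scale)
            .collect()
    }

    /// Payload size in bytes: one byte per code plus the f32 scale.
    pub fn payload_bytes(&self) -> usize {
        self.codes.len() + std::mem::size_of::<f32>()
    }
}
```

Under this assumed layout, a 384-dimensional vector takes 388 bytes of payload, against 1536 bytes for the same vector in f32. That is where the roughly 4× saving comes from.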

Functions§

dot_f32
SIMD-friendly f32 dot product with chunked lane accumulators.
dot_quant
Asymmetric dot product: full-precision query · quantized doc.
quantize
Symmetric, per-vector quantization: scale = max|x| / 127, code = round(x / scale).
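To make the two formulas concrete, the sketch below quantizes a vector with `scale = max|x| / 127` and then scores a full-precision query against the codes. All names (`Quantized`, `quantize_sketch`, `dot_quant_sketch`) are hypothetical, and the handling of an all-zero input is an assumption, since the summary does not say what the real functions do in that case.

```rust
/// Hypothetical container for this sketch: int8 codes plus a per-vector scale.
pub struct Quantized {
    pub codes: Vec<i8>,
    pub scale: f32,
}

/// Symmetric per-vector quantization:
/// scale = max|x| / 127, code = round(x / scale).
pub fn quantize_sketch(x: &[f32]) -> Quantized {
    let max_abs = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    if max_abs == 0.0 {
        // Assumed handling: an all-zero (or empty) vector gets zero codes
        // and a zero scale, which avoids dividing by zero.
        return Quantized { codes: vec![0; x.len()], scale: 0.0 };
    }
    let scale = max_abs / 127.0;
    let codes = x
        .iter()
        // The clamp guards against rounding slightly past ±127.
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    Quantized { codes, scale }
}

/// Asymmetric dot product: the query stays in f32, the document stays in i8.
/// The scale is applied once at the end, so the f32 document vector
/// is never materialized.
pub fn dot_quant_sketch(query: &[f32], doc: &Quantized) -> f32 {
    let acc: f32 = query
        .iter()
        .zip(&doc.codes)
        .map(|(q, &c)| q * f32::from(c))
        .sum();
    acc * doc.scale
}
```

Because each code is off by at most half a quantization step, the score differs from the exact f32 dot product by at most `scale / 2` times the sum of `|query[i]|`, ignoring floating-point rounding.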