TurboQuant KV cache compression — CPU reference implementation.
Implements the TurboQuant_mse algorithm:
- Walsh-Hadamard rotation for incoherence
- Per-head norm extraction
- Lloyd-Max scalar quantization against N(0,1) codebooks
This module is CPU-only math — no Metal GPU dispatch.
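The rotation step can be illustrated with a minimal sketch of a normalized Walsh-Hadamard transform. The function name `fwht_normalized` is hypothetical and is not claimed to match the signature or internals of this module's `fwht_inplace`; the sketch assumes a power-of-two length and a 1/sqrt(n) normalization, which makes the transform orthonormal and therefore norm-preserving.

```rust
/// In-place normalized fast Walsh-Hadamard transform.
/// Hypothetical illustration only; not this module's `fwht_inplace`.
/// Panics unless the length is a power of two.
fn fwht_normalized(x: &mut [f32]) {
    let n = x.len();
    assert!(n.is_power_of_two(), "length must be a power of two");

    // Butterfly passes: combine pairs that are h apart, doubling h each pass.
    let mut h = 1;
    while h < n {
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (x[i], x[i + h]);
                x[i] = a + b;
                x[i + h] = a - b;
            }
        }
        h *= 2;
    }

    // Scale by 1/sqrt(n) so the transform is orthonormal.
    let scale = 1.0 / (n as f32).sqrt();
    for v in x.iter_mut() {
        *v *= scale;
    }
}
```

With this normalization the transform is its own inverse, so applying it a second time recovers the input up to floating-point rounding. A vector whose energy sits in one coordinate, such as `[1, 0, 0, 0]`, is spread evenly to `[0.5, 0.5, 0.5, 0.5]`, which is the incoherence effect the rotation is used for.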
Structs§
- TurboQuantConfig - Configuration for TurboQuant quantization.
Enums§
- BitWidth - Quantization bit-width for TurboQuant.
Constants§
- CODEBOOK_2BIT - 2-bit Lloyd-Max centroids for N(0,1): 4 reconstruction levels.
- CODEBOOK_3BIT - 3-bit Lloyd-Max centroids for N(0,1): 8 reconstruction levels.
- CODEBOOK_4BIT - 4-bit Lloyd-Max centroids for N(0,1): 16 reconstruction levels.
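As a rough illustration of where such centroids come from, the sketch below runs the Lloyd-Max fixed-point iteration for N(0,1): cell boundaries are midpoints between neighbouring centroids, and each centroid is then moved to the conditional mean of its cell. The function name `lloyd_max_normal`, the midpoint-rule integration on [-8, 8], and the initial guess are all assumptions made for this example; the crate's own codebook routine may work differently.

```rust
/// Standard normal density.
fn normal_pdf(x: f64) -> f64 {
    (-0.5 * x * x).exp() / (2.0 * std::f64::consts::PI).sqrt()
}

/// Lloyd-Max centroids for N(0,1), returned in ascending order.
/// Hypothetical sketch: integrals use a midpoint rule on [-8, 8].
fn lloyd_max_normal(levels: usize, iterations: usize) -> Vec<f64> {
    assert!(levels >= 2, "need at least two levels");
    const LO: f64 = -8.0;
    const HI: f64 = 8.0;
    const STEPS: usize = 16_000;
    let dx = (HI - LO) / STEPS as f64;

    // Initial guess: evenly spaced points in [-2, 2].
    let mut c: Vec<f64> = (0..levels)
        .map(|i| -2.0 + 4.0 * (i as f64 + 0.5) / levels as f64)
        .collect();

    for _ in 0..iterations {
        let mut mass = vec![0.0; levels];
        let mut moment = vec![0.0; levels];
        let mut cell = 0;
        for s in 0..STEPS {
            let x = LO + (s as f64 + 0.5) * dx;
            // Move to the next cell once x passes the midpoint boundary.
            while cell + 1 < levels && x > 0.5 * (c[cell] + c[cell + 1]) {
                cell += 1;
            }
            let w = normal_pdf(x) * dx;
            mass[cell] += w;
            moment[cell] += x * w;
        }
        // Each centroid becomes the conditional mean of its cell.
        for k in 0..levels {
            if mass[k] > 0.0 {
                c[k] = moment[k] / mass[k];
            }
        }
    }
    c
}
```

For four levels the iteration settles near the textbook values of about ±0.4528 and ±1.510. The numerical integration limits accuracy to roughly the grid spacing, so this is a way to sanity-check a table rather than to generate one at full precision.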
Functions§
- compute_lloyd_max_beta_codebook - Compute Lloyd-Max codebook for Beta((d-1)/2, (d-1)/2) scaled to [-1, 1].
- compute_lloyd_max_codebook - Compute Lloyd-Max codebook for N(0,1) with the given number of levels.
- fwht_inplace - In-place normalized Fast Walsh-Hadamard Transform.
- turboquant_dequantize - Dequantize a TurboQuant-compressed head vector.
- turboquant_quantize - Quantize a single head vector using TurboQuant_mse.
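The norm-extraction and scalar-quantization steps can be sketched as a round trip on a vector that has already been rotated. Everything here is hypothetical: the `QuantizedHead` struct, the function names, the unpacked one-byte-per-code storage, and the sqrt(d) scaling used to match the N(0,1) codebook are assumptions for illustration, not the signatures or layout of `turboquant_quantize` and `turboquant_dequantize`. The centroid table uses rounded textbook values rather than the crate's `CODEBOOK_2BIT`.

```rust
/// Rounded textbook 2-bit Lloyd-Max centroids for N(0,1).
/// Assumption: the crate's CODEBOOK_2BIT may use more digits.
const CENTROIDS_2BIT: [f32; 4] = [-1.510, -0.4528, 0.4528, 1.510];

/// Hypothetical compressed form: one norm plus one code per coordinate.
struct QuantizedHead {
    norm: f32,
    codes: Vec<u8>,
}

/// Index of the centroid closest to `z`.
fn nearest(z: f32) -> u8 {
    let mut best = 0;
    for (i, &c) in CENTROIDS_2BIT.iter().enumerate() {
        if (z - c).abs() < (z - CENTROIDS_2BIT[best]).abs() {
            best = i;
        }
    }
    best as u8
}

/// Extract the norm, rescale to roughly unit variance, and quantize.
fn quantize_rotated(rotated: &[f32]) -> QuantizedHead {
    let d = rotated.len() as f32;
    let norm = rotated.iter().map(|v| v * v).sum::<f32>().sqrt();
    // Assumed scaling: a unit-norm vector has per-coordinate variance
    // of about 1/d, so multiplying by sqrt(d) targets N(0,1).
    let scale = if norm > 0.0 { d.sqrt() / norm } else { 0.0 };
    let codes = rotated.iter().map(|&v| nearest(v * scale)).collect();
    QuantizedHead { norm, codes }
}

/// Look up centroids and undo the scaling (still in the rotated basis).
fn dequantize_rotated(q: &QuantizedHead) -> Vec<f32> {
    let d = q.codes.len() as f32;
    let scale = q.norm / d.sqrt();
    q.codes
        .iter()
        .map(|&k| CENTROIDS_2BIT[k as usize] * scale)
        .collect()
}
```

The output of `dequantize_rotated` is still in the rotated basis; a full decoder would apply the inverse rotation afterwards. Storing the norm separately means the codebook only has to cover the shape of the distribution, not the magnitude of each head vector.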