Module turboquant

TurboQuant KV cache compression — CPU reference implementation.

Implements the TurboQuant_mse algorithm:

  1. Walsh-Hadamard rotation for incoherence
  2. Per-head norm extraction
  3. Lloyd-Max scalar quantization against N(0,1) codebooks

This module is CPU-only math — no Metal GPU dispatch.
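
The three steps listed above can be sketched end to end as follows. This is a minimal illustration, not the module's code: every function name here is hypothetical, the 2-bit levels are the commonly tabulated Lloyd-Max values for N(0,1) rounded to four decimals (the actual `CODEBOOK_2BIT` contents are not shown on this page), and the `sqrt(d) / norm` rescaling is an assumption about how rotated coordinates are matched to an N(0,1) codebook.

```rust
// Hypothetical names throughout; the module's real signatures are not shown on this page.

/// Approximate 2-bit Lloyd-Max reconstruction levels for N(0,1).
const LEVELS_2BIT: [f32; 4] = [-1.5104, -0.4528, 0.4528, 1.5104];

/// Orthonormal fast Walsh-Hadamard transform; `x.len()` must be a power of two.
fn fwht_sketch(x: &mut [f32]) {
    let n = x.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        for start in (0..n).step_by(2 * h) {
            for i in start..start + h {
                let (a, b) = (x[i], x[i + h]);
                x[i] = a + b;
                x[i + h] = a - b;
            }
        }
        h *= 2;
    }
    // Scaling by 1/sqrt(n) makes the transform orthonormal and self-inverse.
    let scale = 1.0 / (n as f32).sqrt();
    for v in x.iter_mut() {
        *v *= scale;
    }
}

/// Returns the vector's L2 norm and one 2-bit code per coordinate.
fn quantize_sketch(head: &[f32]) -> (f32, Vec<u8>) {
    let mut x = head.to_vec();
    fwht_sketch(&mut x); // step 1: rotation
    let norm = x.iter().map(|v| v * v).sum::<f32>().sqrt(); // step 2: norm
    // Assumed scaling: after rotation each coordinate is roughly N(0, norm^2 / d).
    let to_unit = if norm > 0.0 { (x.len() as f32).sqrt() / norm } else { 0.0 };
    let codes = x
        .iter()
        .map(|&v| {
            let z = v * to_unit;
            // step 3: pick the nearest reconstruction level
            let mut best = 0usize;
            for (i, &c) in LEVELS_2BIT.iter().enumerate() {
                if (z - c).abs() < (z - LEVELS_2BIT[best]).abs() {
                    best = i;
                }
            }
            best as u8
        })
        .collect();
    (norm, codes)
}

/// Inverts `quantize_sketch`: look up levels, undo the scaling, rotate back.
fn dequantize_sketch(norm: f32, codes: &[u8]) -> Vec<f32> {
    let from_unit = norm / (codes.len() as f32).sqrt();
    let mut x: Vec<f32> = codes
        .iter()
        .map(|&c| LEVELS_2BIT[c as usize] * from_unit)
        .collect();
    fwht_sketch(&mut x);
    x
}
```

Because the orthonormal transform preserves the L2 norm, the norm could equally be taken before the rotation. The sketch follows the order given in the list.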

Structs§

TurboQuantConfig
Configuration for TurboQuant quantization.

Enums§

BitWidth
Quantization bit-width for TurboQuant.

Constants§

CODEBOOK_2BIT
2-bit Lloyd-Max centroids for N(0,1): 4 reconstruction levels.
CODEBOOK_3BIT
3-bit Lloyd-Max centroids for N(0,1): 8 reconstruction levels.
CODEBOOK_4BIT
4-bit Lloyd-Max centroids for N(0,1): 16 reconstruction levels.
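
To show where such centroids come from, here is a sketch of the Lloyd-Max iteration for N(0,1): decision boundaries sit at the midpoints between neighbouring levels, and each level moves to the conditional mean of its cell. The function name, the integration grid, and the starting levels are all assumptions made for this illustration. It is not the module's `compute_lloyd_max_codebook`, whose signature and method are not shown on this page.

```rust
/// Hypothetical helper: Lloyd-Max reconstruction levels for N(0,1),
/// computed with a midpoint-rule integration instead of closed-form CDFs.
fn lloyd_max_normal_sketch(levels: usize, iterations: usize) -> Vec<f64> {
    assert!(levels >= 2, "need at least two levels");
    const LO: f64 = -8.0; // tails beyond +/-8 carry negligible mass
    const HI: f64 = 8.0;
    const STEPS: usize = 16_000;
    let dx = (HI - LO) / STEPS as f64;

    // Start from evenly spaced levels in [-2, 2].
    let mut centroids: Vec<f64> = (0..levels)
        .map(|i| -2.0 + 4.0 * i as f64 / (levels - 1) as f64)
        .collect();

    for _ in 0..iterations {
        let mut mass = vec![0.0f64; levels];
        let mut moment = vec![0.0f64; levels];
        let mut cell = 0usize;
        for s in 0..STEPS {
            let x = LO + (s as f64 + 0.5) * dx;
            // Boundaries are midpoints between neighbouring levels.
            while cell + 1 < levels && x > 0.5 * (centroids[cell] + centroids[cell + 1]) {
                cell += 1;
            }
            let w = (-0.5 * x * x).exp(); // unnormalized N(0,1) density
            mass[cell] += w;
            moment[cell] += w * x;
        }
        // Each level becomes the conditional mean of its cell.
        for k in 0..levels {
            if mass[k] > 0.0 {
                centroids[k] = moment[k] / mass[k];
            }
        }
    }
    centroids
}
```

With `levels = 2` the result settles at plus or minus sqrt(2/pi), about 0.798, the mean of a half-normal distribution. Passing 4, 8, or 16 gives level counts matching the 2-, 3-, and 4-bit codebooks.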

Functions§

compute_lloyd_max_beta_codebook
Compute Lloyd-Max codebook for Beta((d-1)/2, (d-1)/2) scaled to [-1, 1].
compute_lloyd_max_codebook
Compute Lloyd-Max codebook for N(0,1) with the given number of levels.
fwht_inplace
In-place normalized Fast Walsh-Hadamard Transform.
turboquant_dequantize
Dequantize a TurboQuant-compressed head vector.
turboquant_quantize
Quantize a single head vector using TurboQuant_mse.
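
As background for the two codebook functions: Beta((d-1)/2, (d-1)/2) rescaled to [-1, 1] has density proportional to (1 - x^2)^((d-3)/2), which is the distribution of a single coordinate of a uniformly random unit vector in d dimensions. Its variance is 1/d, so multiplying coordinates by sqrt(d) gives unit variance. That is one plausible reason N(0,1) codebooks can stand in for the Beta codebook at large d, though this page does not state how the module relates the two. The helper below is hypothetical and only checks the variance numerically.

```rust
/// Hypothetical check: variance of the density proportional to
/// (1 - x^2)^((d - 3) / 2) on [-1, 1], via midpoint-rule integration.
fn sphere_coordinate_variance_sketch(d: usize) -> f64 {
    assert!(d >= 3, "d = 2 has endpoint singularities this simple rule handles poorly");
    const STEPS: usize = 200_000;
    let dx = 2.0 / STEPS as f64;
    let exponent = (d as f64 - 3.0) / 2.0;
    let (mut mass, mut second) = (0.0f64, 0.0f64);
    for s in 0..STEPS {
        let x = -1.0 + (s as f64 + 0.5) * dx;
        let w = (1.0 - x * x).powf(exponent); // unnormalized density
        mass += w;
        second += w * x * x;
    }
    // The mean is zero by symmetry, so the variance is the second moment.
    second / mass
}
```

For d = 3 the density is uniform on [-1, 1] and the variance is 1/3. For a head dimension such as d = 128 it is 1/128.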