Module turboquant

TurboQuant KV cache compression — CPU reference implementation.

Implements the TurboQuant_mse algorithm:

1. Walsh-Hadamard rotation for incoherence
2. Per-head norm extraction
3. Lloyd-Max scalar quantization against N(0,1) codebooks

This module is CPU-only math — no Metal GPU dispatch.
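
To make the rotation step concrete, here is a minimal sketch of a normalized fast Walsh-Hadamard transform. The names `fwht_sketch` and `l2_norm` are hypothetical and are not taken from this module; the real `fwht_inplace` may use a different signature or element type. The sketch assumes the head dimension is a power of two.

```rust
/// Hypothetical stand-in for the module's `fwht_inplace`; the real
/// signature may differ. Runs the Walsh-Hadamard butterfly in place and
/// then scales by 1/sqrt(n) so that the transform is orthonormal.
/// Panics if the length is not a power of two.
fn fwht_sketch(x: &mut [f32]) {
    let n = x.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    let mut h = 1;
    while h < n {
        // Within each block of size 2h, combine entries that are h apart.
        for block in (0..n).step_by(2 * h) {
            for i in block..block + h {
                let (a, b) = (x[i], x[i + h]);
                x[i] = a + b;
                x[i + h] = a - b;
            }
        }
        h *= 2;
    }
    let scale = 1.0 / (n as f32).sqrt();
    for v in x.iter_mut() {
        *v *= scale;
    }
}

/// Euclidean norm, accumulated in f64 to limit rounding error.
fn l2_norm(x: &[f32]) -> f32 {
    x.iter().map(|&v| (v as f64) * (v as f64)).sum::<f64>().sqrt() as f32
}
```

Because the normalized transform is orthonormal, it preserves the Euclidean norm, and applying it twice returns the input. A vector whose energy sits in a single coordinate comes out with equal magnitude in every coordinate, which is the incoherence property that a fixed scalar codebook relies on.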

Structs

TurboQuantConfig: Configuration for TurboQuant quantization.

Enums

BitWidth: Quantization bit-width for TurboQuant.

Constants

CODEBOOK_2BIT: 2-bit Lloyd-Max centroids for N(0,1): 4 reconstruction levels.
CODEBOOK_3BIT: 3-bit Lloyd-Max centroids for N(0,1): 8 reconstruction levels.
CODEBOOK_4BIT: 4-bit Lloyd-Max centroids for N(0,1): 16 reconstruction levels.
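
As an illustration of where such centroids come from, the following sketch runs a Lloyd-Max iteration for N(0,1) using simple numerical integration. It is not the module's `compute_lloyd_max_codebook`: the name `lloyd_max_sketch`, the truncation of the density to [-8, 8], the grid size, and the starting centroids are all assumptions made for this example.

```rust
/// Standard normal density.
fn normal_pdf(x: f64) -> f64 {
    (-0.5 * x * x).exp() / (2.0 * std::f64::consts::PI).sqrt()
}

/// Hypothetical Lloyd-Max solver for N(0,1). Each iteration places the
/// decision boundaries at the midpoints between neighbouring centroids,
/// then moves every centroid to the conditional mean of its cell.
/// Integrals are approximated with the midpoint rule on [-8, 8].
fn lloyd_max_sketch(levels: usize, iterations: usize) -> Vec<f64> {
    assert!(levels >= 2, "need at least two reconstruction levels");
    const LIMIT: f64 = 8.0;
    const STEPS: usize = 32_000;
    let dx = 2.0 * LIMIT / STEPS as f64;

    // Assumed starting point: evenly spaced centroids in [-2, 2].
    let mut centroids: Vec<f64> = (0..levels)
        .map(|i| -2.0 + 4.0 * i as f64 / (levels - 1) as f64)
        .collect();

    for _ in 0..iterations {
        let mut mass = vec![0.0f64; levels];
        let mut moment = vec![0.0f64; levels];
        let mut cell = 0;
        for s in 0..STEPS {
            let x = -LIMIT + (s as f64 + 0.5) * dx;
            // Grid points are visited in increasing order, so the cell
            // index only ever moves forward.
            while cell + 1 < levels
                && x > 0.5 * (centroids[cell] + centroids[cell + 1])
            {
                cell += 1;
            }
            let w = normal_pdf(x) * dx;
            mass[cell] += w;
            moment[cell] += w * x;
        }
        for k in 0..levels {
            if mass[k] > 0.0 {
                centroids[k] = moment[k] / mass[k];
            }
        }
    }
    centroids
}
```

With `levels = 4` the iteration should settle near the classic 2-bit values of roughly ±0.4528 and ±1.510. Whether `CODEBOOK_2BIT` stores exactly these digits is not confirmed here.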

Functions

compute_lloyd_max_beta_codebook: Compute Lloyd-Max codebook for Beta((d-1)/2, (d-1)/2) scaled to [-1, 1].
compute_lloyd_max_codebook: Compute Lloyd-Max codebook for N(0,1) with the given number of levels.
fwht_inplace: In-place normalized Fast Walsh-Hadamard Transform.
turboquant_dequantize: Dequantize a TurboQuant-compressed head vector.
turboquant_quantize: Quantize a single head vector using TurboQuant_mse.