PolarQuant
Walsh-Hadamard rotation + polar coordinate quantization for LLM weight and KV cache compression.
Overview
PolarQuant compresses neural network weights and KV cache embeddings in four steps:
- Block-wise L2 normalization to the unit hypersphere
- Walsh-Hadamard rotation, which decorrelates the values so that they are approximately i.i.d. Gaussian
- Recursive polar coordinate transformation
- Lloyd-Max optimal codebook quantization of resulting angles
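
The steps listed above can be sketched in NumPy roughly as follows. This is a minimal illustration, not the project's implementation: the function names (`fwht`, `to_polar`, `from_polar`, `lloyd_max`, `quantize`), the block size of 16, and the 16 levels per angle are all assumptions chosen for the example. The codebooks are fitted to the empirical angles of toy data, the rotation is a plain orthonormal Walsh-Hadamard transform, and the wrap-around of first-level angles at ±π is ignored for simplicity.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform along the last axis."""
    x = np.array(x, dtype=np.float64)
    d = x.shape[-1]
    assert d > 0 and (d & (d - 1)) == 0, "block size must be a power of two"
    h = 1
    while h < d:
        # Butterfly step: combine entries that are h apart within groups of 2h.
        y = x.reshape(*x.shape[:-1], d // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        x = np.stack((a + b, a - b), axis=-2).reshape(*x.shape[:-1], d)
        h *= 2
    return x / np.sqrt(d)  # scaling makes the transform its own inverse

def to_polar(x):
    """Recursively pair coordinates into (radius, angle) until one radius is left."""
    angles, r = [], x
    while r.shape[-1] > 1:
        a, b = r[..., 0::2], r[..., 1::2]
        angles.append(np.arctan2(b, a))  # level 1: [-pi, pi]; later levels: [0, pi/2]
        r = np.hypot(a, b)
    return r[..., 0], angles

def from_polar(radius, angles):
    """Invert to_polar: expand the radius back out, last level first."""
    r = radius[..., None]
    for theta in reversed(angles):
        out = np.empty(theta.shape[:-1] + (2 * theta.shape[-1],))
        out[..., 0::2] = r * np.cos(theta)
        out[..., 1::2] = r * np.sin(theta)
        r = out
    return r

def lloyd_max(samples, levels, iters=50):
    """1-D Lloyd-Max: alternate nearest-centroid assignment and centroid update."""
    samples = samples.ravel()
    # Initialise centroids at evenly spaced quantiles of the samples.
    codebook = np.quantile(samples, (np.arange(levels) + 0.5) / levels)
    for _ in range(iters):
        edges = (codebook[:-1] + codebook[1:]) / 2  # midpoints are the decision boundaries
        idx = np.searchsorted(edges, samples)
        for k in range(levels):
            cell = samples[idx == k]
            if cell.size:
                codebook[k] = cell.mean()
    return codebook

def quantize(values, codebook):
    """Map each value to the index of its nearest codebook entry."""
    edges = (codebook[:-1] + codebook[1:]) / 2
    return np.searchsorted(edges, values)

# Toy data standing in for weights: 256 blocks of 16 values each.
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 16))

norms = np.linalg.norm(w, axis=-1, keepdims=True)       # 1. block-wise L2 norms
rotated = fwht(w / norms)                                # 2. rotate the unit vectors
radius, angles = to_polar(rotated)                       # 3. recursive polar transform
codebooks = [lloyd_max(a, levels=16) for a in angles]    # 4. one codebook per level
codes = [quantize(a, cb) for a, cb in zip(angles, codebooks)]

# Decode: look up angles, rebuild unit vectors, undo the rotation, restore norms.
dequant = [cb[c] for cb, c in zip(codebooks, codes)]
w_hat = fwht(from_polar(np.ones_like(radius), dequant)) * norms
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In this sketch only the integer `codes`, the per-level `codebooks`, and the block `norms` would need to be stored. Because each block is normalized before the transform, the final radius is 1 and can be dropped. How the norms themselves are stored or quantized is left open here.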
Work in progress. Contributions welcome.