

Function quantize_q6_k 

pub fn quantize_q6_k(data: &[f32]) -> Vec<u8> 

Quantize F32 data to Q6_K format (candle/GGUF compatible)

Q6_K format: data is split into 256-element super-blocks. Each super-block holds ql (128 bytes) + qh (64 bytes) + scales (16 bytes) + d (f16, 2 bytes) = 210 bytes.
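The 210-byte figure follows directly from the field sizes. The sketch below derives it from the element count and also computes the effective bits per weight. The constant names and `q6k_output_len` are hypothetical illustrations, not part of this crate's API, and the helper assumes the input length is a multiple of 256 (how `quantize_q6_k` handles a ragged tail is not stated here).

```rust
/// Q6_K super-block layout (sizes in bytes), derived from the element count.
const QK_K: usize = 256; // elements per super-block
const QL_BYTES: usize = QK_K / 2; // low 4 bits per value: 256 * 4 / 8 = 128
const QH_BYTES: usize = QK_K / 4; // high 2 bits per value: 256 * 2 / 8 = 64
const SCALES_BYTES: usize = QK_K / 16; // one int8 scale per 16-element sub-block = 16
const D_BYTES: usize = 2; // f16 super-block scale
const BLOCK_BYTES: usize = QL_BYTES + QH_BYTES + SCALES_BYTES + D_BYTES; // 210

/// Hypothetical helper: output size in bytes for `n` input floats.
/// Returns None when `n` is not a whole number of super-blocks (assumption).
fn q6k_output_len(n: usize) -> Option<usize> {
    if n % QK_K != 0 {
        return None;
    }
    Some(n / QK_K * BLOCK_BYTES)
}

/// Effective storage cost per weight: 210 * 8 / 256 = 6.5625 bits.
fn bits_per_weight() -> f64 {
    (BLOCK_BYTES * 8) as f64 / QK_K as f64
}
```

So two super-blocks (512 floats, 2048 bytes as f32) shrink to 420 bytes; the extra 0.5625 bits per weight over a pure 6-bit encoding pay for the sub-block scales and `d`.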

  • Each 6-bit value is stored split: the low 4 bits in ql, the high 2 bits in qh
  • 16 sub-blocks of 16 elements each, with an int8 scale per sub-block
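A simplified sketch of quantizing one super-block and unpacking it again may make the bullets above concrete. It rests on several assumptions: sub-block scales are chosen by plain absmax (GGML's reference quantizer searches for a better scale, and this function may too), `d` is kept as an `f32` instead of being converted to f16, and the ql/qh interleaving is modeled on GGML's reference `block_q6_K` layout (within each 128-element half, the low nibbles of elements j and j+64 share a `ql` byte, and one `qh` byte packs the high 2 bits of elements j, j+32, j+64 and j+96). `BlockQ6K`, `quantize_block` and `dequantize_block` are hypothetical names.

```rust
const QK_K: usize = 256; // elements per super-block
const SUB: usize = 16; // elements per sub-block

/// Hypothetical in-memory view of one super-block (d as f32, not f16).
struct BlockQ6K {
    ql: [u8; 128],
    qh: [u8; 64],
    scales: [i8; 16],
    d: f32,
}

fn quantize_block(x: &[f32; QK_K]) -> BlockQ6K {
    // 1. Float scale per sub-block: simple absmax / 32 (simplification).
    let mut sub_scales = [0f32; 16];
    for (i, chunk) in x.chunks(SUB).enumerate() {
        sub_scales[i] = chunk.iter().fold(0f32, |m, v| m.max(v.abs())) / 32.0;
    }
    // 2. Quantize the 16 float scales to int8 against one super-block scale d.
    let max_scale = sub_scales.iter().fold(0f32, |m, v| m.max(*v));
    let d = if max_scale > 0.0 { max_scale / 127.0 } else { 0.0 };
    let mut scales = [0i8; 16];
    for i in 0..16 {
        scales[i] = if d > 0.0 { (sub_scales[i] / d).round().min(127.0) as i8 } else { 0 };
    }
    // 3. 6-bit codes in 0..=63: signed value in -32..=31, offset by 32.
    let mut l = [0u8; QK_K];
    for (i, v) in x.iter().enumerate() {
        let s = d * scales[i / SUB] as f32;
        let q = if s != 0.0 { (v / s).round().clamp(-32.0, 31.0) as i32 } else { 0 };
        l[i] = (q + 32) as u8;
    }
    // 4. Split each code: low 4 bits into ql, high 2 bits into qh.
    let (mut ql, mut qh) = ([0u8; 128], [0u8; 64]);
    for half in 0..2 {
        let (lb, qlb, qhb) = (half * 128, half * 64, half * 32);
        for j in 0..32 {
            let (a, b, c, e) = (l[lb + j], l[lb + j + 32], l[lb + j + 64], l[lb + j + 96]);
            ql[qlb + j] = (a & 0xF) | ((c & 0xF) << 4);
            ql[qlb + j + 32] = (b & 0xF) | ((e & 0xF) << 4);
            qh[qhb + j] = (a >> 4) | ((b >> 4) << 2) | ((c >> 4) << 4) | ((e >> 4) << 6);
        }
    }
    BlockQ6K { ql, qh, scales, d }
}

fn dequantize_block(b: &BlockQ6K) -> [f32; QK_K] {
    let mut y = [0f32; QK_K];
    for half in 0..2 {
        let (yb, qlb, qhb) = (half * 128, half * 64, half * 32);
        for j in 0..32 {
            let (ql0, ql1, h) = (b.ql[qlb + j], b.ql[qlb + j + 32], b.qh[qhb + j]);
            // Reassemble the four 6-bit codes for elements j, j+32, j+64, j+96.
            let codes = [
                (ql0 & 0xF) | ((h & 3) << 4),
                (ql1 & 0xF) | (((h >> 2) & 3) << 4),
                (ql0 >> 4) | (((h >> 4) & 3) << 4),
                (ql1 >> 4) | (((h >> 6) & 3) << 4),
            ];
            for (k, code) in codes.iter().enumerate() {
                let idx = yb + j + 32 * k;
                let q = *code as i32 - 32;
                y[idx] = b.d * b.scales[idx / SUB] as f32 * q as f32;
            }
        }
    }
    y
}
```

Each reconstructed value is `d * scales[sub] * (code - 32)`: the int8 scales give every 16-element sub-block its own step size, while `d` sets the range for the whole super-block.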