

Function quantize_q4_k 

pub fn quantize_q4_k(data: &[f32]) -> Vec<u8> 

Quantize F32 data to the Q4_K format (compatible with llama.cpp and candle).

Q4_K format: 256 elements per super-block, 144 bytes per block. Layout: d (2 B, f16) + dmin (2 B, f16) + scales (12 B, packed 6-bit scales and mins for the eight 32-value sub-blocks) + qs (128 B, 256 4-bit quants).
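The byte layout above can be sketched as a borrowed view over one 144-byte block. This is a minimal sketch, not part of this crate's API: `BlockQ4KView`, `view_block` and `f16_to_f32` are hypothetical names, and `d`/`dmin` are assumed to be stored as little-endian f16 (as in GGUF files).

```rust
/// Values per super-block and bytes per encoded block, from the layout above.
const QK_K: usize = 256;
const BLOCK_BYTES: usize = 144;

// d (2) + dmin (2) + scales (12) + qs (128) must add up to one block,
// and 256 4-bit quants must fit exactly in the 128-byte qs field.
const _: () = assert!(2 + 2 + 12 + 128 == BLOCK_BYTES);
const _: () = assert!(QK_K / 2 == 128);

/// Hypothetical borrowed view of one Q4_K block.
struct BlockQ4KView<'a> {
    d_bits: u16,      // super-block scale, raw f16 bits
    dmin_bits: u16,   // super-block min, raw f16 bits
    scales: &'a [u8], // 12 bytes of packed 6-bit scales/mins
    qs: &'a [u8],     // 128 bytes of packed 4-bit quants
}

/// Split one 144-byte block into its fields (assumes little-endian f16).
fn view_block(block: &[u8]) -> BlockQ4KView<'_> {
    assert_eq!(block.len(), BLOCK_BYTES);
    BlockQ4KView {
        d_bits: u16::from_le_bytes([block[0], block[1]]),
        dmin_bits: u16::from_le_bytes([block[2], block[3]]),
        scales: &block[4..16],
        qs: &block[16..144],
    }
}

/// Convert IEEE 754 half-precision bits to f32 (normals, subnormals, inf/NaN).
fn f16_to_f32(h: u16) -> f32 {
    let sign = ((h >> 15) & 1) as u32;
    let exp = ((h >> 10) & 0x1f) as u32;
    let mant = (h & 0x3ff) as u32;
    let bits = if exp == 0 {
        if mant == 0 {
            sign << 31 // signed zero
        } else {
            // Subnormal: shift until the implicit bit appears, adjusting the exponent.
            let mut e: u32 = 113; // 127 - 15 + 1
            let mut m = mant;
            while m & 0x400 == 0 {
                m <<= 1;
                e -= 1;
            }
            (sign << 31) | (e << 23) | ((m & 0x3ff) << 13)
        }
    } else if exp == 0x1f {
        (sign << 31) | (0xff << 23) | (mant << 13) // inf or NaN
    } else {
        (sign << 31) | ((exp + 112) << 23) | (mant << 13) // rebias 15 -> 127
    };
    f32::from_bits(bits)
}
```

With this layout a buffer of `k` super-blocks occupies `k * 144` bytes, and each field sits at a fixed offset (0, 2, 4 and 16).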

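Value packing (candle/llama.cpp layout):

  • Each 64-value chunk is stored in 32 bytes: the first 32 values of the chunk go in the low nibbles and the next 32 in the high nibbles, so byte l holds values l and l + 32 of the chunk.
  • Low nibbles use scale[is] and high nibbles use scale[is+1], where is = 2 × (chunk index).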
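The nibble packing described above can be sketched as a pack/unpack pair over one super-block of already-computed 4-bit quants. This is a minimal sketch: `pack_qs` and `unpack_qs` are hypothetical helpers, not functions exported by this crate, and the computation of the quants and scales themselves is omitted.

```rust
/// Pack 256 4-bit quants into 128 bytes. For each 64-value chunk, byte l
/// holds value l in its low nibble and value l + 32 in its high nibble.
fn pack_qs(q: &[u8; 256]) -> [u8; 128] {
    let mut out = [0u8; 128];
    for chunk in 0..4 {
        let src = &q[chunk * 64..chunk * 64 + 64];
        let dst = &mut out[chunk * 32..chunk * 32 + 32];
        for l in 0..32 {
            debug_assert!(src[l] < 16 && src[l + 32] < 16, "quants must be 4-bit");
            dst[l] = (src[l] & 0x0f) | ((src[l + 32] & 0x0f) << 4);
        }
    }
    out
}

/// Inverse of `pack_qs`; also returns which sub-block scale index each value uses.
fn unpack_qs(qs: &[u8; 128]) -> ([u8; 256], [usize; 256]) {
    let mut q = [0u8; 256];
    let mut scale_idx = [0usize; 256];
    for chunk in 0..4 {
        let is = 2 * chunk; // low nibbles -> scale[is], high nibbles -> scale[is + 1]
        for l in 0..32 {
            let byte = qs[chunk * 32 + l];
            q[chunk * 64 + l] = byte & 0x0f;
            scale_idx[chunk * 64 + l] = is;
            q[chunk * 64 + l + 32] = byte >> 4;
            scale_idx[chunk * 64 + l + 32] = is + 1;
        }
    }
    (q, scale_idx)
}
```

Interleaving by 32 rather than packing adjacent pairs keeps each 32-value sub-block in a single nibble position, so a decoder can process one sub-block with one scale using a single mask or shift.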