pub fn quantize_q4_k(data: &[f32]) -> Vec<u8>
Quantize F32 data to Q4_K format (llama.cpp/candle compatible)
Q4_K format: 256 elements per super-block, 144 bytes per super-block
Layout: d (2B, f16) + dmin (2B, f16) + scales (12B) + qs (128B) = 144B
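
The sizes above can be checked with a little arithmetic. The following is a minimal sketch, not part of this crate: the constant names and the `q4_k_len` helper are hypothetical, and it assumes the input length is a multiple of 256 (how `quantize_q4_k` treats other lengths is not stated here).

```rust
// Hypothetical constants mirroring the layout described above.
const QK_K: usize = 256; // elements per super-block
const D_BYTES: usize = 2; // d
const DMIN_BYTES: usize = 2; // dmin
const SCALES_BYTES: usize = 12; // packed sub-block scales and mins
const QS_BYTES: usize = QK_K / 2; // two 4-bit values per byte = 128
const BLOCK_BYTES: usize = D_BYTES + DMIN_BYTES + SCALES_BYTES + QS_BYTES;

/// Expected output length in bytes for `n` input values.
/// Returns `None` when `n` is not a whole number of super-blocks
/// (an assumption of this sketch, not documented behavior).
fn q4_k_len(n: usize) -> Option<usize> {
    if n % QK_K == 0 {
        Some(n / QK_K * BLOCK_BYTES)
    } else {
        None
    }
}
```

Under these assumptions, 512 input values would map to two super-blocks, i.e. 288 bytes.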
Value packing (candle/llama.cpp layout):
- Each 64-value chunk occupies 32 bytes: the low nibble of byte l holds value l (the first 32 values), the high nibble holds value 32+l (the next 32 values)
- Low-nibble values use scale[is], high-nibble values use scale[is+1]
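
The nibble packing described above can be sketched as follows. This is an illustration only: `pack_chunk` and `unpack_chunk` are hypothetical helpers, not functions of this crate, and the dequantization formula `d * q - m` follows the llama.cpp reference convention, which this page does not spell out. Here `d1`/`m1` stand for the effective scale and minimum of sub-block `is`, and `d2`/`m2` for those of sub-block `is+1`.

```rust
/// Pack 64 quantized values (each in 0..=15) into 32 bytes.
/// Byte `l` carries value `l` in its low nibble and value `l + 32`
/// in its high nibble.
fn pack_chunk(q: &[u8; 64]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for l in 0..32 {
        out[l] = (q[l] & 0x0F) | ((q[l + 32] & 0x0F) << 4);
    }
    out
}

/// Dequantize one packed 64-value chunk (hypothetical helper).
/// Low nibbles use `d1`/`m1`, high nibbles use `d2`/`m2`.
fn unpack_chunk(qs: &[u8; 32], d1: f32, m1: f32, d2: f32, m2: f32) -> [f32; 64] {
    let mut y = [0f32; 64];
    for l in 0..32 {
        y[l] = d1 * (qs[l] & 0x0F) as f32 - m1;
        y[l + 32] = d2 * (qs[l] >> 4) as f32 - m2;
    }
    y
}
```

Splitting each chunk into a low-nibble half and a high-nibble half means each half maps to exactly one 32-value sub-block, so a single scale lookup covers a whole pass over the 32 bytes.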