pub fn quantize_4bit_double(values: &[f32]) -> DoubleQuantized4Bit
Quantize values to 4-bit with double quantization of the scale factors.
This first applies standard 4-bit quantization, then quantizes the resulting FP32 scale factors to 8 bits, using a second-level block size of 256.
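
The two-level scheme described above can be illustrated with a minimal, self-contained sketch. It is not this crate's implementation: the names `SketchDoubleQuantized`, `sketch_quantize`, and `sketch_dequantize` are hypothetical, the first-level block size of 64 is an assumption (only the second-level size of 256 is stated here), and it uses plain symmetric absmax rounding with unpacked codes, which may differ from the code book and layout of `DoubleQuantized4Bit`.

```rust
/// Hypothetical container for illustration only; the real
/// `DoubleQuantized4Bit` layout is not shown in this documentation.
#[derive(Debug)]
struct SketchDoubleQuantized {
    codes: Vec<u8>,         // one 4-bit code per value, stored unpacked (1..=15)
    scale_codes: Vec<u8>,   // first-level scales, quantized to 8 bits
    scale_scales: Vec<f32>, // one FP32 scale per group of 256 first-level scales
    len: usize,
}

const BLOCK: usize = 64; // ASSUMED first-level block size
const SCALE_BLOCK: usize = 256; // second-level block size from the description

fn sketch_quantize(values: &[f32]) -> SketchDoubleQuantized {
    // Level 1: one absmax-derived FP32 scale per block; each value becomes
    // a signed integer in -7..=7, stored with an offset of 8.
    let mut scales = Vec::new();
    let mut codes = Vec::with_capacity(values.len());
    for block in values.chunks(BLOCK) {
        let absmax = block.iter().fold(0.0f32, |m, v| m.max(v.abs()));
        let scale = absmax / 7.0;
        scales.push(scale);
        for &v in block {
            // An all-zero block has scale 0; map its values to code 8 (zero).
            let q = if scale > 0.0 {
                (v / scale).round().clamp(-7.0, 7.0) as i8
            } else {
                0
            };
            codes.push((q + 8) as u8);
        }
    }

    // Level 2: the FP32 scales are non-negative, so each group of 256 is
    // quantized to 0..=255 against the group's maximum.
    let mut scale_codes = Vec::with_capacity(scales.len());
    let mut scale_scales = Vec::new();
    for group in scales.chunks(SCALE_BLOCK) {
        let max = group.iter().fold(0.0f32, |m, &s| m.max(s));
        let ss = if max > 0.0 { max / 255.0 } else { 1.0 };
        scale_scales.push(ss);
        for &s in group {
            scale_codes.push((s / ss).round().clamp(0.0, 255.0) as u8);
        }
    }

    SketchDoubleQuantized { codes, scale_codes, scale_scales, len: values.len() }
}

fn sketch_dequantize(q: &SketchDoubleQuantized) -> Vec<f32> {
    (0..q.len)
        .map(|i| {
            let b = i / BLOCK;
            // Rebuild the first-level scale from its 8-bit code.
            let scale = q.scale_codes[b] as f32 * q.scale_scales[b / SCALE_BLOCK];
            (q.codes[i] as i8 - 8) as f32 * scale
        })
        .collect()
}
```

Under these assumptions, each block of 64 values stores one byte of scale instead of four, plus one FP32 value shared by every 256 blocks. The cost is a small extra error in the reconstructed scales; in this sketch a block whose scale is much smaller than the largest scale in its group can round to scale code 0 and dequantize to all zeros.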