trueno-quant has moved to aprender-quant.
This crate re-exports aprender-quant for backward compatibility.
New code should depend on aprender-quant directly.
Constants
- F16_MIN_NORMAL - Minimum valid f16 normal value (~6.1e-5). Prevents NaN on round-trip through f16 encoding.
- Q4_K_BLOCK_BYTES - Q4_K super-block byte size
- Q4_K_BLOCK_SIZE - Q4_K super-block size (elements per block)
- Q5_K_BLOCK_BYTES - Q5_K super-block byte size
- Q5_K_BLOCK_SIZE - Q5_K super-block size (elements per block)
- Q6_K_BLOCK_BYTES - Q6_K super-block byte size
- Q6_K_BLOCK_SIZE - Q6_K super-block size (elements per block)
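Because the constants above describe fixed-size super-blocks, the storage cost of a quantized tensor is simple arithmetic. The sketch below assumes the values follow the GGML k-quant layouts used by llama.cpp and candle (256 elements per super-block; 144, 176 and 210 bytes for Q4_K, Q5_K and Q6_K). It defines its own copies of the constants instead of importing them from aprender-quant, and `quantized_bytes`, `bits_per_weight` and `clamp_scale` are hypothetical helpers. The clamp shows one way a minimum f16 normal can keep a stored scale from flushing to zero.

```rust
// ASSUMED values: these mirror the GGML k-quant super-block layouts used by
// llama.cpp and candle; they are not read from aprender-quant itself.
const Q4_K_BLOCK_SIZE: usize = 256; // elements per super-block (QK_K)
const Q4_K_BLOCK_BYTES: usize = 144; // 2 x f16 (d, dmin) + 12 scale bytes + 128 nibble bytes
const Q5_K_BLOCK_SIZE: usize = 256;
const Q5_K_BLOCK_BYTES: usize = 176; // Q4_K layout + 32 bytes of high bits
const Q6_K_BLOCK_SIZE: usize = 256;
const Q6_K_BLOCK_BYTES: usize = 210; // 128 low-bit + 64 high-bit + 16 scale bytes + f16 d

// Smallest positive normal f16 value: 2^-14.
const F16_MIN_NORMAL: f32 = 6.103_515_625e-5;

/// Hypothetical helper: bytes needed to store `n` elements, or `None` if
/// `n` is not a whole number of super-blocks.
fn quantized_bytes(n: usize, block_size: usize, block_bytes: usize) -> Option<usize> {
    if n % block_size == 0 {
        Some(n / block_size * block_bytes)
    } else {
        None
    }
}

/// Effective storage cost in bits per weight.
fn bits_per_weight(block_size: usize, block_bytes: usize) -> f64 {
    (block_bytes * 8) as f64 / block_size as f64
}

/// Hypothetical helper: keep a per-block scale at or above the smallest f16
/// normal so it neither flushes to zero nor goes subnormal when stored as f16.
fn clamp_scale(scale: f32) -> f32 {
    scale.max(F16_MIN_NORMAL)
}

fn main() {
    // A 4096 x 4096 weight matrix holds 16_777_216 elements = 65_536 super-blocks.
    let n = 4096 * 4096;
    for (name, size, bytes) in [
        ("Q4_K", Q4_K_BLOCK_SIZE, Q4_K_BLOCK_BYTES),
        ("Q5_K", Q5_K_BLOCK_SIZE, Q5_K_BLOCK_BYTES),
        ("Q6_K", Q6_K_BLOCK_SIZE, Q6_K_BLOCK_BYTES),
    ] {
        println!(
            "{name}: {} bytes, {:.4} bits/weight",
            quantized_bytes(n, size, bytes).unwrap(),
            bits_per_weight(size, bytes)
        );
    }
    // A zero scale would make the inverse scale infinite; the clamp avoids that.
    assert!((1.0 / clamp_scale(0.0)).is_finite());
}
```

Under these assumptions Q4_K, Q5_K and Q6_K cost 4.5, 5.5 and 6.5625 bits per weight respectively.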
Functions
- dequantize_q4_k_to_f32 - Dequantize Q4_K bytes to F32
- dequantize_q5_k_to_f32 - Dequantize Q5_K bytes to F32
- dequantize_q6_k_to_f32 - Dequantize Q6_K bytes to F32
- f16_to_f32 - Convert f16 to f32 (using the half crate)
- f32_to_f16 - Convert f32 to f16 (using the half crate)
- quantize_q4_k - Quantize F32 data to Q4_K format (llama.cpp/candle compatible)
- quantize_q4_k_matrix - Quantize an F32 matrix to Q4_K format with proper row layout
- quantize_q5_k - Quantize F32 data to Q5_K format
- quantize_q5_k_matrix - Quantize an F32 matrix to Q5_K format with proper row layout
- quantize_q6_k - Quantize F32 data to Q6_K format (candle/GGUF compatible)
- quantize_q6_k_matrix - Quantize an F32 matrix to Q6_K format with proper row layout
- transpose_q4k_for_matmul - Transpose a Q4K tensor from GGUF column-major to APR row-major layout
- transpose_q5k_for_matmul - Transpose a Q5K tensor from GGUF column-major to APR row-major layout
- transpose_q6k_for_matmul - Transpose a Q6K tensor from GGUF column-major to APR row-major layout
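The `_matrix` and `transpose_*` functions are described in terms of row layout. The sketch below illustrates the index arithmetic under stated assumptions: each row is quantized independently and zero-padded to whole 256-element super-blocks, and the column-major to row-major transpose is shown on plain f32 values. Because all elements in a super-block share scales, reordering quantized data across block boundaries generally means working on dequantized values. Whether the `transpose_*` functions dequantize, transpose and requantize internally is an assumption, not something this page states. `blocks_per_row`, `row_offset` and `col_major_to_row_major` are hypothetical names.

```rust
// ASSUMED super-block size shared by Q4_K, Q5_K and Q6_K (GGML's QK_K).
const QK_K: usize = 256;

/// Hypothetical: super-blocks per row once the row is zero-padded up to a
/// multiple of QK_K.
fn blocks_per_row(cols: usize) -> usize {
    (cols + QK_K - 1) / QK_K
}

/// Hypothetical: byte offset of row `r` when every row is quantized on its
/// own, so no super-block straddles two rows.
fn row_offset(r: usize, cols: usize, block_bytes: usize) -> usize {
    r * blocks_per_row(cols) * block_bytes
}

/// Convert a column-major `rows x cols` f32 buffer to row-major order.
/// Element (r, c) lives at `c * rows + r` in the input and at
/// `r * cols + c` in the output.
fn col_major_to_row_major(data: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(data.len(), rows * cols, "buffer size must match shape");
    let mut out = vec![0.0; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            out[r * cols + c] = data[c * rows + r];
        }
    }
    out
}

fn main() {
    // The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored column by column.
    let col_major = [1.0_f32, 4.0, 2.0, 5.0, 3.0, 6.0];
    let row_major = col_major_to_row_major(&col_major, 2, 3);
    println!("{row_major:?}"); // prints [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

    // With 144-byte Q4_K blocks (assumed), a 4096-column row spans 16 blocks.
    println!("row 3 starts at byte {}", row_offset(3, 4096, 144));
}
```

Padding each row separately wastes at most one partial super-block per row, but it lets a matmul kernel locate any row with a single multiplication.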