Crate trueno_quant

trueno-quant has moved to aprender-quant.

This crate re-exports aprender-quant for backward compatibility. New code should depend on aprender-quant directly.

Constants§

F16_MIN_NORMAL: Minimum valid f16 normal value (~6.1e-5). Prevents NaN on round-trip through f16 encoding.
Q4_K_BLOCK_BYTES: Q4_K super-block byte size
Q4_K_BLOCK_SIZE: Q4_K super-block size (elements per block)
Q5_K_BLOCK_BYTES: Q5_K super-block byte size
Q5_K_BLOCK_SIZE: Q5_K super-block size (elements per block)
Q6_K_BLOCK_BYTES: Q6_K super-block byte size
Q6_K_BLOCK_SIZE: Q6_K super-block size (elements per block)
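
The constants above pair an element count per super-block with a byte size per super-block, which is enough to size a quantized buffer. The sketch below illustrates that arithmetic. It does not import this crate: the numeric values are assumptions taken from the standard GGML/llama.cpp K-quant layouts (256 elements per super-block; 144, 176, and 210 bytes for Q4_K, Q5_K, and Q6_K), and `quantized_len` and `f16_min_normal` are hypothetical helper names, not part of the crate's API.

```rust
// Assumed values from the GGML/llama.cpp K-quant block layouts.
// The crate's own constants are not reproduced on this page, so these
// are stand-ins for Q*_K_BLOCK_SIZE and Q*_K_BLOCK_BYTES.
const BLOCK_SIZE: usize = 256; // elements per super-block (all three formats)
const Q4_K_BYTES: usize = 144; // 4 (two f16 scales) + 12 (packed scales) + 128 (4-bit values)
const Q5_K_BYTES: usize = 176; // Q4_K layout + 32 bytes holding the fifth bit of each value
const Q6_K_BYTES: usize = 210; // 128 (low 4 bits) + 64 (high 2 bits) + 16 (scales) + 2 (one f16)

/// Hypothetical helper: bytes needed for `n_elements` values,
/// rounding up to a whole number of super-blocks.
fn quantized_len(n_elements: usize, block_bytes: usize) -> usize {
    let blocks = (n_elements + BLOCK_SIZE - 1) / BLOCK_SIZE;
    blocks * block_bytes
}

/// Hypothetical helper: smallest positive normal f16 value, 2^-14,
/// expressed as an f32 (about 6.1e-5).
fn f16_min_normal() -> f32 {
    2.0_f32.powi(-14)
}

fn main() {
    let n = 4096; // e.g. one row of a 4096-wide weight matrix
    println!("Q4_K: {} bytes", quantized_len(n, Q4_K_BYTES));
    println!("Q5_K: {} bytes", quantized_len(n, Q5_K_BYTES));
    println!("Q6_K: {} bytes", quantized_len(n, Q6_K_BYTES));
    println!("f16 min normal: {:e}", f16_min_normal());
}
```

In real code, prefer the crate's exported constants over hard-coded numbers so that buffer sizes stay in step with the library.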

Functions§

dequantize_q4_k_to_f32: Dequantize Q4_K bytes to F32
dequantize_q5_k_to_f32: Dequantize Q5_K bytes to F32
dequantize_q6_k_to_f32: Dequantize Q6_K bytes to F32
f16_to_f32: Convert f16 to f32 (using half crate)
f32_to_f16: Convert f32 to f16 (using half crate)
quantize_q4_k: Quantize F32 data to Q4_K format (llama.cpp/candle compatible)
quantize_q4_k_matrix: Quantize F32 matrix to Q4_K format with proper row layout
quantize_q5_k: Quantize F32 data to Q5_K format
quantize_q5_k_matrix: Quantize F32 matrix to Q5_K format with proper row layout
quantize_q6_k: Quantize F32 data to Q6_K format (candle/GGUF compatible)
quantize_q6_k_matrix: Quantize F32 matrix to Q6_K format with proper row layout
transpose_q4k_for_matmul: Transpose Q4K tensor from GGUF column-major to APR row-major layout
transpose_q5k_for_matmul: Transpose Q5K tensor from GGUF column-major to APR row-major layout
transpose_q6k_for_matmul: Transpose Q6K tensor from GGUF column-major to APR row-major layout
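
The three transpose functions are described as converting a column-major layout to a row-major one. Their signatures and internals are not shown on this page, so the sketch below only illustrates the index mapping behind such a conversion, using plain f32 values rather than quantized super-blocks. `col_major_to_row_major` is a hypothetical name used for illustration and is not part of the crate.

```rust
/// Hypothetical helper: copy a `rows x cols` matrix stored column-major
/// into a new buffer stored row-major.
fn col_major_to_row_major(data: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(data.len(), rows * cols, "buffer length must equal rows * cols");
    let mut out = vec![0.0_f32; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            // Column-major: element (r, c) lives at index c * rows + r.
            // Row-major:    element (r, c) lives at index r * cols + c.
            out[r * cols + c] = data[c * rows + r];
        }
    }
    out
}

fn main() {
    // The 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] stored column by column.
    let col_major = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0];
    let row_major = col_major_to_row_major(&col_major, 2, 3);
    println!("{:?}", row_major);
}
```

A quantized tensor cannot be permuted element by element like this, because each super-block shares scales across its values. How the crate handles that is not described here.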