Crate trueno_quant

trueno-quant has moved to aprender-quant.

This crate re-exports aprender-quant for backward compatibility. New code should depend on aprender-quant directly.

Constants

F16_MIN_NORMAL
Minimum valid f16 normal value (~6.1e-5). Prevents NaN on round-trip through f16 encoding.
Q4_K_BLOCK_BYTES
Q4_K super-block byte size
Q4_K_BLOCK_SIZE
Q4_K super-block size (elements per block)
Q5_K_BLOCK_BYTES
Q5_K super-block byte size
Q5_K_BLOCK_SIZE
Q5_K super-block size (elements per block)
Q6_K_BLOCK_BYTES
Q6_K super-block byte size
Q6_K_BLOCK_SIZE
Q6_K super-block size (elements per block)
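
The constants above are listed without their values. As a rough illustration, the sketch below derives the sizes from the standard GGML K-quant layouts (256 elements per super-block) and from the smallest positive normal f16 value, 2^-14. This assumes the crate follows those layouts, which this page does not confirm. Every name in the block (`SUPER_BLOCK_ELEMS`, `q4_k_bytes`, `quantized_len`, `clamp_to_f16_normal`, and so on) is hypothetical and is not part of the crate's API.

```rust
// Sketch only: values follow the usual GGML K-quant layouts and are NOT read
// from trueno-quant / aprender-quant. All names here are hypothetical.

/// Elements per K-quant super-block (assumed).
const SUPER_BLOCK_ELEMS: usize = 256;

/// Smallest positive normal f16 value, 2^-14 (about 6.1e-5).
const F16_MIN_NORMAL_SKETCH: f32 = 1.0 / 16384.0;

/// Q4_K: f16 `d` + f16 `dmin` + 12 bytes of packed scales + 4 bits per element.
const fn q4_k_bytes() -> usize {
    2 + 2 + 12 + SUPER_BLOCK_ELEMS / 2
}

/// Q5_K: the Q4_K layout plus one extra high bit per element.
const fn q5_k_bytes() -> usize {
    2 + 2 + 12 + SUPER_BLOCK_ELEMS / 8 + SUPER_BLOCK_ELEMS / 2
}

/// Q6_K: 4 low bits + 2 high bits per element, 16 one-byte scales, f16 `d`.
const fn q6_k_bytes() -> usize {
    SUPER_BLOCK_ELEMS / 2 + SUPER_BLOCK_ELEMS / 4 + SUPER_BLOCK_ELEMS / 16 + 2
}

/// Byte length of `n_elems` quantized values, or `None` when `n_elems` is not
/// a whole number of super-blocks (this sketch rejects rather than pads).
fn quantized_len(n_elems: usize, block_bytes: usize) -> Option<usize> {
    if n_elems % SUPER_BLOCK_ELEMS != 0 {
        return None;
    }
    Some(n_elems / SUPER_BLOCK_ELEMS * block_bytes)
}

/// Average storage cost in bits per weight for a given block size.
fn bits_per_weight(block_bytes: usize) -> f64 {
    (block_bytes * 8) as f64 / SUPER_BLOCK_ELEMS as f64
}

/// One plausible use of the f16 floor: raise a tiny scale to the smallest
/// normal f16 before encoding it. How the crate applies it is an assumption.
fn clamp_to_f16_normal(scale: f32) -> f32 {
    scale.max(F16_MIN_NORMAL_SKETCH)
}

fn main() {
    for (name, bytes) in [
        ("Q4_K", q4_k_bytes()),
        ("Q5_K", q5_k_bytes()),
        ("Q6_K", q6_k_bytes()),
    ] {
        println!("{name}: {bytes} bytes/block, {} bits/weight", bits_per_weight(bytes));
    }
    println!("4096 elems as Q4_K: {:?} bytes", quantized_len(4096, q4_k_bytes()));
    println!("clamped scale: {:e}", clamp_to_f16_normal(1.0e-9));
}
```

In real code, prefer the crate's own `Q4_K_BLOCK_SIZE`, `Q4_K_BLOCK_BYTES`, and related constants over hard-coded arithmetic like this.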

Functions

dequantize_q4_k_to_f32
Dequantize Q4_K bytes to F32
dequantize_q5_k_to_f32
Dequantize Q5_K bytes to F32
dequantize_q6_k_to_f32
Dequantize Q6_K bytes to F32
f16_to_f32
Convert f16 to f32 (using half crate)
f32_to_f16
Convert f32 to f16 (using half crate)
quantize_q4_k
Quantize F32 data to Q4_K format (llama.cpp/candle compatible)
quantize_q4_k_matrix
Quantize F32 matrix to Q4_K format with proper row layout
quantize_q5_k
Quantize F32 data to Q5_K format
quantize_q5_k_matrix
Quantize F32 matrix to Q5_K format with proper row layout
quantize_q6_k
Quantize F32 data to Q6_K format (candle/GGUF compatible)
quantize_q6_k_matrix
Quantize F32 matrix to Q6_K format with proper row layout
transpose_q4k_for_matmul
Transpose Q4_K tensor from GGUF column-major to APR row-major layout
transpose_q5k_for_matmul
Transpose Q5_K tensor from GGUF column-major to APR row-major layout
transpose_q6k_for_matmul
Transpose Q6_K tensor from GGUF column-major to APR row-major layout
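
The transpose functions above convert between column-major and row-major storage. Their signatures are not shown on this page, so the sketch below illustrates only the index mapping, on plain `f32` data. It does not touch quantized blocks, and `col_major_to_row_major` is a hypothetical helper, not a function from the crate.

```rust
// Sketch only: layout conversion on unquantized f32 values. The crate's
// transpose_q*k_for_matmul functions operate on quantized bytes, and their
// signatures are not shown on this page.

/// Convert a `rows x cols` matrix from column-major to row-major order.
///
/// Column-major stores element (r, c) at index `c * rows + r`;
/// row-major stores it at index `r * cols + c`.
fn col_major_to_row_major(src: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(src.len(), rows * cols, "buffer length must equal rows * cols");
    let mut dst = vec![0.0_f32; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            dst[r * cols + c] = src[c * rows + r];
        }
    }
    dst
}

fn main() {
    // The 2x3 matrix [[1, 2, 3], [4, 5, 6]] stored column by column.
    let col_major = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0];
    let row_major = col_major_to_row_major(&col_major, 2, 3);
    println!("{row_major:?}");
}
```

A quantized tensor cannot generally be transposed by moving bytes in this way, because each super-block packs a run of consecutive elements that share scales. The quantized variants presumably have to account for that.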