Axonml Quant - Model Quantization Library
Provides quantization support for reducing model size and improving inference performance. Supports multiple quantization formats:
- Q8_0: 8-bit quantization (block size 32)
- Q4_0: 4-bit quantization (block size 32)
- Q4_1: 4-bit quantization with min/max (block size 32)
- F16: Half-precision floating point
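To illustrate what block quantization means, here is a minimal standalone sketch of a Q8_0-style scheme: each block of 32 floats is stored as one f32 scale plus 32 signed 8-bit values, with scale = max(|x|) / 127. This is an assumption about the format based on the block-size and bit-width descriptions above, not the crate's actual implementation; the function names `quantize_q8_0` and `dequantize_q8_0` are hypothetical.

```rust
// Illustrative Q8_0-style block quantization (a sketch, not axonml_quant's
// real code): one scale per 32-element block, values stored as i8.
fn quantize_q8_0(block: &[f32; 32]) -> (f32, [i8; 32]) {
    let max_abs = block.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    // Avoid division by zero for an all-zero block.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let mut q = [0i8; 32];
    for (i, &x) in block.iter().enumerate() {
        q[i] = (x / scale).round() as i8;
    }
    (scale, q)
}

fn dequantize_q8_0(scale: f32, q: &[i8; 32]) -> [f32; 32] {
    let mut out = [0.0f32; 32];
    for (i, &v) in q.iter().enumerate() {
        out[i] = v as f32 * scale;
    }
    out
}

fn main() {
    let block: [f32; 32] = core::array::from_fn(|i| i as f32 - 16.0);
    let (scale, q) = quantize_q8_0(&block);
    let restored = dequantize_q8_0(scale, &q);
    // Round-trip error is bounded by half a quantization step.
    for (a, b) in block.iter().zip(restored.iter()) {
        assert!((a - b).abs() <= scale / 2.0 + 1e-6);
    }
    println!("scale = {scale}");
}
```

Q4_0 works analogously with 4-bit values (16 levels), trading precision for a 2x smaller payload, and Q4_1 adds a per-block minimum so the 16 levels cover an asymmetric range.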
§Example
use axonml_quant::{quantize_tensor, dequantize_tensor, QuantType};

let tensor = Tensor::from_vec(vec![1.0, 2.0, 3.0, 4.0], &[4])?;
let quantized = quantize_tensor(&tensor, QuantType::Q8_0)?;
let dequantized = dequantize_tensor(&quantized)?;

@version 0.1.0
@author AutomataNexus Development Team
Re-exports§
pub use error::QuantError;
pub use error::QuantResult;
pub use types::QuantType;
pub use types::QuantizedTensor;
pub use types::QuantizedBlock;
pub use quantize::quantize_tensor;
pub use quantize::quantize_model;
pub use dequantize::dequantize_tensor;
pub use dequantize::dequantize_block;
pub use calibration::CalibrationData;
pub use calibration::calibrate;
Modules§
- calibration
- Calibration for Quantization
- dequantize
- Dequantization Functions
- error
- Quantization Error Types
- quantize
- Quantization Functions
- types
- Quantization Types
Constants§
- DEFAULT_BLOCK_SIZE - Default block size for quantization.
- MAX_BLOCK_SIZE - Maximum block size supported.