Model Quantization Support
This module provides quantization support for ML models, reducing model size and inference latency to enable efficient deployment on edge devices.
§Supported Quantization Schemes
- INT8 Quantization: 8-bit integer quantization with configurable ranges
- INT4 Quantization: 4-bit integer quantization for extreme compression
- Per-Tensor Quantization: Single scale/zero-point for entire tensor
- Per-Channel Quantization: Independent scale/zero-point per channel
- Symmetric Quantization: Zero-point = 0 (centered around zero)
- Asymmetric Quantization: Arbitrary zero-point for full range coverage
- Dynamic Quantization: Runtime quantization of activations
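The schemes above differ mainly in how the scale and zero-point are chosen. A minimal standalone sketch of the per-tensor INT8 math, symmetric versus asymmetric (illustrative only; these function names are hypothetical and not this crate's API):

```rust
// Hypothetical sketch of per-tensor INT8 quantization math (not this crate's code).

// Symmetric: zero_point is fixed at 0; the largest magnitude maps to 127.
fn quantize_symmetric(x: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = x.iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

// Asymmetric: an arbitrary zero_point maps [min, max] onto the full [-128, 127].
fn quantize_asymmetric(x: &[f32]) -> (Vec<i8>, f32, i32) {
    // The range is widened to include 0.0 so it stays exactly representable.
    let min = x.iter().cloned().fold(f32::INFINITY, f32::min).min(0.0);
    let max = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max).max(0.0);
    let scale = if max == min { 1.0 } else { (max - min) / 255.0 };
    // Chosen so that q(min) = -128:  min/scale + zero_point = -128.
    let zero_point = (-128.0 - min / scale).round() as i32;
    let q = x.iter()
        .map(|v| (v / scale + zero_point as f32).round().clamp(-128.0, 127.0) as i8)
        .collect();
    (q, scale, zero_point)
}

fn main() {
    let x = [0.5f32, -0.3, 0.8, -0.1];
    let (q, s) = quantize_symmetric(&x);
    // Dequantization is q * scale (symmetric) or (q - zero_point) * scale (asymmetric).
    let round_trip: Vec<f32> = q.iter().map(|&v| v as f32 * s).collect();
    println!("symmetric: {:?} scale={s}", q);
    println!("round trip: {:?}", round_trip);
    let (qa, sa, zp) = quantize_asymmetric(&x);
    println!("asymmetric: {:?} scale={sa} zero_point={zp}", qa);
}
```

Symmetric quantization is the common choice for weights, while the asymmetric form covers skewed activation ranges (e.g. post-ReLU values) without wasting half the integer range.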
§Examples
use ipfrs_tensorlogic::{QuantizedTensor, QuantizationScheme, QuantizationConfig};
// Per-tensor INT8 symmetric quantization
let weights = vec![0.5, -0.3, 0.8, -0.1];
let config = QuantizationConfig::int8_symmetric();
let quantized = QuantizedTensor::quantize_per_tensor(&weights, vec![4], config).unwrap();
// Dequantize back to f32
let dequantized = quantized.dequantize();
assert_eq!(dequantized.len(), 4);
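The per-channel example below gives each channel its own scale. The reduction behind that can be sketched with a hypothetical standalone helper (not this crate's API), assuming a row-major `[channels, elems_per_channel]` layout:

```rust
// Hypothetical sketch: one symmetric INT8 scale per channel for a
// row-major [channels, elems_per_channel] weight buffer.
fn per_channel_scales(weights: &[f32], channels: usize) -> Vec<f32> {
    let per = weights.len() / channels;
    (0..channels)
        .map(|c| {
            let row = &weights[c * per..(c + 1) * per];
            // The largest magnitude in this channel maps to 127.
            let max_abs = row.iter().fold(0.0f32, |m, v| m.max(v.abs()));
            if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 }
        })
        .collect()
}

fn main() {
    let weights = vec![0.5f32, 0.3, -0.2, -0.4, 0.1, 0.6, -0.5, 0.2];
    println!("{:?}", per_channel_scales(&weights, 2));
}
```

With the weights from the example below, channel 0 (max |w| = 0.5) and channel 1 (max |w| = 0.6) get different scales, so a narrow-range channel is not forced to share the coarse step size of a wide-range one.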
// Per-channel quantization for Conv2D weights
let weights = vec![0.5, 0.3, -0.2, -0.4, 0.1, 0.6, -0.5, 0.2]; // 2 channels, 4 elements each
let config = QuantizationConfig::int8_per_channel(2);
let quantized = QuantizedTensor::quantize_per_channel(&weights, vec![2, 4], config).unwrap();
Structs§
- DynamicQuantizer - Dynamic quantization configuration for runtime quantization
- QuantizationConfig - Quantization configuration
- QuantizationParams - Quantization parameters for a single channel/tensor
- QuantizedTensor - Quantized tensor representation
Enums§
- CalibrationMethod - Calibration method for determining quantization parameters
- QuantizationError - Errors that can occur during quantization operations
- QuantizationGranularity - Quantization granularity
- QuantizationScheme - Quantization scheme