Module quantization

Model Quantization Support

This module provides comprehensive quantization support for ML models, enabling efficient deployment on edge devices by reducing model size and inference latency.

§Supported Quantization Schemes

  • INT8 Quantization: 8-bit integer quantization with configurable ranges
  • INT4 Quantization: 4-bit integer quantization for extreme compression
  • Per-Tensor Quantization: Single scale/zero-point for entire tensor
  • Per-Channel Quantization: Independent scale/zero-point per channel
  • Symmetric Quantization: Zero-point fixed at 0 (range centered around zero)
  • Asymmetric Quantization: Arbitrary zero-point for full range coverage
  • Dynamic Quantization: Runtime quantization of activations
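The symmetric/asymmetric distinction above comes down to a small amount of arithmetic. The following is a minimal plain-Rust sketch of that math (it does not use this crate's API, whose types are shown in the examples below): symmetric quantization derives the scale from the maximum absolute value with zero-point 0, while asymmetric quantization maps the full observed [min, max] range onto the INT8 range via a nonzero zero-point.

```rust
// Sketch of INT8 quantization arithmetic (independent of the crate's API):
// q = round(x / scale) + zero_point, and x ≈ (q - zero_point) * scale.

fn quantize_symmetric(x: &[f32]) -> (Vec<i8>, f32) {
    // Symmetric: zero_point = 0, scale derived from the max absolute value.
    let max_abs = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = max_abs / 127.0;
    let q = x
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

fn quantize_asymmetric(x: &[f32]) -> (Vec<i8>, f32, i32) {
    // Asymmetric: the full [min, max] range (forced to include 0.0 so that
    // zero is exactly representable) maps onto [-128, 127].
    let min = x.iter().cloned().fold(f32::INFINITY, f32::min).min(0.0);
    let max = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max).max(0.0);
    let scale = (max - min) / 255.0;
    let zero_point = (-128.0 - min / scale).round() as i32;
    let q = x
        .iter()
        .map(|v| ((v / scale).round() as i32 + zero_point).clamp(-128, 127) as i8)
        .collect();
    (q, scale, zero_point)
}

fn main() {
    let weights = [0.5f32, -0.3, 0.8, -0.1];
    let (q, scale) = quantize_symmetric(&weights);
    // Round-trip: each dequantized value is within one quantization step.
    for (orig, qi) in weights.iter().zip(&q) {
        let deq = *qi as f32 * scale;
        assert!((orig - deq).abs() <= scale);
    }
    println!("q = {:?}, scale = {}", q, scale);
}
```

Note that the symmetric form wastes one bucket of the INT8 range when the data is skewed (e.g. post-ReLU activations are all non-negative), which is exactly the case the asymmetric scheme handles.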

§Examples

use ipfrs_tensorlogic::{QuantizedTensor, QuantizationScheme, QuantizationConfig};

// Per-tensor INT8 symmetric quantization
let weights = vec![0.5, -0.3, 0.8, -0.1];
let config = QuantizationConfig::int8_symmetric();
let quantized = QuantizedTensor::quantize_per_tensor(&weights, vec![4], config).unwrap();

// Dequantize back to f32
let dequantized = quantized.dequantize();
assert_eq!(dequantized.len(), 4);

// Per-channel quantization for Conv2D weights
let weights = vec![0.5, 0.3, -0.2, -0.4, 0.1, 0.6, -0.5, 0.2]; // 2 channels, 4 elements each
let config = QuantizationConfig::int8_per_channel(2);
let quantized = QuantizedTensor::quantize_per_channel(&weights, vec![2, 4], config).unwrap();
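Dynamic quantization, listed among the schemes above, quantizes activations at inference time rather than ahead of time. The sketch below illustrates the idea in plain Rust (it does not use the crate's `DynamicQuantizer` type, whose actual API is documented on the struct): each batch of activations gets its own scale and zero-point computed from the values actually observed.

```rust
// Sketch of dynamic quantization: weights are quantized offline, while
// activation ranges are observed and quantized per inference call.

fn dynamic_quantize_activations(acts: &[f32]) -> (Vec<u8>, f32, u8) {
    // Asymmetric u8 quantization with a range computed at runtime;
    // the range is forced to include 0.0 so zero stays representable.
    let min = acts.iter().cloned().fold(f32::INFINITY, f32::min).min(0.0);
    let max = acts.iter().cloned().fold(f32::NEG_INFINITY, f32::max).max(0.0);
    let scale = (max - min) / 255.0;
    let zero_point = (-min / scale).round() as u8;
    let q = acts
        .iter()
        .map(|v| ((v / scale).round() + zero_point as f32).clamp(0.0, 255.0) as u8)
        .collect();
    (q, scale, zero_point)
}

fn main() {
    // Each batch of activations gets its own scale/zero-point.
    let batch = [0.0f32, 1.5, 3.2, 0.7];
    let (q, scale, zp) = dynamic_quantize_activations(&batch);
    println!("q = {:?}, scale = {}, zero_point = {}", q, scale, zp);
}
```

The trade-off: no calibration dataset is needed, but the per-call range computation adds runtime overhead compared to static quantization.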

Structs§

DynamicQuantizer
Dynamic quantization configuration for runtime quantization
QuantizationConfig
Quantization configuration
QuantizationParams
Quantization parameters for a single channel/tensor
QuantizedTensor
Quantized tensor representation

Enums§

CalibrationMethod
Calibration method for determining quantization parameters
QuantizationError
Errors that can occur during quantization operations
QuantizationGranularity
Quantization granularity
QuantizationScheme
Quantization scheme
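The `CalibrationMethod` enum above selects how the quantization range is determined from sample data; its actual variants are documented on the enum itself. As a hedged illustration of what such methods typically distinguish, the plain-Rust sketch below contrasts plain min/max calibration with a percentile clip that discards outliers before computing the range.

```rust
// Two common calibration strategies (illustrative only, not the crate's
// CalibrationMethod variants): full min/max vs. a percentile clip.

fn min_max_range(data: &[f32]) -> (f32, f32) {
    // Use the exact observed extremes; sensitive to a single outlier.
    let min = data.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = data.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    (min, max)
}

fn percentile_range(data: &[f32], pct: f32) -> (f32, f32) {
    // Keep the central `pct` fraction of sorted values (e.g. 0.98),
    // so one outlier cannot inflate the scale for everything else.
    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = sorted.len();
    let lo = (((1.0 - pct) / 2.0) * (n - 1) as f32).round() as usize;
    let hi = n - 1 - lo;
    (sorted[lo], sorted[hi])
}

fn main() {
    // 99 small activations plus one large outlier (100.0).
    let mut data: Vec<f32> = (0..99).map(|i| i as f32 / 100.0).collect();
    data.push(100.0);
    let (_, max_full) = min_max_range(&data);
    let (_, max_p98) = percentile_range(&data, 0.98);
    assert_eq!(max_full, 100.0);
    assert!(max_p98 < 1.0); // outlier excluded from the calibrated range
    println!("min/max: {max_full}, percentile: {max_p98}");
}
```

With min/max calibration the outlier stretches the scale so that the 99 small values collapse into a handful of buckets; the percentile clip trades exact coverage of extremes for much finer resolution where the data actually lives.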