Module quantization


Quantization support for model compression and acceleration in RusTorch (Phase 11 implementation).

This module provides quantization support for deep learning models, enabling efficient inference and training with reduced-precision arithmetic.

§Key Features

§Dynamic Quantization

  • Runtime quantization of weights and activations
  • Automatic calibration using statistical observers
  • Per-tensor and per-channel quantization schemes
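As a rough illustration of the difference between per-tensor and per-channel schemes, the sketch below derives symmetric int8 scales from the values seen at runtime. It uses only the standard library; `per_tensor_scale` and `per_channel_scales` are hypothetical helper names and are not part of the RusTorch API.

```rust
/// Per-tensor symmetric scale: one scale for the whole buffer,
/// chosen so that the largest magnitude maps to 127.
fn per_tensor_scale(values: &[f32]) -> f32 {
    let max_abs = values.iter().fold(0.0_f32, |m, v| m.max(v.abs()));
    // Fall back to 1.0 so an all-zero tensor does not yield a zero scale.
    if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 }
}

/// Per-channel scales for a row-major `[channels, len]` buffer:
/// each channel gets its own scale.
fn per_channel_scales(values: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0 && !values.is_empty() && values.len() % channels == 0);
    let len = values.len() / channels;
    values.chunks(len).map(per_tensor_scale).collect()
}
```

Per-channel scales usually cost a little extra bookkeeping but avoid one channel with large values forcing a coarse grid onto all the others.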

§Static Quantization

  • Pre-calibrated quantization parameters
  • Optimal for deployment scenarios
  • Hardware-accelerated operations
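To make "pre-calibrated" concrete, here is a minimal sketch of a min/max calibration pass: a range is accumulated over calibration batches and then frozen into a scale and zero point. `RangeObserver` is a hypothetical type written for this example; it is not the crate's `MinMaxObserver`, whose interface is not shown on this page. An unsigned 8-bit target range is assumed.

```rust
/// Hypothetical min/max observer (illustration only).
struct RangeObserver {
    min: f32,
    max: f32,
}

impl RangeObserver {
    fn new() -> Self {
        Self { min: f32::INFINITY, max: f32::NEG_INFINITY }
    }

    /// Widen the tracked range with one calibration batch.
    fn observe(&mut self, batch: &[f32]) {
        for &v in batch {
            self.min = self.min.min(v);
            self.max = self.max.max(v);
        }
    }

    /// Freeze `(scale, zero_point)` for the assumed range [0, 255].
    fn params(&self) -> (f32, i32) {
        // Extend the range to include 0.0 so that zero is exactly representable.
        let (lo, hi) = (self.min.min(0.0), self.max.max(0.0));
        if hi == lo {
            // Nothing observed, or only zeros: return neutral parameters.
            return (1.0, 0);
        }
        let scale = (hi - lo) / 255.0;
        let zero_point = (-lo / scale).round() as i32;
        (scale, zero_point.clamp(0, 255))
    }
}
```

Once `params` has been called after calibration, the same scale and zero point are reused for every inference call, which is what distinguishes this from the dynamic case.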

§Quantization-Aware Training (QAT)

  • Training with quantization simulation
  • Straight-through estimators for gradients
  • Fine-tuning of quantized models
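The two ideas behind quantization simulation and straight-through estimators can be sketched in a few lines of plain Rust. The function names below are hypothetical and do not describe how the crate's `FakeQuantize` is implemented; a symmetric int8 grid is assumed.

```rust
/// Forward pass: snap `x` to the int8 grid, then dequantize immediately,
/// so the rest of the network sees the quantization error in f32.
fn fake_quantize(x: f32, scale: f32) -> f32 {
    (x / scale).round().clamp(-127.0, 127.0) * scale
}

/// Backward pass with a straight-through estimator: `round` is treated
/// as the identity, and the gradient is dropped where `x` falls outside
/// the representable range.
fn fake_quantize_grad(x: f32, scale: f32, upstream: f32) -> f32 {
    let limit = 127.0 * scale;
    if x.abs() <= limit { upstream } else { 0.0 }
}
```

Because `round` has zero gradient almost everywhere, passing the upstream gradient through unchanged is what lets training proceed at all.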

§Hardware Optimization

  • CPU SIMD optimizations for quantized operations
  • CUDA kernels for GPU acceleration
  • Metal Performance Shaders for Apple Silicon
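The kind of kernel these backends accelerate can be illustrated with a scalar int8 dot product that accumulates in `i32` and rescales once at the end. This is a standard-library sketch under the assumption of symmetric quantization on both inputs; `quantized_dot` is a hypothetical name and says nothing about what `hardware::optimized_ops` actually exposes.

```rust
/// Dot product of two symmetric int8 vectors, returned in f32.
fn quantized_dot(a: &[i8], b: &[i8], scale_a: f32, scale_b: f32) -> f32 {
    assert_eq!(a.len(), b.len());
    // Widen to i32 before multiplying so individual i8 products do not overflow.
    let acc: i32 = a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum();
    // A single floating-point rescale covers the whole reduction.
    acc as f32 * scale_a * scale_b
}
```

The integer inner loop is the part that SIMD or GPU kernels would typically vectorize; the floating-point work is confined to the final multiply.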

§Quantization Schemes

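§Symmetric Quantization

The zero point is fixed at 0, so only a scale is needed.

quantized = round(fp32_value / scale)
dequantized = quantized * scale

§Asymmetric Quantization

A zero point shifts the integer range so that it can cover a value range that is not centered on zero.

quantized = round(fp32_value / scale) + zero_point
dequantized = (quantized - zero_point) * scale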
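A worked sketch of both schemes follows, using the usual convention that symmetric quantization fixes the zero point at 0 while asymmetric quantization carries a nonzero zero point. The functions are hypothetical standalone helpers, not the crate's `SymmetricQuantization` or `AsymmetricQuantization` types, and the choice of `i8` for symmetric and `u8` for asymmetric storage is an assumption made for the example.

```rust
/// Symmetric int8 quantization: zero point is 0, range is [-127, 127].
fn quantize_symmetric(x: f32, scale: f32) -> i8 {
    (x / scale).round().clamp(-127.0, 127.0) as i8
}

fn dequantize_symmetric(q: i8, scale: f32) -> f32 {
    q as f32 * scale
}

/// Asymmetric uint8 quantization: the zero point shifts the range to [0, 255].
fn quantize_asymmetric(x: f32, scale: f32, zero_point: i32) -> u8 {
    // Saturating add guards against overflow for extreme inputs.
    let q = ((x / scale).round() as i32).saturating_add(zero_point);
    q.clamp(0, 255) as u8
}

fn dequantize_asymmetric(q: u8, scale: f32, zero_point: i32) -> f32 {
    (q as i32 - zero_point) as f32 * scale
}
```

Values outside the representable range are clamped, so a round trip through quantize and dequantize is only approximately the identity, and only within that range.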

§Usage Examples

use rustorch::quantization::{QuantizedTensor, QuantizationScheme, StaticQuantizer, TensorQuantization};
use rustorch::tensor::Tensor;

// Dynamic quantization: parameters are derived from the tensor at runtime.
// The `?` operator assumes this snippet runs inside a function returning a `Result`.
let tensor: Tensor<f32> = Tensor::randn(&[128, 256]);
let quantized = tensor.quantize_dynamic(QuantizationScheme::Symmetric)?;

// Static quantization with calibration
let mut quantizer = StaticQuantizer::<f32>::new();
quantizer.calibrate(QuantizationScheme::Symmetric)?;

Re-exports§

pub use calibration::HistogramObserver;
pub use calibration::MinMaxObserver;
pub use calibration::Observer;
pub use calibration::StaticQuantizer;
pub use hardware::optimized_ops;
pub use operations::DequantizeOps;
pub use operations::QuantizedOps;
pub use qat::FakeQuantize;
pub use qat::QATConv2d;
pub use qat::QATLinear;
pub use qat::QATModule;
pub use schemes::AsymmetricQuantization;
pub use schemes::QuantizationScheme;
pub use schemes::SymmetricQuantization;
pub use types::QuantizationType;
pub use types::QuantizedTensor;

Modules§

calibration
Calibration and statistical observation for quantization
hardware
Hardware-specific optimizations for quantized operations
observers
Statistical observers for quantization calibration
operations
Quantized tensor operations and arithmetic
qat
Quantization-Aware Training (QAT) support
schemes
Quantization schemes and algorithms
types
Quantized tensor types and data structures

Structs§

QuantParamCalculator
Unified quantization parameter calculator
QuantizationConfig
Global quantization configuration

Traits§

Quantizable
Trait for quantizable data types
TensorQuantization
Main quantization API for tensors