§ToRSh Quantization Library
A comprehensive quantization library for deep learning tensor operations, providing state-of-the-art quantization algorithms, configuration management, performance metrics, and utility functions.
§Key Features
- Multiple Quantization Schemes: INT8, INT4, binary, ternary, group-wise quantization
- Advanced Observers: MinMax, Histogram, Percentile, MovingAverage calibration
- Backend Support: Native, FBGEMM, QNNPACK for optimized execution
- Comprehensive Metrics: PSNR, SNR, compression ratio analysis
- Configuration Tools: Builder patterns, validation, JSON serialization
- Utility Functions: Batch processing, error diagnostics, auto-calibration
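As a concrete reference for what affine INT8 quantization computes, here is a self-contained sketch in plain Rust (no torsh_quantization calls; the function names are illustrative) that derives a scale and zero point from an observed min/max range, the way a MinMax observer would, then round-trips values through the quantized representation:

```rust
// Illustrative affine (asymmetric) INT8 quantization. Scale and zero point
// are derived from the observed min/max, as a MinMax observer would do.
fn affine_params(min: f32, max: f32) -> (f32, i32) {
    // Extend the range to include zero so that 0.0 maps to an exact code.
    let (min, max) = (min.min(0.0), max.max(0.0));
    let scale = (max - min) / 255.0; // 256 levels in [-128, 127]
    let zero_point = (-min / scale).round() as i32 - 128;
    (scale, zero_point)
}

fn quantize(x: f32, scale: f32, zero_point: i32) -> i8 {
    ((x / scale).round() as i32 + zero_point).clamp(-128, 127) as i8
}

fn dequantize(q: i8, scale: f32, zero_point: i32) -> f32 {
    (q as i32 - zero_point) as f32 * scale
}

fn main() {
    let data = [0.0f32, 1.0, 2.0, 3.0];
    let (scale, zp) = affine_params(0.0, 3.0);
    for &x in &data {
        let q = quantize(x, scale, zp);
        let d = dequantize(q, scale, zp);
        println!("{x} -> code {q} -> {d} (abs err {})", (x - d).abs());
    }
}
```

With min = 0 and max = 3, the zero point lands at -128, so 0.0 and 3.0 map exactly to the end codes -128 and 127.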
§Architecture
The library is organized into specialized modules:
- config: Configuration types and builder patterns
- algorithms: Core quantization and dequantization algorithms
- observers: Calibration system for parameter estimation
- specialized: Advanced algorithms (INT4, binary, ternary, group-wise)
- metrics: Performance analysis and benchmarking tools
- utils: Utility functions for validation, batch processing, and reporting
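To illustrate what the group-wise scheme handled by the specialized module computes, here is a minimal, library-independent sketch of symmetric INT4 group-wise quantization. The function name and group layout are illustrative assumptions, not the crate's API; the point is that each group gets its own scale, which preserves accuracy when magnitudes vary widely across a tensor:

```rust
// Illustrative group-wise symmetric INT4 quantization (no library calls):
// each group of `group_size` values gets its own scale.
fn quantize_groupwise(data: &[f32], group_size: usize) -> (Vec<i8>, Vec<f32>) {
    let mut codes = Vec::with_capacity(data.len());
    let mut scales = Vec::new();
    for group in data.chunks(group_size) {
        let max_abs = group.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
        // Symmetric INT4 range is [-8, 7]; divide by 7 so +max_abs is representable.
        let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 7.0 };
        scales.push(scale);
        for &x in group {
            codes.push(((x / scale).round() as i32).clamp(-8, 7) as i8);
        }
    }
    (codes, scales)
}

fn main() {
    // Two groups with very different magnitudes: a single per-tensor scale
    // would crush the first group to zero, but per-group scales keep both.
    let data = [0.1f32, -0.25, 0.3, 0.4, 10.0, -25.0, 30.0, 40.0];
    let (codes, scales) = quantize_groupwise(&data, 4);
    println!("codes:  {codes:?}");
    println!("scales: {scales:?}");
}
```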
§Quick Start
```rust
use torsh_quantization::{QuantConfig, quantize_with_config};
use torsh_tensor::creation::tensor_1d;

// Create a simple quantization configuration
let config = QuantConfig::int8();

// Create a tensor to quantize
let data = vec![0.0, 1.0, 2.0, 3.0];
let tensor = tensor_1d(&data).unwrap();

// Quantize the tensor
let (quantized, scale, zero_point) = quantize_with_config(&tensor, &config).unwrap();
```

§Advanced Usage
§Custom Configuration
```rust
use torsh_quantization::{QuantConfig, ObserverType, QuantBackend};

let config = QuantConfig::int8()
    .with_observer(ObserverType::Histogram)
    .with_backend(QuantBackend::Fbgemm);
```

§Batch Processing
```rust
use torsh_quantization::{quantize_batch_consistent, QuantConfig};
use torsh_tensor::creation::tensor_1d;

let tensor1 = tensor_1d(&[0.0, 1.0, 2.0]).unwrap();
let tensor2 = tensor_1d(&[1.0, 2.0, 3.0]).unwrap();
let tensor3 = tensor_1d(&[2.0, 3.0, 4.0]).unwrap();
let tensors = vec![&tensor1, &tensor2, &tensor3];

let config = QuantConfig::int8();
let results = quantize_batch_consistent(&tensors, &config).unwrap();
```

§Performance Analysis
```rust
use torsh_quantization::{compare_quantization_configs, QuantConfig};
use torsh_tensor::creation::tensor_1d;

let tensor = tensor_1d(&[0.0, 1.0, 2.0, 3.0]).unwrap();
let configs = vec![
    QuantConfig::int8(),
    QuantConfig::int4(),
    QuantConfig::per_channel(0),
];
let comparison = compare_quantization_configs(&tensor, &configs).unwrap();
```

§Export Support
The library supports exporting quantized models to various formats:
- ONNX: Industry-standard format for cross-platform deployment
- TensorRT: NVIDIA’s high-performance inference engine
- TensorFlow Lite: Mobile and edge deployment
- Core ML: Apple’s machine learning framework
- Custom formats: Extensible architecture for new backends
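The extensibility mentioned above is typically achieved with a backend trait that each target format implements. The sketch below shows the pattern in plain Rust; the trait, struct, and method names here are illustrative assumptions, not torsh_quantization's actual export API:

```rust
// Illustrative export layer: each target format implements one trait.
// Names (QuantizedModel, ExportBackend, RawBinaryExport) are hypothetical.
struct QuantizedModel {
    codes: Vec<i8>,
    scale: f32,
    zero_point: i32,
}

trait ExportBackend {
    fn format_name(&self) -> &'static str;
    fn export(&self, model: &QuantizedModel) -> Vec<u8>;
}

// Adding a new target format only requires implementing the trait.
struct RawBinaryExport;

impl ExportBackend for RawBinaryExport {
    fn format_name(&self) -> &'static str {
        "raw-binary"
    }
    fn export(&self, model: &QuantizedModel) -> Vec<u8> {
        // Layout: 4-byte scale, 4-byte zero point, then the INT8 codes.
        let mut out = model.scale.to_le_bytes().to_vec();
        out.extend(model.zero_point.to_le_bytes());
        out.extend(model.codes.iter().map(|&c| c as u8));
        out
    }
}

fn main() {
    let model = QuantizedModel { codes: vec![-128, 0, 127], scale: 0.05, zero_point: 0 };
    let backend = RawBinaryExport;
    let bytes = backend.export(&model);
    println!("{} -> {} bytes", backend.format_name(), bytes.len());
}
```

Because callers hold a `&dyn ExportBackend`, an ONNX or TensorRT backend can be swapped in without touching the quantization code itself.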
Re-exports§
pub use simd_ops::calculate_tensor_stats_simd;
pub use simd_ops::dequantize_per_tensor_affine_simd;
pub use simd_ops::find_min_max_simd;
pub use simd_ops::get_mobile_optimization_hints;
pub use simd_ops::get_simd_width;
pub use simd_ops::is_simd_available;
pub use simd_ops::quantize_batch_consistent_simd;
pub use simd_ops::quantize_mobile_optimized;
pub use simd_ops::quantize_per_channel_simd;
pub use simd_ops::quantize_per_tensor_affine_simd;
pub use simd_ops::quantize_to_int8_simd;
pub use simd_ops::MobileOptimizationHints;
pub use simd_ops::TensorStats as SimdTensorStats;
pub use benchmarks::BaselineMetrics;
pub use benchmarks::BenchmarkConfig as SuiteBenchmarkConfig;
pub use benchmarks::BenchmarkResult as SuiteBenchmarkResult;
pub use benchmarks::HardwareInfo;
pub use benchmarks::QuantizationBenchmarkSuite;
pub use config::*;
pub use algorithms::*;
pub use observers::*;
pub use specialized::*;
pub use metrics::*;
pub use analysis::*;
pub use memory_pool::*;
pub use quantum::*;
pub use quantum_enhanced::*;
pub use utils::*;
pub use auto_config::*;
Modules§
- algorithms: Core quantization algorithms and tensor operations
- analysis: Analysis tools for quantization
- auto_config: ML-powered auto-configuration system
- benchmarks: Comprehensive benchmark suite
- config: Quantization configuration types and builders
- memory_pool: Memory pool management
- metrics: Quantization quality metrics and analysis tools
- observers: Observer implementations for quantization parameter calibration
- prelude: Prelude module for convenient imports
- quantum: Quantum-inspired quantization
- quantum_enhanced: Enhanced quantum-inspired quantization
- simd_ops: SIMD-accelerated quantization operations
- specialized: Specialized quantization algorithms (INT4, binary, ternary, group-wise)
- utils: Quantization utilities and helper functions
Structs§
- Tensor: The main Tensor type for ToRSh
Enums§
- DType: Supported data types for tensors
- TorshError: Main ToRSh error enum, a unified interface to all error types
Constants§
Type Aliases§
- TorshResult: Result type alias for ToRSh operations