Crate torsh_quantization

§ToRSh Quantization Library

A comprehensive quantization library for deep learning tensor operations, providing state-of-the-art quantization algorithms, configuration management, performance metrics, and utility functions.

§Key Features

  • Multiple Quantization Schemes: INT8, INT4, binary, ternary, group-wise quantization
  • Advanced Observers: MinMax, Histogram, Percentile, MovingAverage calibration
  • Backend Support: Native, FBGEMM, QNNPACK for optimized execution
  • Comprehensive Metrics: PSNR, SNR, compression ratio analysis
  • Configuration Tools: Builder patterns, validation, JSON serialization
  • Utility Functions: Batch processing, error diagnostics, auto-calibration
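
The INT8 scheme listed above is typically per-tensor affine quantization: a float `x` maps to an integer `q = round(x / scale) + zero_point`, and back via `x ≈ (q - zero_point) * scale`. The following is a minimal std-only sketch of that math; the helper names (`affine_params`, `quantize`, `dequantize`) are illustrative, not this crate's actual API.

```rust
// Sketch of per-tensor affine INT8 quantization (asymmetric, unsigned 0..=255).
// These helpers are illustrative, not torsh_quantization's real functions.

/// Derive scale and zero point from an observed [min, max] range.
fn affine_params(min: f32, max: f32) -> (f32, i32) {
    // Widen the range to include 0.0 so that zero quantizes exactly.
    let (min, max) = (min.min(0.0), max.max(0.0));
    let scale = (max - min) / 255.0;
    let zero_point = (-min / scale).round() as i32;
    (scale, zero_point)
}

/// q = clamp(round(x / scale) + zero_point, 0, 255)
fn quantize(x: &[f32], scale: f32, zp: i32) -> Vec<u8> {
    x.iter()
        .map(|&v| ((v / scale).round() as i32 + zp).clamp(0, 255) as u8)
        .collect()
}

/// x ≈ (q - zero_point) * scale
fn dequantize(q: &[u8], scale: f32, zp: i32) -> Vec<f32> {
    q.iter().map(|&v| (v as i32 - zp) as f32 * scale).collect()
}

fn main() {
    let data = [0.0f32, 1.0, 2.0, 3.0];
    let (scale, zp) = affine_params(0.0, 3.0);
    let q = quantize(&data, scale, zp);
    let back = dequantize(&q, scale, zp);
    println!("quantized: {q:?}, reconstructed: {back:?}");
}
```

The observers listed above (MinMax, Histogram, Percentile, MovingAverage) differ only in how they estimate the `[min, max]` range fed into this parameter computation.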

§Architecture

The library is organized into specialized modules:

  • config: Configuration types and builder patterns
  • algorithms: Core quantization and dequantization algorithms
  • observers: Calibration system for parameter estimation
  • specialized: Advanced algorithms (INT4, binary, ternary, group-wise)
  • metrics: Performance analysis and benchmarking tools
  • utils: Utility functions for validation, batch processing, and reporting

§Quick Start

use torsh_quantization::{QuantConfig, quantize_with_config};
use torsh_tensor::creation::tensor_1d;

// Create a simple quantization configuration
let config = QuantConfig::int8();

// Create a tensor to quantize
let data = vec![0.0, 1.0, 2.0, 3.0];
let tensor = tensor_1d(&data).unwrap();

// Quantize the tensor
let (quantized, scale, zero_point) = quantize_with_config(&tensor, &config).unwrap();

§Advanced Usage

§Custom Configuration

use torsh_quantization::{QuantConfig, ObserverType, QuantBackend};

let config = QuantConfig::int8()
    .with_observer(ObserverType::Histogram)
    .with_backend(QuantBackend::Fbgemm);
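
The chaining above works because of the builder pattern: each `with_*` method consumes the config and returns a modified copy. A minimal sketch of how such a type can be structured (field names and variants here are assumed for illustration, not the crate's real definitions):

```rust
// Illustrative builder-pattern sketch; not torsh_quantization's actual types.

#[derive(Debug, Clone, Copy, PartialEq)]
enum ObserverType { MinMax, Histogram }

#[derive(Debug, Clone, Copy, PartialEq)]
enum QuantBackend { Native, Fbgemm }

#[derive(Debug, Clone)]
struct QuantConfig {
    bits: u8,
    observer: ObserverType,
    backend: QuantBackend,
}

impl QuantConfig {
    /// Start from sensible INT8 defaults.
    fn int8() -> Self {
        Self { bits: 8, observer: ObserverType::MinMax, backend: QuantBackend::Native }
    }
    // Each `with_*` method takes `self` by value and returns the updated
    // config, which is what makes method chaining possible.
    fn with_observer(mut self, observer: ObserverType) -> Self {
        self.observer = observer;
        self
    }
    fn with_backend(mut self, backend: QuantBackend) -> Self {
        self.backend = backend;
        self
    }
}

fn main() {
    let config = QuantConfig::int8()
        .with_observer(ObserverType::Histogram)
        .with_backend(QuantBackend::Fbgemm);
    println!("{config:?}");
}
```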

§Batch Processing

use torsh_quantization::{quantize_batch_consistent, QuantConfig};
use torsh_tensor::creation::tensor_1d;

let tensor1 = tensor_1d(&[0.0, 1.0, 2.0]).unwrap();
let tensor2 = tensor_1d(&[1.0, 2.0, 3.0]).unwrap();
let tensor3 = tensor_1d(&[2.0, 3.0, 4.0]).unwrap();
let tensors = vec![&tensor1, &tensor2, &tensor3];
let config = QuantConfig::int8();
let results = quantize_batch_consistent(&tensors, &config).unwrap();
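
"Consistent" batch quantization generally means deriving one shared `(scale, zero_point)` from the global range of all tensors, so the same float value maps to the same integer in every tensor. A std-only sketch of that idea, under the assumption that this is what the function above does (the helper name is hypothetical):

```rust
// Sketch of batch quantization with a shared scale/zero_point derived from
// the global min/max across all tensors. Hypothetical helper, not the crate's API.

fn quantize_batch_shared(tensors: &[&[f32]]) -> (Vec<Vec<u8>>, f32, i32) {
    // Global range over every element, widened to include 0.0.
    let mut min = 0.0f32;
    let mut max = 0.0f32;
    for t in tensors {
        for &v in t.iter() {
            min = min.min(v);
            max = max.max(v);
        }
    }
    // Guard against an all-zero batch, where the range would collapse.
    let scale = if max > min { (max - min) / 255.0 } else { 1.0 };
    let zp = (-min / scale).round() as i32;
    let quantized = tensors
        .iter()
        .map(|t| {
            t.iter()
                .map(|&v| ((v / scale).round() as i32 + zp).clamp(0, 255) as u8)
                .collect()
        })
        .collect();
    (quantized, scale, zp)
}

fn main() {
    let t1: &[f32] = &[0.0, 1.0, 2.0];
    let t2: &[f32] = &[1.0, 2.0, 3.0];
    let t3: &[f32] = &[2.0, 3.0, 4.0];
    let (q, scale, zp) = quantize_batch_shared(&[t1, t2, t3]);
    // The value 2.0 quantizes to the same integer in all three tensors.
    println!("{q:?} scale={scale} zp={zp}");
}
```

Quantizing each tensor independently would instead give each its own scale, making the integer representations incomparable across the batch.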

§Performance Analysis

use torsh_quantization::{compare_quantization_configs, QuantConfig};
use torsh_tensor::creation::tensor_1d;

let tensor = tensor_1d(&[0.0, 1.0, 2.0, 3.0]).unwrap();
let configs = vec![
    QuantConfig::int8(),
    QuantConfig::int4(),
    QuantConfig::per_channel(0),
];
let comparison = compare_quantization_configs(&tensor, &configs).unwrap();
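
Comparisons like this usually rest on the metrics listed under Key Features: PSNR over a quantize/dequantize roundtrip, plus the raw compression ratio (f32 to u8 is 4x). A std-only sketch of the PSNR computation; the helper name and the simulated roundtrip values are assumptions for illustration:

```rust
// Sketch of a PSNR quality metric for a quantization roundtrip.
// Hypothetical helper, not torsh_quantization's real metrics API.

fn psnr(original: &[f32], reconstructed: &[f32]) -> f32 {
    // Peak signal value of the original tensor.
    let peak = original.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    // Mean squared error between original and reconstructed values.
    let mse: f32 = original
        .iter()
        .zip(reconstructed)
        .map(|(&a, &b)| (a - b).powi(2))
        .sum::<f32>()
        / original.len() as f32;
    // PSNR = 10 * log10(peak^2 / MSE); higher means less quantization error.
    10.0 * (peak * peak / mse).log10()
}

fn main() {
    let original = [0.0f32, 1.0, 2.0, 3.0];
    // Simulated 8-bit roundtrip with a small uniform error.
    let reconstructed = [0.0f32, 1.001, 1.999, 3.001];
    let db = psnr(&original, &reconstructed);
    // 4 bytes per f32 element vs 1 byte per u8 element.
    let compression_ratio = 4.0f32;
    println!("PSNR = {db:.1} dB, compression = {compression_ratio}x");
}
```

Under this framing, comparing configs means running the roundtrip per config and trading PSNR against compression: INT4 halves storage relative to INT8 but typically lowers PSNR.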

§Export Support

The library supports exporting quantized models to various formats:

  • ONNX: Industry-standard format for cross-platform deployment
  • TensorRT: NVIDIA’s high-performance inference engine
  • TensorFlow Lite: Mobile and edge deployment
  • Core ML: Apple’s machine learning framework
  • Custom formats: Extensible architecture for new backends

Re-exports§

pub use simd_ops::calculate_tensor_stats_simd;
pub use simd_ops::dequantize_per_tensor_affine_simd;
pub use simd_ops::find_min_max_simd;
pub use simd_ops::get_mobile_optimization_hints;
pub use simd_ops::get_simd_width;
pub use simd_ops::is_simd_available;
pub use simd_ops::quantize_batch_consistent_simd;
pub use simd_ops::quantize_mobile_optimized;
pub use simd_ops::quantize_per_channel_simd;
pub use simd_ops::quantize_per_tensor_affine_simd;
pub use simd_ops::quantize_to_int8_simd;
pub use simd_ops::MobileOptimizationHints;
pub use simd_ops::TensorStats as SimdTensorStats;
pub use benchmarks::BaselineMetrics;
pub use benchmarks::BenchmarkConfig as SuiteBenchmarkConfig;
pub use benchmarks::BenchmarkResult as SuiteBenchmarkResult;
pub use benchmarks::HardwareInfo;
pub use benchmarks::QuantizationBenchmarkSuite;
pub use config::*;
pub use algorithms::*;
pub use observers::*;
pub use specialized::*;
pub use metrics::*;
pub use analysis::*;
pub use memory_pool::*;
pub use quantum::*;
pub use quantum_enhanced::*;
pub use utils::*;
pub use auto_config::*;

Modules§

algorithms
Core quantization algorithms and tensor operations
analysis
Analysis tools for quantization
auto_config
ML-powered auto-configuration system
benchmarks
Comprehensive benchmark suite
config
Quantization configuration types and builders
memory_pool
Memory pool management
metrics
Quantization quality metrics and analysis tools
observers
Observer implementations for quantization parameter calibration
prelude
Prelude module for convenient imports
quantum
Quantum-inspired quantization
quantum_enhanced
Enhanced quantum-inspired quantization
simd_ops
SIMD-accelerated quantization operations
specialized
Specialized quantization algorithms for advanced use cases (INT4, binary, ternary, group-wise)
utils
Quantization utilities and helper functions

Structs§

Tensor
The main Tensor type for ToRSh

Enums§

DType
Supported data types for tensors
TorshError
Main ToRSh error enum - unified interface to all error types

Constants§

VERSION
VERSION_MAJOR
VERSION_MINOR
VERSION_PATCH

Type Aliases§

TorshResult
Result type alias for ToRSh operations