Crate torsh_quantization

§ToRSh Quantization Library

A comprehensive quantization library for deep learning tensor operations, providing state-of-the-art quantization algorithms, configuration management, performance metrics, and utility functions.

§Key Features

  • Multiple Quantization Schemes: INT8, INT4, binary, ternary, group-wise quantization
  • Advanced Observers: MinMax, Histogram, Percentile, MovingAverage calibration
  • Backend Support: Native, FBGEMM, QNNPACK for optimized execution
  • Comprehensive Metrics: PSNR, SNR, compression ratio analysis
  • Configuration Tools: Builder patterns, validation, JSON serialization
  • Utility Functions: Batch processing, error diagnostics, auto-calibration

§Architecture

The library is organized into specialized modules:

  • config: Configuration types and builder patterns
  • algorithms: Core quantization and dequantization algorithms
  • observers: Calibration system for parameter estimation
  • specialized: Advanced algorithms (INT4, binary, ternary, group-wise)
  • metrics: Performance analysis and benchmarking tools
  • utils: Utility functions for validation, batch processing, and reporting

§Quick Start

use torsh_quantization::{QuantConfig, quantize_with_config};
use torsh_tensor::creation::tensor_1d;

// Create a simple quantization configuration
let config = QuantConfig::int8();

// Create a tensor to quantize
let data = vec![0.0, 1.0, 2.0, 3.0];
let tensor = tensor_1d(&data).unwrap();

// Quantize the tensor
let (quantized, scale, zero_point) = quantize_with_config(&tensor, &config).unwrap();
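
Conceptually, the returned scale and zero_point define an affine mapping between floating-point values and the signed 8-bit integer range. The snippet below is a minimal standalone sketch of that mapping and of the round-trip error it introduces; the helpers quantize_affine and dequantize_affine are illustrative only and are not part of the crate's API.

// Illustrative affine INT8 mapping (standalone sketch, not the crate's implementation)
fn quantize_affine(x: &[f32], scale: f32, zero_point: i32) -> Vec<i8> {
    x.iter()
        .map(|&v| ((v / scale).round() as i32 + zero_point).clamp(-128, 127) as i8)
        .collect()
}

fn dequantize_affine(q: &[i8], scale: f32, zero_point: i32) -> Vec<f32> {
    // x ≈ (q - zero_point) * scale
    q.iter().map(|&v| (v as i32 - zero_point) as f32 * scale).collect()
}

// For values in [0.0, 3.0], one possible parameter choice is:
let scale = 3.0_f32 / 255.0;
let zero_point = -128;

let values = [0.0_f32, 1.0, 2.0, 3.0];
let quantized = quantize_affine(&values, scale, zero_point);
let restored = dequantize_affine(&quantized, scale, zero_point);

// Mean squared error introduced by the quantization round trip
let mse: f32 = values
    .iter()
    .zip(&restored)
    .map(|(a, b)| (a - b).powi(2))
    .sum::<f32>()
    / values.len() as f32;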

§Advanced Usage

§Custom Configuration

use torsh_quantization::{QuantConfig, ObserverType, QuantBackend};

// Build an INT8 configuration that calibrates with a histogram observer
// and targets the FBGEMM backend
let config = QuantConfig::int8()
    .with_observer(ObserverType::Histogram)
    .with_backend(QuantBackend::Fbgemm);
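
Histogram-based calibration estimates quantization parameters from the full distribution of observed values, which is generally less sensitive to outliers than simple min/max tracking. FBGEMM targets optimized execution on x86 server CPUs, while QNNPACK targets ARM and mobile devices.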

§Batch Processing

use torsh_quantization::{quantize_batch_consistent, QuantConfig};
use torsh_tensor::creation::tensor_1d;

// Create several tensors to quantize as one batch
let tensor1 = tensor_1d(&[0.0, 1.0, 2.0]).unwrap();
let tensor2 = tensor_1d(&[1.0, 2.0, 3.0]).unwrap();
let tensor3 = tensor_1d(&[2.0, 3.0, 4.0]).unwrap();
let tensors = vec![&tensor1, &tensor2, &tensor3];

// Quantize the whole batch under a single INT8 configuration
let config = QuantConfig::int8();
let results = quantize_batch_consistent(&tensors, &config).unwrap();
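
Quantizing the batch through a single call keeps the results directly comparable, which is useful when the quantized tensors are later concatenated or aggregated.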

§Performance Analysis

use torsh_quantization::{compare_quantization_configs, QuantConfig};
use torsh_tensor::creation::tensor_1d;

// Evaluate several candidate configurations against the same tensor
let tensor = tensor_1d(&[0.0, 1.0, 2.0, 3.0]).unwrap();
let configs = vec![
    QuantConfig::int8(),
    QuantConfig::int4(),
    QuantConfig::per_channel(0),
];

// Compare quantization quality across the candidate configurations
let comparison = compare_quantization_configs(&tensor, &configs).unwrap();
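
The comparison makes the accuracy/size trade-off explicit across schemes: INT4 roughly halves storage relative to INT8 at the cost of additional reconstruction error, while per-channel quantization (here along axis 0, assuming the argument selects the channel axis) keeps an independent scale per channel and typically preserves accuracy better than a single per-tensor scale.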

§Export Support

The library supports exporting quantized models to various formats:

  • ONNX: Industry-standard format for cross-platform deployment
  • TensorRT: NVIDIA’s high-performance inference engine
  • TensorFlow Lite: Mobile and edge deployment
  • Core ML: Apple’s machine learning framework
  • Custom formats: Extensible architecture for new backends

Re-exports§

pub use config::*;
pub use algorithms::*;
pub use observers::*;
pub use specialized::*;
pub use metrics::*;
pub use analysis::*;
pub use utils::*;
pub use memory_pool::*;
pub use simd_ops::*;
pub use quantum::*;
pub use quantum_enhanced::*;
pub use benchmarks::*;

Modules§

  • algorithms: Core quantization algorithms and tensor operations
  • analysis: Analysis tools for quantization
  • benchmarks: Comprehensive benchmark suite for quantization
  • config: Quantization configuration types and builders
  • memory_pool: Memory pool management for quantization
  • metrics: Quantization quality metrics and analysis tools
  • observers: Observer implementations for quantization parameter calibration
  • quantum: Quantum-inspired quantization techniques
  • quantum_enhanced: Enhanced quantum-inspired quantization with advanced algorithms
  • simd_ops: SIMD-accelerated quantization operations
  • specialized: Specialized quantization algorithms for advanced use cases
  • utils: Quantization utilities and helper functions

Structs§

  • Tensor: The main Tensor type for ToRSh

Enums§

  • DType: Supported data types for tensors
  • TorshError: Main ToRSh error enum - unified interface to all error types

Type Aliases§

  • TorshResult: Result type alias for ToRSh operations