
Module algorithms


Core quantization algorithms and tensor operations

This module provides the fundamental quantization and dequantization algorithms for tensor operations, including per-tensor and per-channel quantization schemes.

Features

  • Per-tensor quantization: Single scale/zero-point for entire tensor
  • Per-channel quantization: Individual scale/zero-point per channel
  • Dequantization: Reverse quantization to restore floating point values
  • Multiple schemes: Affine and symmetric quantization support
  • Configuration-driven: Integration with QuantConfig for flexible usage
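The affine scheme in the feature list can be sketched with a small standalone example. This is a minimal illustration of the underlying math, not the module's actual API: the helper names (`affine_params`, `quantize`, `dequantize`) and the scalar (non-tensor) signatures are assumptions, chosen to mirror `calculate_affine_quantization_params` and `dequantize` conceptually.

```rust
/// Hypothetical helper mirroring `calculate_affine_quantization_params`:
/// derives (scale, zero_point) for the i8 range from observed min/max.
fn affine_params(min: f32, max: f32) -> (f32, i32) {
    let (qmin, qmax) = (-128.0_f32, 127.0_f32);
    // Extend the range to include zero so that 0.0 quantizes exactly.
    let (min, max) = (min.min(0.0), max.max(0.0));
    let scale = (max - min) / (qmax - qmin);
    let zero_point = (qmin - min / scale).round().clamp(qmin, qmax) as i32;
    (scale, zero_point)
}

/// Per-tensor affine quantization: q = clamp(round(x / scale) + zp).
fn quantize(x: f32, scale: f32, zp: i32) -> i8 {
    ((x / scale).round() as i32 + zp).clamp(-128, 127) as i8
}

/// Dequantization restores an approximation: x ≈ (q - zp) * scale.
fn dequantize(q: i8, scale: f32, zp: i32) -> f32 {
    (q as i32 - zp) as f32 * scale
}

fn main() {
    let (scale, zp) = affine_params(-1.0, 3.0);
    let q = quantize(1.5, scale, zp);
    println!("scale={scale}, zp={zp}, q={q}, x≈{}", dequantize(q, scale, zp));
}
```

The round-trip error of `dequantize(quantize(x))` is bounded by half a quantization step, which is why the scale is chosen to spread the observed range across all 256 i8 values.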

Structs

CacheAwareParams
Cache-aware quantization parameters for optimal memory access patterns

Functions

calculate_affine_quantization_params
Calculate affine quantization parameters (scale and zero_point)
calculate_symmetric_quantization_params
Calculate symmetric quantization parameters (scale only, zero_point = 0)
calculate_tensor_stats_cache_optimized
Cache-optimized tensor statistics calculation with blocking
dequantize
Dequantize a quantized tensor using scale and zero_point
dequantize_per_tensor_affine
Dequantize a tensor using per-tensor affine dequantization
get_cache_optimization_recommendations
Get cache-aware optimization recommendations for tensor operations
get_dtype_range
Get the quantization range for a given data type
quantize_auto
Convenience function to quantize with automatic parameter calculation
quantize_matrix_cache_friendly
Cache-friendly matrix quantization using tiling for 2D tensors
quantize_per_channel_auto
Auto-quantize a tensor using per-channel scheme
quantize_per_tensor
Quantize a tensor to INT8 using a specified scale and zero point
quantize_per_tensor_affine
Quantize a tensor using per-tensor affine quantization
quantize_per_tensor_affine_cache_aware
Cache-aware per-tensor quantization optimized for memory hierarchy
quantize_per_tensor_affine_i8
Quantize a tensor using per-tensor affine quantization (returns I8 tensor)
quantize_streaming_with_prefetch
Prefetch-aware sequential quantization for streaming data
quantize_tensor_auto
Auto-quantize a tensor using per-tensor scheme
quantize_with_cache_optimization
Auto-select optimal quantization algorithm based on cache analysis
quantize_with_config
Quantize a tensor using specified configuration
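To show why the per-channel functions above exist alongside the per-tensor ones, here is a hedged sketch of symmetric per-channel quantization for a row-major matrix, where each channel (row) receives its own scale and the zero point is fixed at 0. The function name `quantize_per_channel` and the `Vec<Vec<f32>>` representation are illustrative assumptions, not this module's signatures.

```rust
/// Symmetric per-channel quantization sketch: for each row,
/// scale = max_abs / 127 and zero_point = 0.
fn quantize_per_channel(rows: &[Vec<f32>]) -> (Vec<Vec<i8>>, Vec<f32>) {
    let mut scales = Vec::with_capacity(rows.len());
    let mut out = Vec::with_capacity(rows.len());
    for row in rows {
        let max_abs = row.iter().fold(0.0_f32, |m, &v| m.max(v.abs()));
        // Guard against an all-zero channel to avoid division by zero.
        let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
        scales.push(scale);
        out.push(
            row.iter()
                .map(|&v| (v / scale).round().clamp(-127.0, 127.0) as i8)
                .collect(),
        );
    }
    (out, scales)
}

fn main() {
    // Two channels with very different dynamic ranges: per-channel
    // scales preserve precision in the small-magnitude channel,
    // which a single per-tensor scale would crush toward zero.
    let m = vec![vec![0.01, -0.02, 0.03], vec![10.0, -20.0, 30.0]];
    let (q, scales) = quantize_per_channel(&m);
    println!("q = {q:?}, scales = {scales:?}");
}
```

This is the core trade-off behind `quantize_per_channel_auto` versus `quantize_tensor_auto`: per-channel parameters cost one scale per channel but avoid losing resolution when channel magnitudes differ by orders of magnitude.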