Core quantization algorithms and tensor operations
This module provides the fundamental quantization and dequantization algorithms for tensor operations, including per-tensor and per-channel quantization schemes.
Features
- Per-tensor quantization: Single scale/zero-point for entire tensor
- Per-channel quantization: Individual scale/zero-point per channel
- Dequantization: Reverse quantization to restore floating point values
- Multiple schemes: Affine and symmetric quantization support
- Configuration-driven: Integration with QuantConfig for flexible usage
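To make the affine scheme above concrete, here is a standalone sketch of per-tensor affine quantization and its inverse. The helper names and signatures are illustrative assumptions, not this module's actual API:

```rust
// Hypothetical sketch of per-tensor affine INT8 quantization; the crate's
// real functions (e.g. quantize_per_tensor) may have different signatures.
fn quantize_affine(data: &[f32], scale: f32, zero_point: i32) -> Vec<i8> {
    data.iter()
        .map(|&x| {
            // q = round(x / scale) + zero_point, clamped to the i8 range
            let q = (x / scale).round() as i32 + zero_point;
            q.clamp(i8::MIN as i32, i8::MAX as i32) as i8
        })
        .collect()
}

fn dequantize_affine(q: &[i8], scale: f32, zero_point: i32) -> Vec<f32> {
    // x ≈ (q - zero_point) * scale; the reconstruction error is bounded
    // by half a quantization step for values inside the representable range.
    q.iter()
        .map(|&v| (v as i32 - zero_point) as f32 * scale)
        .collect()
}
```

Round-tripping a value through these two functions recovers it to within one quantization step, which is the property the `dequantize` entries below rely on.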
Structs

- CacheAwareParams - Cache-aware quantization parameters for optimal memory access patterns
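The cache-aware entries in this module revolve around tiling: processing the tensor in blocks small enough to stay resident in cache. A rough sketch of the idea, with a hypothetical function name and an assumed row-major layout:

```rust
// Illustrative tiled 2D quantization (the general idea behind the
// cache-friendly variants); the tile size, name, and signature are
// assumptions for this sketch, not the module's API.
fn quantize_matrix_tiled(m: &[f32], rows: usize, cols: usize,
                         scale: f32, tile: usize) -> Vec<i8> {
    let mut out = vec![0i8; rows * cols];
    for r0 in (0..rows).step_by(tile) {
        for c0 in (0..cols).step_by(tile) {
            // Quantize one tile at a time so reads and writes stay
            // within a cache-resident window of the matrix.
            for r in r0..(r0 + tile).min(rows) {
                for c in c0..(c0 + tile).min(cols) {
                    let q = (m[r * cols + c] / scale).round() as i32;
                    out[r * cols + c] = q.clamp(-128, 127) as i8;
                }
            }
        }
    }
    out
}
```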
Functions

- calculate_affine_quantization_params - Calculate affine quantization parameters (scale and zero_point)
- calculate_symmetric_quantization_params - Calculate symmetric quantization parameters (scale only, zero_point = 0)
- calculate_tensor_stats_cache_optimized - Cache-optimized tensor statistics calculation with blocking
- dequantize - Dequantize a quantized tensor using scale and zero_point
- dequantize_per_tensor_affine - Dequantize a tensor using per-tensor affine dequantization
- get_cache_optimization_recommendations - Get cache-aware optimization recommendations for tensor operations
- get_dtype_range - Get the quantization range for a given data type
- quantize_auto - Convenience function to quantize with automatic parameter calculation
- quantize_matrix_cache_friendly - Cache-friendly matrix quantization using tiling for 2D tensors
- quantize_per_channel_auto - Auto-quantize a tensor using the per-channel scheme
- quantize_per_tensor - Quantize a tensor to INT8 using a specified scale and zero point
- quantize_per_tensor_affine - Quantize a tensor using per-tensor affine quantization
- quantize_per_tensor_affine_cache_aware - Cache-aware per-tensor quantization optimized for the memory hierarchy
- quantize_per_tensor_affine_i8 - Quantize a tensor using per-tensor affine quantization (returns an I8 tensor)
- quantize_streaming_with_prefetch - Prefetch-aware sequential quantization for streaming data
- quantize_tensor_auto - Auto-quantize a tensor using the per-tensor scheme
- quantize_with_cache_optimization - Auto-select the optimal quantization algorithm based on cache analysis
- quantize_with_config - Quantize a tensor using a specified configuration
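The two parameter-calculation entries above differ in how they map the observed value range to i8. The standard formulas can be sketched as follows; the function names here are illustrative, and the crate's own implementations may handle edge cases (such as a degenerate min == max range) differently:

```rust
// Sketch of affine vs. symmetric parameter calculation for i8 output.
// Assumes min <= max and a non-degenerate range; names are hypothetical.
fn affine_params(min: f32, max: f32) -> (f32, i32) {
    // Affine: map [min, max] onto [-128, 127] via a zero_point offset,
    // so asymmetric ranges waste none of the 256 available codes.
    let scale = (max - min) / 255.0;
    let zero_point = (-128.0 - min / scale).round() as i32;
    (scale, zero_point)
}

fn symmetric_params(min: f32, max: f32) -> f32 {
    // Symmetric: derive scale from the larger magnitude and fix
    // zero_point at 0, which keeps integer arithmetic simpler.
    let abs_max = min.abs().max(max.abs());
    abs_max / 127.0
}
```

The trade-off: symmetric parameters are cheaper to apply (no zero-point adjustment in inner loops) but sacrifice precision on asymmetric ranges, which is why both schemes are exposed.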