Core quantization algorithms and tensor operations
This module provides the fundamental quantization and dequantization algorithms for tensor operations, including per-tensor and per-channel quantization schemes.
Features
- Per-tensor quantization: Single scale/zero-point for entire tensor
- Per-channel quantization: Individual scale/zero-point per channel
- Dequantization: Reverse quantization to restore floating point values
- Multiple schemes: Affine and symmetric quantization support
- Configuration-driven: Integration with QuantConfig for flexible usage
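To make the affine scheme above concrete, here is a standalone sketch of per-tensor affine quantization and its inverse. The helper names and signatures are illustrative assumptions, not this module's actual API:

```rust
// Hypothetical sketch of per-tensor affine INT8 quantization; the crate's
// real functions (e.g. quantize_per_tensor) may have different signatures.
fn quantize_affine(data: &[f32], scale: f32, zero_point: i32) -> Vec<i8> {
    data.iter()
        .map(|&x| {
            // q = round(x / scale) + zero_point, clamped to the i8 range
            let q = (x / scale).round() as i32 + zero_point;
            q.clamp(i8::MIN as i32, i8::MAX as i32) as i8
        })
        .collect()
}

fn dequantize_affine(q: &[i8], scale: f32, zero_point: i32) -> Vec<f32> {
    // x ≈ (q - zero_point) * scale; the reconstruction error is bounded
    // by half a quantization step for values inside the representable range.
    q.iter()
        .map(|&v| (v as i32 - zero_point) as f32 * scale)
        .collect()
}
```

Round-tripping a value through these two functions recovers it to within one quantization step, which is the property the `dequantize` entries below rely on.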
Structs

- CacheAwareParams - Cache-aware quantization parameters for optimal memory access patterns
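The cache-aware entries in this module revolve around tiling: processing the tensor in blocks small enough to stay resident in cache. A rough sketch of the idea, with a hypothetical function name and an assumed row-major layout:

```rust
// Illustrative tiled 2D quantization (the general idea behind the
// cache-friendly variants); the tile size, name, and signature are
// assumptions for this sketch, not the module's API.
fn quantize_matrix_tiled(m: &[f32], rows: usize, cols: usize,
                         scale: f32, tile: usize) -> Vec<i8> {
    let mut out = vec![0i8; rows * cols];
    for r0 in (0..rows).step_by(tile) {
        for c0 in (0..cols).step_by(tile) {
            // Quantize one tile at a time so reads and writes stay
            // within a cache-resident window of the matrix.
            for r in r0..(r0 + tile).min(rows) {
                for c in c0..(c0 + tile).min(cols) {
                    let q = (m[r * cols + c] / scale).round() as i32;
                    out[r * cols + c] = q.clamp(-128, 127) as i8;
                }
            }
        }
    }
    out
}
```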
Functions

- calculate_affine_quantization_params - Calculate affine quantization parameters (scale and zero_point)
- calculate_symmetric_quantization_params - Calculate symmetric quantization parameters (scale only, zero_point = 0)
- calculate_tensor_stats_cache_optimized - Cache-optimized tensor statistics calculation with blocking
- dequantize - Dequantize a quantized tensor using scale and zero_point
- dequantize_per_tensor_affine - Dequantize a tensor using per-tensor affine dequantization
- get_cache_optimization_recommendations - Get cache-aware optimization recommendations for tensor operations
- get_dtype_range - Get the quantization range for a given data type
- quantize_auto - Convenience function to quantize with automatic parameter calculation
- quantize_matrix_cache_friendly - Cache-friendly matrix quantization using tiling for 2D tensors
- quantize_per_channel_auto - Auto-quantize a tensor using the per-channel scheme
- quantize_per_tensor - Quantize a tensor to INT8 using a specified scale and zero point
- quantize_per_tensor_affine - Quantize a tensor using per-tensor affine quantization
- quantize_per_tensor_affine_cache_aware - Cache-aware per-tensor quantization optimized for the memory hierarchy
- quantize_per_tensor_affine_i8 - Quantize a tensor using per-tensor affine quantization (returns an I8 tensor)
- quantize_streaming_with_prefetch - Prefetch-aware sequential quantization for streaming data
- quantize_tensor_auto - Auto-quantize a tensor using the per-tensor scheme
- quantize_with_cache_optimization - Auto-select the optimal quantization algorithm based on cache analysis
- quantize_with_config - Quantize a tensor using a specified configuration
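The two parameter-calculation entries above differ in how they map the observed value range to i8. The standard formulas can be sketched as follows; the function names here are illustrative, and the crate's own implementations may handle edge cases (such as a degenerate min == max range) differently:

```rust
// Sketch of affine vs. symmetric parameter calculation for i8 output.
// Assumes min <= max and a non-degenerate range; names are hypothetical.
fn affine_params(min: f32, max: f32) -> (f32, i32) {
    // Affine: map [min, max] onto [-128, 127] via a zero_point offset,
    // so asymmetric ranges waste none of the 256 available codes.
    let scale = (max - min) / 255.0;
    let zero_point = (-128.0 - min / scale).round() as i32;
    (scale, zero_point)
}

fn symmetric_params(min: f32, max: f32) -> f32 {
    // Symmetric: derive scale from the larger magnitude and fix
    // zero_point at 0, which keeps integer arithmetic simpler.
    let abs_max = min.abs().max(max.abs());
    abs_max / 127.0
}
```

The trade-off: symmetric parameters are cheaper to apply (no zero-point adjustment in inner loops) but sacrifice precision on asymmetric ranges, which is why both schemes are exposed.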