Quantization and Compression Functions
This module provides quantization and related operations for model compression, including:
- Uniform and non-uniform quantization
- Dynamic quantization schemes
- Pruning utilities (magnitude-based, structured, unstructured)
- Model compression techniques
- Low-precision computation functions
- Knowledge distillation utilities
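To make the quantization items above concrete, here is a minimal, self-contained sketch of uniform (affine) 8-bit quantization with the scale and zero point derived from the data. It does not use this module's functions, whose signatures are not shown on this page; `QParams`, `compute_qparams`, `quantize_u8`, and `dequantize_u8` are hypothetical names chosen for illustration, and plain slices stand in for tensors.

```rust
/// Affine quantization parameters (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
struct QParams {
    scale: f32,
    zero_point: i32,
}

/// Derive scale and zero point from the observed value range so that
/// [min, max] maps onto the unsigned 8-bit range [0, 255].
fn compute_qparams(data: &[f32]) -> QParams {
    // Start both folds at 0.0 so the range always contains zero,
    // which keeps 0.0 exactly representable after quantization.
    let min = data.iter().copied().fold(0.0_f32, f32::min);
    let max = data.iter().copied().fold(0.0_f32, f32::max);
    let range = max - min;
    // Guard against an all-zero or empty input.
    let scale = if range > 0.0 { range / 255.0 } else { 1.0 };
    let zero_point = ((-min / scale).round() as i32).clamp(0, 255);
    QParams { scale, zero_point }
}

/// Map each value to round(x / scale) + zero_point, clamped to [0, 255].
fn quantize_u8(data: &[f32], p: QParams) -> Vec<u8> {
    data.iter()
        .map(|&x| ((x / p.scale).round() as i32 + p.zero_point).clamp(0, 255) as u8)
        .collect()
}

/// Invert the mapping: (q - zero_point) * scale.
fn dequantize_u8(q: &[u8], p: QParams) -> Vec<f32> {
    q.iter()
        .map(|&v| (v as i32 - p.zero_point) as f32 * p.scale)
        .collect()
}
```

Calling `quantize_u8` followed by `dequantize_u8` yields values that differ from the inputs by at most about one quantization step. That round trip is also the effect quantization-aware training simulates in the forward pass.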
Enums§
- QuantizationScheme - Quantization schemes
- QuantizationType - Quantization data types
Functions§
- dynamic_quantize - Dynamic quantization with automatic scale/zero-point calculation
- fake_quantize - Quantization-aware training (QAT) simulation
- gradual_magnitude_prune - Gradual magnitude pruning with sparsity scheduling
- lottery_ticket_prune - Lottery ticket hypothesis: find winning subnetworks
- magnitude_prune - Magnitude-based pruning
- quantization_error_analysis - Quantization error analysis
- uniform_dequantize - Dequantize uniformly quantized tensor
- uniform_quantize - Uniform quantization
- weight_clustering - Weight clustering for compression
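As an illustration of the magnitude-based pruning idea named in the list above, the following is a minimal sketch that zeroes the smallest-magnitude entries of a weight slice. It is not this module's `magnitude_prune`, whose signature is not shown here; `magnitude_prune_sketch` and its `sparsity` parameter are hypothetical, and a plain `&[f32]` stands in for a tensor.

```rust
/// Zero out the smallest-magnitude weights so that `floor(len * sparsity)`
/// entries become zero. `sparsity` is a fraction and is clamped to [0, 1].
fn magnitude_prune_sketch(weights: &[f32], sparsity: f32) -> Vec<f32> {
    let n_prune = (weights.len() as f32 * sparsity.clamp(0.0, 1.0)).floor() as usize;
    if n_prune == 0 {
        return weights.to_vec();
    }
    // Order indices by ascending absolute value; the sort is stable,
    // so ties are pruned in index order.
    let mut order: Vec<usize> = (0..weights.len()).collect();
    order.sort_by(|&a, &b| weights[a].abs().total_cmp(&weights[b].abs()));

    // Copy the weights and zero the n_prune smallest ones.
    let mut pruned = weights.to_vec();
    for &i in order.iter().take(n_prune) {
        pruned[i] = 0.0;
    }
    pruned
}
```

For example, pruning `[0.1, -0.5, 0.05, 2.0]` at a sparsity of `0.5` removes the two smallest-magnitude entries and leaves `[0.0, -0.5, 0.0, 2.0]`. A gradual schedule would call the same routine repeatedly with an increasing sparsity target.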