Module quantization

Quantization and Compression Functions

This module provides quantization and compression operations for models, including:

  • Uniform and non-uniform quantization
  • Dynamic quantization schemes
  • Pruning utilities (magnitude-based, structured, unstructured)
  • Model compression techniques
  • Low-precision computation functions
  • Knowledge distillation utilities
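
To make the quantization items above concrete, here is a minimal, self-contained sketch of 8-bit affine (asymmetric) quantization with the scale and zero-point derived from the data range. It does not use this module's API: `QParams`, `compute_qparams`, `quantize`, and `dequantize` are hypothetical names chosen for illustration, and the module's real functions may differ in signature and behavior.

```rust
/// Hypothetical parameter pair for 8-bit affine quantization
/// (illustrative only, not a type from this module).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QParams {
    pub scale: f32,
    pub zero_point: i32,
}

/// Derive scale and zero-point from the observed range of `data`.
/// The range is widened to include 0.0 so that zero maps to an exact code.
pub fn compute_qparams(data: &[f32]) -> QParams {
    let min = data.iter().copied().fold(0.0_f32, f32::min);
    let max = data.iter().copied().fold(0.0_f32, f32::max);
    let range = max - min;
    if range == 0.0 {
        // All values are zero (or the slice is empty): any scale works.
        return QParams { scale: 1.0, zero_point: 0 };
    }
    let scale = range / 255.0;
    let zero_point = (-min / scale).round().clamp(0.0, 255.0) as i32;
    QParams { scale, zero_point }
}

/// Map each value to a code in 0..=255: q = clamp(round(x / scale) + zero_point).
pub fn quantize(data: &[f32], p: QParams) -> Vec<u8> {
    data.iter()
        .map(|&x| ((x / p.scale).round() as i32 + p.zero_point).clamp(0, 255) as u8)
        .collect()
}

/// Recover approximate values: x' = (q - zero_point) * scale.
pub fn dequantize(codes: &[u8], p: QParams) -> Vec<f32> {
    codes
        .iter()
        .map(|&q| (q as i32 - p.zero_point) as f32 * p.scale)
        .collect()
}
```

In this sketch, calling `compute_qparams` on a tensor's values, then `quantize` followed by `dequantize`, returns values that differ from the originals by at most about one quantization step (`scale`), and an input of exactly 0.0 is recovered exactly.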

Enums

QuantizationScheme
Quantization schemes
QuantizationType
Quantization data types

Functions

dynamic_quantize
Dynamic quantization with automatic scale/zero-point calculation
fake_quantize
Quantization-aware training (QAT) simulation
gradual_magnitude_prune
Gradual magnitude pruning with sparsity scheduling
lottery_ticket_prune
Lottery ticket hypothesis: find winning subnetworks
magnitude_prune
Magnitude-based pruning
quantization_error_analysis
Quantization error analysis
uniform_dequantize
Dequantize uniformly quantized tensor
uniform_quantize
Uniform quantization
weight_clustering
Weight clustering for compression
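
As an illustration of the magnitude-based pruning listed above, the following sketch zeroes the fraction of weights with the smallest absolute values. It is not the signature or implementation of `magnitude_prune`: the name `magnitude_prune_sketch`, the plain `&mut [f32]` input, and the floor-based count are assumptions made only for this example.

```rust
/// Hypothetical helper (not this module's `magnitude_prune`): zero out the
/// `sparsity` fraction of `weights` with the smallest absolute values.
/// Returns how many weights were set to zero.
pub fn magnitude_prune_sketch(weights: &mut [f32], sparsity: f32) -> usize {
    // Keep the target fraction inside [0, 1].
    let sparsity = sparsity.clamp(0.0, 1.0);
    // Number of weights to remove, rounded down.
    let k = (weights.len() as f32 * sparsity).floor() as usize;
    if k == 0 {
        return 0;
    }
    // Sort indices by ascending magnitude; `total_cmp` gives a total order on floats.
    let mut order: Vec<usize> = (0..weights.len()).collect();
    order.sort_by(|&a, &b| weights[a].abs().total_cmp(&weights[b].abs()));
    // Zero the k smallest-magnitude entries in place.
    for &i in &order[..k] {
        weights[i] = 0.0;
    }
    k
}
```

This is the unstructured variant: individual weights are removed wherever they fall. A structured variant would instead rank and remove whole rows, channels, or blocks.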