Module optimization

Model optimization techniques (quantization, pruning, knowledge distillation)

Structs§

QuantizationParams
Quantization parameters

Enums§

QuantizationMode
Quantization configuration

Functions§

calculate_speedup
Calculate speedup from optimization
compression_ratio
Calculate model compression ratio
dequantize_from_int8
Dequantize INT8 weights back to FP32
prune_weights
Apply magnitude-based pruning to weights
quantize_to_int8
Quantize a weight matrix to INT8
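
The listing above names the operations but does not show their signatures, so the following is only a minimal sketch of the underlying techniques, not this module's API. It assumes symmetric per-tensor INT8 quantization and threshold-based magnitude pruning. Every name in it (`Params`, `quantize_symmetric`, `dequantize`, `prune_by_magnitude`, `size_ratio`) is a hypothetical stand-in chosen so it is not mistaken for the real `QuantizationParams`, `quantize_to_int8`, `dequantize_from_int8`, `prune_weights`, or `compression_ratio`.

```rust
/// Hypothetical stand-in for `QuantizationParams`; the real fields are
/// not shown on this page.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Params {
    pub scale: f32,
    pub zero_point: i8,
}

/// Symmetric per-tensor quantization (an assumed scheme): map the range
/// [-max_abs, max_abs] onto the integers [-127, 127].
pub fn quantize_symmetric(weights: &[f32]) -> (Vec<i8>, Params) {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Fall back to a scale of 1.0 so an all-zero tensor does not divide by zero.
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, Params { scale, zero_point: 0 })
}

/// Map INT8 values back to FP32 using the stored scale and zero point.
pub fn dequantize(q: &[i8], p: Params) -> Vec<f32> {
    q.iter()
        .map(|&v| (v as i32 - p.zero_point as i32) as f32 * p.scale)
        .collect()
}

/// Magnitude-based pruning: zero every weight whose absolute value is
/// below `threshold`, and return how many weights fell below it.
pub fn prune_by_magnitude(weights: &mut [f32], threshold: f32) -> usize {
    let mut pruned = 0;
    for w in weights.iter_mut() {
        if w.abs() < threshold {
            *w = 0.0;
            pruned += 1;
        }
    }
    pruned
}

/// Ratio of original size to compressed size, both in bytes.
pub fn size_ratio(original_bytes: usize, compressed_bytes: usize) -> f32 {
    original_bytes as f32 / compressed_bytes as f32
}
```

Under these assumptions, storing FP32 weights as INT8 gives `size_ratio(4 * n, n)`, that is, 4.0 for `n` weights, ignoring the few bytes needed for the scale and zero point. Dequantizing the output of `quantize_symmetric` recovers each weight to within roughly half of `scale`.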