Model optimization techniques (quantization, pruning, knowledge distillation)
Structs
- QuantizationParams - Quantization parameters
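This page does not show the fields of `QuantizationParams`. INT8 affine quantization commonly stores a scale and a zero point. The minimal sketch below assumes exactly those two fields (`scale` and `zero_point` are hypothetical names) and shows how they map one FP32 value to an INT8 code and back.

```rust
/// Hypothetical field layout for illustration; the crate's real
/// `QuantizationParams` fields are not documented on this page.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QuantizationParams {
    /// FP32 width of one INT8 step.
    pub scale: f32,
    /// INT8 code that represents FP32 0.0.
    pub zero_point: i32,
}

impl QuantizationParams {
    /// Affine quantization: q = clamp(round(x / scale) + zero_point, -128, 127).
    pub fn quantize(&self, x: f32) -> i8 {
        // `as i32` saturates on overflow; saturating_add avoids a debug-mode panic.
        let q = ((x / self.scale).round() as i32).saturating_add(self.zero_point);
        q.clamp(i8::MIN as i32, i8::MAX as i32) as i8
    }

    /// Inverse mapping: x is approximately scale * (q - zero_point).
    pub fn dequantize(&self, q: i8) -> f32 {
        self.scale * (q as i32 - self.zero_point) as f32
    }
}
```

FP32 0.0 always maps to `zero_point`. Values outside the representable range saturate at -128 or 127 instead of wrapping.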
Enums
- QuantizationMode - Quantization configuration
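The variants of `QuantizationMode` are not listed here. Assume, purely for illustration, that it distinguishes symmetric from asymmetric INT8 quantization. The variant names and the `params_for_range` method below are hypothetical. Under that assumption, the mode decides how the scale and zero point are derived from an observed weight range:

```rust
/// Hypothetical variants; the real `QuantizationMode` variants are not listed here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantizationMode {
    /// Zero point pinned at 0; the grid covers [-max|w|, max|w|].
    Symmetric,
    /// Zero point shifted so that [min, max] spans all 256 INT8 codes.
    Asymmetric,
}

impl QuantizationMode {
    /// Hypothetical helper: compute (scale, zero_point) for INT8 from a weight range.
    pub fn params_for_range(self, min: f32, max: f32) -> (f32, i32) {
        match self {
            QuantizationMode::Symmetric => {
                let max_abs = min.abs().max(max.abs());
                // Divide by 127 (not 128) so the grid is symmetric around zero.
                let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
                (scale, 0)
            }
            QuantizationMode::Asymmetric => {
                // Stretch the range to include 0.0 so zero is exactly representable.
                let (lo, hi) = (min.min(0.0), max.max(0.0));
                let scale = if hi > lo { (hi - lo) / 255.0 } else { 1.0 };
                let zero_point = (-128.0 - lo / scale).round().clamp(-128.0, 127.0) as i32;
                (scale, zero_point)
            }
        }
    }
}
```

Symmetric mode wastes part of the INT8 range when weights are skewed to one sign. Asymmetric mode uses the full range at the cost of storing and applying a zero point.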
Functions
- calculate_speedup - Calculate speedup from optimization
- compression_ratio - Calculate model compression ratio
- dequantize_from_int8 - Dequantize INT8 weights back to FP32
- prune_weights - Apply magnitude-based pruning to weights
- quantize_to_int8 - Quantize a weight matrix to INT8
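The signatures of `quantize_to_int8` and `dequantize_from_int8` are not shown on this page. The minimal sketch below assumes a row-major `Vec<Vec<f32>>` matrix and symmetric per-tensor quantization: one scale for the whole matrix and a zero point of 0. The argument and return types are hypothetical.

```rust
/// Hypothetical signature: returns INT8 codes plus the per-tensor scale.
pub fn quantize_to_int8(weights: &[Vec<f32>]) -> (Vec<Vec<i8>>, f32) {
    // The largest magnitude anywhere in the matrix sets the scale.
    let max_abs = weights
        .iter()
        .flatten()
        .fold(0.0_f32, |m, &w| m.max(w.abs()));
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    // Round each weight to the nearest step; -128 is unused to keep the grid symmetric.
    let q: Vec<Vec<i8>> = weights
        .iter()
        .map(|row| {
            row.iter()
                .map(|&w| (w / scale).round().clamp(-127.0, 127.0) as i8)
                .collect()
        })
        .collect();
    (q, scale)
}

/// Hypothetical signature: maps INT8 codes back to approximate FP32 values.
pub fn dequantize_from_int8(q: &[Vec<i8>], scale: f32) -> Vec<Vec<f32>> {
    q.iter()
        .map(|row| row.iter().map(|&v| v as f32 * scale).collect())
        .collect()
}
```

Under this scheme, every weight lies within the representable range, so each dequantized value differs from the original by at most `scale / 2`. Per-channel scales would tighten that bound for matrices whose rows have very different magnitudes.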
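Magnitude-based pruning, as named in the `prune_weights` summary, removes the weights with the smallest absolute values. The sketch below assumes a flat `&mut [f32]` slice and a `sparsity` fraction in `[0, 1]`. Both are hypothetical, since the real signature is not shown. It zeroes the `floor(len * sparsity)` smallest-magnitude weights in place.

```rust
/// Hypothetical signature: zeroes the `sparsity` fraction of weights with the
/// smallest absolute value and returns how many were pruned.
pub fn prune_weights(weights: &mut [f32], sparsity: f32) -> usize {
    let sparsity = sparsity.clamp(0.0, 1.0);
    let k = (weights.len() as f32 * sparsity).floor() as usize;
    // Indices sorted by ascending magnitude; the first k are pruned.
    let mut order: Vec<usize> = (0..weights.len()).collect();
    order.sort_by(|&a, &b| weights[a].abs().total_cmp(&weights[b].abs()));
    for &i in &order[..k] {
        weights[i] = 0.0;
    }
    k
}
```

`sort_by` is stable, so weights with equal magnitude are pruned in index order. Exactly `k` weights are zeroed even when several share the threshold value.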
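`compression_ratio` and `calculate_speedup` both report ratios. The sketch below assumes the usual definitions: original size over optimized size, and baseline latency over optimized latency. The byte and millisecond arguments are hypothetical. Under those assumptions, each function reduces to one division:

```rust
/// Hypothetical signature: ratio of original to optimized model size.
pub fn compression_ratio(original_bytes: u64, optimized_bytes: u64) -> f64 {
    assert!(optimized_bytes > 0, "optimized size must be non-zero");
    original_bytes as f64 / optimized_bytes as f64
}

/// Hypothetical signature: ratio of baseline to optimized latency
/// (values above 1.0 mean the optimized model is faster).
pub fn calculate_speedup(baseline_latency_ms: f64, optimized_latency_ms: f64) -> f64 {
    assert!(optimized_latency_ms > 0.0, "optimized latency must be positive");
    baseline_latency_ms / optimized_latency_ms
}
```

For example, storing weights as INT8 instead of FP32 shrinks each weight from 4 bytes to 1. For one million weights, `compression_ratio(4_000_000, 1_000_000)` therefore returns `4.0`, before counting the small overhead of stored scales.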