Model compression utilities for neural networks.

This module provides tools for model compression, including:
- Quantization (post-training and quantization-aware training; a minimal sketch follows this list)
- Pruning (magnitude-based, structured, and unstructured)
- Knowledge distillation
- Model compression analysis and optimization
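
The post-training path typically maps f32 values onto int8 using a scale and zero-point. Below is a minimal, self-contained sketch of that idea; the free functions and the `AffineParams` struct are illustrative assumptions, not this module's `PostTrainingQuantizer` or `QuantizationParams` API:

```rust
// Illustrative sketch only; not this module's PostTrainingQuantizer/QuantizationParams API.

/// Affine (asymmetric) int8 quantization parameters.
struct AffineParams {
    scale: f32,
    zero_point: i32,
}

/// Derive scale and zero-point so the observed [min, max] range maps onto i8.
fn calibrate_min_max(values: &[f32]) -> AffineParams {
    let mut min = f32::INFINITY;
    let mut max = f32::NEG_INFINITY;
    for &v in values {
        min = min.min(v);
        max = max.max(v);
    }
    // Include zero in the range so that exact zeros stay exact after quantization.
    min = min.min(0.0);
    max = max.max(0.0);
    let scale = ((max - min) / 255.0).max(f32::EPSILON);
    let zero_point = (-128.0 - min / scale).round() as i32;
    AffineParams { scale, zero_point }
}

fn quantize(v: f32, p: &AffineParams) -> i8 {
    ((v / p.scale).round() as i32 + p.zero_point).clamp(-128, 127) as i8
}

fn dequantize(q: i8, p: &AffineParams) -> f32 {
    (q as i32 - p.zero_point) as f32 * p.scale
}

fn main() {
    let weights = [-1.2_f32, -0.3, 0.0, 0.4, 0.9, 2.5];
    let params = calibrate_min_max(&weights);
    for &w in &weights {
        let q = quantize(w, &params);
        println!("{w:+.3} -> {q:4} -> {:+.3}", dequantize(q, &params));
    }
}
```

Min-max calibration guarantees the full observed range is representable, and keeping zero on the grid means padding and ReLU outputs survive quantization without error.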
Structs
- AccuracyMetrics - Accuracy measurement metrics
- CalibrationStatistics - Statistics collected during calibration
- CompressionAnalyzer - Model compression analyzer
- CompressionReport - Comprehensive compression analysis report
- ModelPruner - Neural network pruner
- PostTrainingQuantizer - Post-training quantization manager
- QuantizationParams - Quantization parameters for a tensor
- SparsityStatistics - Sparsity statistics for a layer
- SpeedMetrics - Speed measurement metrics
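
For pruning, a magnitude-based pass zeroes the smallest-magnitude fraction of a layer's weights and then reports the resulting sparsity. The following is a rough sketch of that workflow; the names are illustrative, not this module's `ModelPruner` or `SparsityStatistics` types:

```rust
// Illustrative sketch only; not the module's ModelPruner/SparsityStatistics types.

/// Simple per-layer sparsity report.
#[derive(Debug)]
struct LayerSparsity {
    total: usize,
    zeros: usize,
}

impl LayerSparsity {
    fn ratio(&self) -> f32 {
        self.zeros as f32 / self.total as f32
    }
}

/// Unstructured magnitude pruning: zero out the `sparsity` fraction of
/// weights with the smallest absolute value.
fn magnitude_prune(weights: &mut [f32], sparsity: f32) -> LayerSparsity {
    let mut magnitudes: Vec<f32> = weights.iter().map(|w| w.abs()).collect();
    magnitudes.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Threshold below which weights are pruned.
    let cutoff = (sparsity * weights.len() as f32) as usize;
    let threshold = magnitudes.get(cutoff).copied().unwrap_or(f32::INFINITY);
    let mut zeros = 0;
    for w in weights.iter_mut() {
        if w.abs() < threshold {
            *w = 0.0;
            zeros += 1;
        }
    }
    LayerSparsity { total: weights.len(), zeros }
}

fn main() {
    let mut layer = vec![0.8_f32, -0.05, 0.3, -0.9, 0.02, 0.15, -0.4, 0.01];
    let stats = magnitude_prune(&mut layer, 0.5);
    println!("pruned layer: {layer:?}");
    println!("sparsity: {:.0}% ({} / {})", 100.0 * stats.ratio(), stats.zeros, stats.total);
}
```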
Enums
- CalibrationMethod - Quantization calibration method
- PruningMethod - Pruning method
- QuantizationBits - Quantization precision levels
- QuantizationScheme - Quantization scheme
- StructuredGranularity - Structured pruning granularity
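
The calibration method largely determines the quantization parameters: clipping the observed range at a percentile sacrifices a few outliers in exchange for a much finer step size over the bulk of the distribution. A hedged illustration follows; the `Calibration` enum and its variants are assumptions for demonstration, not this module's `CalibrationMethod`:

```rust
// Illustrative sketch: the enum and functions below are assumptions for
// demonstration, not this module's CalibrationMethod API.

enum Calibration {
    /// Use the full observed range.
    MinMax,
    /// Clip the range to the given quantile (e.g. 0.999) to ignore outliers.
    Percentile(f32),
}

/// Pick the clipping range [lo, hi] for a set of activations.
fn clipping_range(values: &[f32], method: &Calibration) -> (f32, f32) {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    match method {
        Calibration::MinMax => (sorted[0], sorted[sorted.len() - 1]),
        Calibration::Percentile(q) => {
            let hi_idx = ((sorted.len() - 1) as f32 * q) as usize;
            let lo_idx = sorted.len() - 1 - hi_idx;
            (sorted[lo_idx], sorted[hi_idx])
        }
    }
}

fn main() {
    // Mostly small activations plus one large outlier.
    let mut activations: Vec<f32> = (0..1000).map(|i| (i as f32 / 1000.0) - 0.5).collect();
    activations.push(50.0);

    for method in [Calibration::MinMax, Calibration::Percentile(0.999)] {
        let (lo, hi) = clipping_range(&activations, &method);
        // A smaller clipping range yields a smaller scale, i.e. finer int8 resolution
        // for the bulk of the values, at the cost of saturating the outlier.
        let scale = (hi - lo) / 255.0;
        println!("range = [{lo:.3}, {hi:.3}], int8 scale = {scale:.5}");
    }
}
```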