Model compression and quantization for efficient embedding deployment
This module provides advanced compression techniques including quantization, pruning, knowledge distillation, and neural architecture search.
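To make the quantization technique concrete, here is a minimal, self-contained sketch of symmetric per-tensor int8 quantization. The function names are illustrative only and are not part of this module's API; see `QuantizationParams` and `QuantizationProcessor` for the actual types.

```rust
// Symmetric int8 quantization: map f32 weights into [-127, 127]
// using a single per-tensor scale derived from the largest magnitude.
// Illustrative sketch, not this module's actual implementation.

/// Per-tensor scale computed from the maximum absolute weight.
fn quantize_scale(weights: &[f32]) -> f32 {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 }
}

/// Quantize: round each weight to the nearest int8 step.
fn quantize(weights: &[f32], scale: f32) -> Vec<i8> {
    weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect()
}

/// Dequantize: recover an approximation of the original weights.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let w = vec![0.5f32, -1.0, 0.25, 0.75];
    let scale = quantize_scale(&w);
    let q = quantize(&w, scale);
    let recovered = dequantize(&q, scale);
    // Round-trip error is bounded by scale / 2 per element.
    for (a, b) in w.iter().zip(&recovered) {
        assert!((a - b).abs() <= scale / 2.0 + 1e-6);
    }
    println!("scale = {scale}, quantized = {q:?}");
}
```

The per-tensor scale trades precision for simplicity; per-channel scales (one per output channel) are a common refinement when accuracy loss matters.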
Structs§
- Architecture
- ArchitectureCandidate
- CompressedModel
- CompressionStats
- CompressionTarget - Results and data structures
- DistillationConfig - Knowledge distillation configuration
- DistillationProcessor - Knowledge distillation processor
- DistillationResult
- HardwareConstraints - Hardware constraints for NAS
- LayerConfig
- ModelCompressionManager - Model compression manager
- NASConfig - Neural Architecture Search configuration
- NASProcessor - Neural Architecture Search processor
- OptimalArchitecture
- PruningConfig - Pruning configuration
- PruningProcessor - Pruning processor
- PruningResult
- QuantizationConfig - Quantization configuration
- QuantizationParams - Quantization parameters
- QuantizationProcessor - Quantization processor
- QuantizationResult
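The pruning types above apply sparsification to model weights. A minimal sketch of global magnitude pruning, one common pruning method, is shown below; the function name and signature are hypothetical and independent of the actual `PruningProcessor` API.

```rust
// Global magnitude pruning sketch: zero out the `sparsity` fraction
// of weights with the smallest absolute value. Illustrative only.

/// Zero the smallest-magnitude `sparsity` fraction of `weights`.
/// Ties at the threshold may prune slightly more than requested.
fn magnitude_prune(weights: &mut [f32], sparsity: f32) {
    let n_prune = (weights.len() as f32 * sparsity) as usize;
    if n_prune == 0 {
        return;
    }
    // Find the magnitude threshold: the n_prune-th smallest |w|.
    let mut mags: Vec<f32> = weights.iter().map(|w| w.abs()).collect();
    mags.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let threshold = mags[n_prune - 1];
    for w in weights.iter_mut() {
        if w.abs() <= threshold {
            *w = 0.0;
        }
    }
}

fn main() {
    let mut w = vec![0.1f32, -0.5, 0.01, 2.0];
    magnitude_prune(&mut w, 0.5); // prune the 2 smallest-magnitude weights
    println!("pruned weights: {w:?}");
}
```

Structured variants prune whole channels or attention heads instead of individual weights, which is what makes the sparsity exploitable on commodity hardware.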
Enums§
- ActivationType
- DistillationType - Types of knowledge distillation
- HardwarePlatform - Target hardware platforms
- LayerType
- OptimizationTarget - Optimization targets
- PruningMethod - Pruning methods
- PruningSchedule - Pruning schedules
- QuantizationMethod - Quantization methods
- SearchSpace - Architecture search spaces
- SearchStrategy - Neural architecture search strategies
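For the distillation types above, the core computation in response-based knowledge distillation is a soft-target loss: the student is trained to match the teacher's temperature-softened output distribution. The sketch below illustrates that loss; the names are hypothetical and unrelated to `DistillationProcessor`'s real interface.

```rust
// Soft-target distillation loss sketch: KL(teacher ‖ student) over
// temperature-softened softmax distributions, scaled by T^2 so the
// gradient magnitude is roughly temperature-independent.

/// Softmax with temperature; shifting by the max keeps exp() stable.
fn softmax_t(logits: &[f32], temperature: f32) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|l| ((l - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// KL divergence between softened teacher and student outputs, times T^2.
fn distillation_loss(teacher_logits: &[f32], student_logits: &[f32], t: f32) -> f32 {
    let p = softmax_t(teacher_logits, t);
    let q = softmax_t(student_logits, t);
    let kl: f32 = p.iter().zip(&q).map(|(pi, qi)| pi * (pi / qi).ln()).sum();
    kl * t * t
}

fn main() {
    let teacher = [3.0f32, 1.0, 0.5];
    let student = [2.5f32, 1.2, 0.4];
    println!("soft-target loss: {}", distillation_loss(&teacher, &student, 2.0));
}
```

In practice this term is blended with the ordinary hard-label cross-entropy, weighted by a mixing coefficient.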