Model optimization techniques for efficient inference
This module provides model optimization techniques that reduce model size, improve inference speed, and lower memory consumption while maintaining accuracy.
§Techniques
- Quantization: Reduce precision (FP32 -> INT8/FP16); the sketch after this list shows the underlying arithmetic
- Pruning: Remove unnecessary weights and connections
- Knowledge Distillation: Transfer knowledge from large to small models
- Model Compression: GZIP, Huffman coding, weight sharing
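The arithmetic behind FP32 -> INT8 quantization is small enough to show directly. The sketch below is self-contained and illustrative only: it is not the crate's quantize_tensor / dequantize_tensor implementation, just the textbook affine (scale + zero-point) formulation.

// Affine (asymmetric) INT8 quantization: map [min, max] onto u8.
fn quantize_affine(data: &[f32]) -> (Vec<u8>, f32, u8) {
    // Fold from 0.0 so that zero is always inside the range and
    // therefore exactly representable after quantization.
    let min = data.iter().cloned().fold(0.0f32, f32::min);
    let max = data.iter().cloned().fold(0.0f32, f32::max);
    let scale = ((max - min) / 255.0).max(f32::EPSILON);
    let zero_point = (-min / scale).round() as u8;
    let quantized = data
        .iter()
        .map(|&x| ((x / scale).round() + f32::from(zero_point)).clamp(0.0, 255.0) as u8)
        .collect();
    (quantized, scale, zero_point)
}

// Dequantization recovers an approximation of the original values.
fn dequantize_affine(q: &[u8], scale: f32, zero_point: u8) -> Vec<f32> {
    q.iter().map(|&v| (f32::from(v) - f32::from(zero_point)) * scale).collect()
}

fn main() {
    let weights = [-1.2f32, -0.3, 0.0, 0.7, 2.5];
    let (q, scale, zp) = quantize_affine(&weights);
    println!("quantized: {q:?}, scale: {scale}, zero_point: {zp}");
    println!("restored (lossy): {:?}", dequantize_affine(&q, scale, zp));
}

Per-channel quantization, enabled via per_channel(true) in the example below, applies the same computation with a separate scale and zero point for each output channel, which usually preserves more accuracy than a single per-tensor scale.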
§Example
use oxigdal_ml::optimization::{quantize_model, QuantizationConfig, QuantizationType};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure INT8 quantization with per-channel scales.
    let config = QuantizationConfig::builder()
        .quantization_type(QuantizationType::Int8)
        .per_channel(true)
        .build();

    // Write a quantized copy of the model alongside the original.
    quantize_model("model.onnx", "model_quantized.onnx", &config)?;
    Ok(())
}
Re-exports§
pub use distillation::DenseLayer;
pub use distillation::DistillationConfig;
pub use distillation::DistillationConfigBuilder;
pub use distillation::DistillationLoss;
pub use distillation::DistillationStats;
pub use distillation::DistillationTrainer;
pub use distillation::EarlyStopping;
pub use distillation::ForwardCache;
pub use distillation::LearningRateSchedule;
pub use distillation::MLPGradients;
pub use distillation::OptimizerType;
pub use distillation::SimpleMLP;
pub use distillation::SimpleRng;
pub use distillation::Temperature;
pub use distillation::TrainingState;
pub use distillation::cross_entropy_loss;
pub use distillation::cross_entropy_with_label;
pub use distillation::kl_divergence;
pub use distillation::kl_divergence_from_logits;
pub use distillation::log_softmax;
pub use distillation::mse_loss;
pub use distillation::soft_targets;
pub use distillation::softmax;
pub use distillation::train_student_model;
pub use pruning::FineTuneCallback;
pub use pruning::GradientInfo;
pub use pruning::ImportanceMethod;
pub use pruning::LotteryTicketState;
pub use pruning::MaskCreationMode;
pub use pruning::NoOpFineTune;
pub use pruning::PruningConfig;
pub use pruning::PruningConfigBuilder;
pub use pruning::PruningGranularity;
pub use pruning::PruningMask;
pub use pruning::PruningSchedule;
pub use pruning::PruningStats;
pub use pruning::PruningStrategy;
pub use pruning::UnstructuredPruner;
pub use pruning::WeightStatistics;
pub use pruning::WeightTensor;
pub use pruning::compute_channel_importance;
pub use pruning::compute_gradient_importance;
pub use pruning::compute_magnitude_importance;
pub use pruning::compute_taylor_importance;
pub use pruning::iterative_pruning;
pub use pruning::prune_model;
pub use pruning::prune_weights_direct;
pub use pruning::prune_weights_with_gradients;
pub use pruning::select_weights_to_prune;
pub use pruning::structured_pruning;
pub use pruning::unstructured_pruning;
pub use quantization::QuantizationConfig;
pub use quantization::QuantizationMode;
pub use quantization::QuantizationParams;
pub use quantization::QuantizationResult;
pub use quantization::QuantizationType;
pub use quantization::calibrate_quantization;
pub use quantization::dequantize_tensor;
pub use quantization::quantize_model;
pub use quantization::quantize_tensor;
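Several of the distillation re-exports above (softmax, kl_divergence, soft_targets) correspond to the standard soft-target loss from knowledge distillation. The sketch below shows that computation in self-contained form; it assumes the textbook semantics, not the crate's exact signatures.

// Temperature-softened softmax: higher T spreads probability mass,
// exposing the teacher's "dark knowledge" about non-target classes.
fn softmax_t(logits: &[f32], temperature: f32) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&z| ((z - max) / temperature).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

// KL(p || q) = sum_i p_i * ln(p_i / q_i), the core of the distillation loss.
fn kl_div(p: &[f32], q: &[f32]) -> f32 {
    p.iter()
        .zip(q)
        .filter(|(p_i, _)| **p_i > 0.0)
        .map(|(p_i, q_i)| p_i * (p_i / q_i).ln())
        .sum()
}

fn main() {
    let teacher_logits = [3.0f32, 1.0, 0.2];
    let student_logits = [2.5f32, 1.2, 0.0];
    let t = 2.0;
    let soft_loss = kl_div(&softmax_t(&teacher_logits, t), &softmax_t(&student_logits, t));
    // A full Hinton-style loss scales this term by t*t and adds a
    // cross-entropy term against the hard labels.
    println!("soft-target loss: {soft_loss}");
}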
Modules§
- distillation: Knowledge distillation for model compression
- pruning: Model pruning for sparse neural networks (a magnitude-pruning sketch follows this list)
- quantization: Model quantization for reduced-precision inference
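As a companion to the pruning module, here is a self-contained sketch of unstructured magnitude pruning, the simplest strategy in this family. It is illustrative only; the crate's prune_weights_direct and compute_magnitude_importance have their own signatures.

/// Zero out the fraction `sparsity` of weights with the smallest magnitude.
fn magnitude_prune(weights: &mut [f32], sparsity: f32) {
    let mut magnitudes: Vec<f32> = weights.iter().map(|w| w.abs()).collect();
    magnitudes.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // The magnitude at the sparsity quantile becomes the pruning threshold.
    let cutoff_index = ((weights.len() as f32) * sparsity) as usize;
    let threshold = magnitudes.get(cutoff_index).copied().unwrap_or(f32::INFINITY);
    for w in weights.iter_mut() {
        if w.abs() < threshold {
            *w = 0.0;
        }
    }
}

fn main() {
    let mut weights = [0.01f32, -0.8, 0.002, 0.5, -0.03, 1.2];
    magnitude_prune(&mut weights, 0.5); // target 50% sparsity
    println!("{weights:?}"); // the smallest-magnitude half is zeroed
}

Structured pruning instead removes whole rows, columns, or channels, which sacrifices some flexibility but produces dense, hardware-friendly tensors.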
Structs§
- OptimizationPipeline: Combined optimization pipeline
- OptimizationStats: Model optimization statistics
Enums§
- OptimizationProfile: Model optimization profile