Module optimization

Model optimization techniques for efficient inference

This module provides model optimization techniques that reduce model size, speed up inference, and lower memory consumption while maintaining accuracy.

§Techniques

  • Quantization: Reduce precision (FP32 -> INT8/FP16)
  • Pruning: Remove unnecessary weights and connections
  • Knowledge Distillation: Transfer knowledge from large to small models
  • Model Compression: Shrink stored weights with GZIP, Huffman coding, and weight sharing

§Example

use oxigdal_ml::optimization::{QuantizationConfig, QuantizationType, quantize_model};

// Quantize weights to 8-bit integers, with a separate scale per channel.
let config = QuantizationConfig::builder()
    .quantization_type(QuantizationType::Int8)
    .per_channel(true)
    .build();

quantize_model("model.onnx", "model_quantized.onnx", &config)?;

Re-exports§

pub use distillation::DenseLayer;
pub use distillation::DistillationConfig;
pub use distillation::DistillationConfigBuilder;
pub use distillation::DistillationLoss;
pub use distillation::DistillationStats;
pub use distillation::DistillationTrainer;
pub use distillation::EarlyStopping;
pub use distillation::ForwardCache;
pub use distillation::LearningRateSchedule;
pub use distillation::MLPGradients;
pub use distillation::OptimizerType;
pub use distillation::SimpleMLP;
pub use distillation::SimpleRng;
pub use distillation::Temperature;
pub use distillation::TrainingState;
pub use distillation::cross_entropy_loss;
pub use distillation::cross_entropy_with_label;
pub use distillation::kl_divergence;
pub use distillation::kl_divergence_from_logits;
pub use distillation::log_softmax;
pub use distillation::mse_loss;
pub use distillation::soft_targets;
pub use distillation::softmax;
pub use distillation::train_student_model;
pub use pruning::FineTuneCallback;
pub use pruning::GradientInfo;
pub use pruning::ImportanceMethod;
pub use pruning::LotteryTicketState;
pub use pruning::MaskCreationMode;
pub use pruning::NoOpFineTune;
pub use pruning::PruningConfig;
pub use pruning::PruningConfigBuilder;
pub use pruning::PruningGranularity;
pub use pruning::PruningMask;
pub use pruning::PruningSchedule;
pub use pruning::PruningStats;
pub use pruning::PruningStrategy;
pub use pruning::UnstructuredPruner;
pub use pruning::WeightStatistics;
pub use pruning::WeightTensor;
pub use pruning::compute_channel_importance;
pub use pruning::compute_gradient_importance;
pub use pruning::compute_magnitude_importance;
pub use pruning::compute_taylor_importance;
pub use pruning::iterative_pruning;
pub use pruning::prune_model;
pub use pruning::prune_weights_direct;
pub use pruning::prune_weights_with_gradients;
pub use pruning::select_weights_to_prune;
pub use pruning::structured_pruning;
pub use pruning::unstructured_pruning;
pub use quantization::QuantizationConfig;
pub use quantization::QuantizationMode;
pub use quantization::QuantizationParams;
pub use quantization::QuantizationResult;
pub use quantization::QuantizationType;
pub use quantization::calibrate_quantization;
pub use quantization::dequantize_tensor;
pub use quantization::quantize_model;
pub use quantization::quantize_tensor;

Modules§

distillation
Knowledge distillation for model compression
pruning
Model pruning for sparse neural networks
quantization
Model quantization for reduced precision inference
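
The distillation module also re-exports the numeric building blocks of a Hinton-style soft-target loss (softmax, soft_targets, kl_divergence). A minimal sketch, assuming soft_targets(logits, temperature) returns temperature-softened probabilities and kl_divergence(p, q) takes two probability slices; both signatures are assumptions, and only the function names appear in the re-exports above.

use oxigdal_ml::optimization::{kl_divergence, soft_targets};

// Soften teacher and student logits with the same temperature T,
// then compare the resulting distributions.
let teacher_logits = vec![2.0, 1.0, 0.1];
let student_logits = vec![1.8, 1.2, 0.3];
let temperature = 4.0;

let teacher_soft = soft_targets(&teacher_logits, temperature);
let student_soft = soft_targets(&student_logits, temperature);

// Scale by T^2 (Hinton et al.) so gradient magnitudes stay comparable
// as the temperature changes.
let distill_loss = temperature * temperature * kl_divergence(&teacher_soft, &student_soft);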

Structs§

OptimizationPipeline
Combined optimization pipeline
OptimizationStats
Model optimization statistics

Enums§

OptimizationProfile
Model optimization profile
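
A hypothetical end-to-end sketch tying these items together. OptimizationPipeline, OptimizationProfile, and OptimizationStats are the only names taken from this page; new, run, the Balanced variant, and the compression_ratio field are placeholders, not confirmed API.

use oxigdal_ml::optimization::{OptimizationPipeline, OptimizationProfile};

// Placeholder API: new(), run(), Balanced, and compression_ratio are
// illustrative assumptions, not taken from the documentation above.
let pipeline = OptimizationPipeline::new(OptimizationProfile::Balanced);
let stats = pipeline.run("model.onnx", "model_optimized.onnx")?;
println!("compression ratio: {:.2}x", stats.compression_ratio);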