Module distillation

Knowledge Distillation for Model Compression

Transfers knowledge from a large teacher model to a small student model by training the student on soft targets (the teacher's output probabilities) rather than hard labels alone.
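Concretely, the soft targets are the teacher's logits passed through a temperature-scaled softmax (the formula behind SoftTargetGenerator and softmax_temperature below), where a higher temperature T produces a softer, more informative distribution:

```latex
p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}
```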

§References

  • Hinton, G., Vinyals, O., & Dean, J. (2015). “Distilling the Knowledge in a Neural Network”. arXiv:1503.02531

§Toyota Way Principles

  • Muda Elimination: Compress models to eliminate resource waste
  • Standardization: Consistent soft-target training process

Structs§

DistillationConfig
Configuration for knowledge distillation
DistillationLoss
Knowledge distillation loss calculator
DistillationResult
Distillation training result
LinearDistiller
Simple linear distillation model (for testing/simple cases)
SoftTargetGenerator
Soft target generator from logits

Constants§

DEFAULT_ALPHA
Default alpha (weight for distillation loss vs hard label loss)
DEFAULT_TEMPERATURE
Default distillation temperature (value recommended in code review)
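
In the standard formulation from Hinton et al. (2015), these two constants combine the distillation and hard-label terms as sketched below; the T² factor, which compensates for the 1/T² gradient scaling of the soft term, is part of that standard recipe, and whether DistillationLoss applies it here is an assumption:

```latex
\mathcal{L} = \alpha \, T^2 \, D_{\mathrm{KL}}\big(\mathrm{softmax}(z_t / T) \,\|\, \mathrm{softmax}(z_s / T)\big)
            + (1 - \alpha) \, \mathrm{CE}\big(\mathrm{softmax}(z_s), \, y\big)
```

where z_t and z_s are the teacher and student logits and y is the hard label.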

Functions§

binary_cross_entropy
Binary cross-entropy for a single-probability (binary) prediction
cross_entropy
Cross-entropy loss: CE(p, y) = -sum(y * log(p))
kl_divergence
KL divergence: D_KL(P || Q) = sum(P * log(P/Q))
softmax
Regular softmax (T=1)
softmax_temperature
Softmax with temperature scaling
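
To make the listed formulas concrete, here is a minimal standalone sketch in plain Rust of temperature-scaled softmax and KL divergence; these free functions are illustrative and are not the module's actual implementations:

```rust
/// Softmax with temperature scaling: p_i = exp(z_i / T) / sum_j(exp(z_j / T)).
/// The max logit is subtracted before exponentiation for numerical stability;
/// the shift cancels in the normalization.
fn softmax_temperature(logits: &[f64], temperature: f64) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits
        .iter()
        .map(|&z| ((z - max) / temperature).exp())
        .collect();
    let sum: f64 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// KL divergence: D_KL(P || Q) = sum(P * log(P / Q)).
/// Terms with p_i == 0 contribute 0 by the usual convention.
fn kl_divergence(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q.iter())
        .filter(|(&pi, _)| pi > 0.0)
        .map(|(&pi, &qi)| pi * (pi / qi).ln())
        .sum()
}

fn main() {
    // Illustrative logits and temperature (values are arbitrary).
    let teacher = [2.0, 1.0, 0.1];
    let student = [1.5, 1.2, 0.3];
    let t = 2.0;
    let soft_teacher = softmax_temperature(&teacher, t);
    let soft_student = softmax_temperature(&student, t);
    println!("KL(teacher || student) = {:.4}", kl_divergence(&soft_teacher, &soft_student));
}
```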