Knowledge Distillation for Model Compression
Transfer knowledge from large teacher models to small student models by training the student on the teacher's soft targets (softened class probabilities) rather than hard labels.
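Since this page only lists the crate's items, the sketch below illustrates the technique itself rather than this crate's API: `softmax_t` and `distillation_loss` are hypothetical helpers, and the temperature, alpha, and logit values are made up. The T² scaling of the KL term follows Hinton et al. 2015.

```rust
// Hypothetical sketch of soft-target distillation (not this crate's API).
fn softmax_t(logits: &[f64], t: f64) -> Vec<f64> {
    // Subtract the max before exponentiating for numerical stability.
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&z| ((z - max) / t).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn distillation_loss(
    teacher_logits: &[f64],
    student_logits: &[f64],
    hard_label: usize, // index of the true class
    t: f64,            // distillation temperature
    alpha: f64,        // weight of the distillation term
) -> f64 {
    let p = softmax_t(teacher_logits, t); // teacher's soft targets
    let q = softmax_t(student_logits, t); // softened student predictions
    // D_KL(P || Q) = sum(P * log(P / Q)), scaled by T^2 so gradient
    // magnitudes stay comparable across temperatures (Hinton et al. 2015).
    let kl: f64 = p.iter().zip(&q).map(|(&pi, &qi)| pi * (pi / qi).ln()).sum();
    // Hard-label cross-entropy is computed at T = 1.
    let ce = -softmax_t(student_logits, 1.0)[hard_label].ln();
    alpha * t * t * kl + (1.0 - alpha) * ce
}

fn main() {
    let loss = distillation_loss(&[4.0, 1.0, 0.2], &[2.5, 0.8, 0.1], 0, 3.0, 0.7);
    println!("blended loss: {loss:.4}");
}
```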
References
- Hinton, G., Vinyals, O., and Dean, J. (2015). "Distilling the Knowledge in a Neural Network". arXiv:1503.02531.
Toyota Way Principles
- Muda Elimination: Compress models to eliminate resource waste
- Standardization: Consistent soft-target training process
Structs
- DistillationConfig - Configuration for knowledge distillation
- DistillationLoss - Knowledge distillation loss calculator
- DistillationResult - Distillation training result
- LinearDistiller - Simple linear distillation model (for testing/simple cases)
- SoftTargetGenerator - Soft target generator from logits
Constants
- DEFAULT_ALPHA - Default alpha (weight for distillation loss vs. hard-label loss)
- DEFAULT_TEMPERATURE - Default distillation temperature (recommended by review); its softening effect is sketched below
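As a rough illustration of why a temperature above 1 helps (again a sketch, not this crate's code): the value 4.0 below merely stands in for DEFAULT_TEMPERATURE, whose actual value is not shown on this page.

```rust
// Illustrative only: a temperature above 1 produces softer targets that
// expose inter-class structure. 4.0 is an assumed stand-in for
// DEFAULT_TEMPERATURE, not necessarily this crate's actual default.
fn softmax_t(logits: &[f64], t: f64) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&z| ((z - max) / t).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    let logits = [6.0, 2.0, 1.0];
    // T = 1: nearly one-hot -> ~[0.976, 0.018, 0.007]
    println!("{:?}", softmax_t(&logits, 1.0));
    // T = 4: softened distribution -> ~[0.604, 0.222, 0.173]
    println!("{:?}", softmax_t(&logits, 4.0));
}
```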
Functions
- binary_cross_entropy - Binary cross-entropy for single-class prediction
- cross_entropy - Cross-entropy loss: CE(p, y) = -sum(y * log(p))
- kl_divergence - KL divergence: D_KL(P || Q) = sum(P * log(P / Q))
- softmax - Regular softmax (T = 1)
- softmax_temperature - Softmax with temperature scaling; plain-Rust sketches of these formulas follow below
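The formulas quoted above are enough to sketch plausible plain-Rust versions of these functions; the crate's actual signatures may differ, and inputs are assumed to be valid probability distributions where the formulas require them.

```rust
/// Cross-entropy: CE(p, y) = -sum(y * log(p)).
fn cross_entropy(p: &[f64], y: &[f64]) -> f64 {
    -p.iter().zip(y).map(|(&pi, &yi)| yi * pi.ln()).sum::<f64>()
}

/// Binary cross-entropy for a single probability `p` and target `y` in {0, 1}.
fn binary_cross_entropy(p: f64, y: f64) -> f64 {
    -(y * p.ln() + (1.0 - y) * (1.0 - p).ln())
}

/// KL divergence: D_KL(P || Q) = sum(P * log(P / Q)); terms with P = 0
/// contribute nothing by convention.
fn kl_divergence(p: &[f64], q: &[f64]) -> f64 {
    p.iter()
        .zip(q)
        .map(|(&pi, &qi)| if pi > 0.0 { pi * (pi / qi).ln() } else { 0.0 })
        .sum()
}

/// Softmax with temperature scaling; max subtraction keeps it numerically stable.
fn softmax_temperature(logits: &[f64], t: f64) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|&z| ((z - max) / t).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Regular softmax is the T = 1 special case.
fn softmax(logits: &[f64]) -> Vec<f64> {
    softmax_temperature(logits, 1.0)
}

fn main() {
    let p = softmax(&[2.0, 1.0, 0.1]);
    let q = softmax_temperature(&[2.0, 1.0, 0.1], 2.0);
    println!("CE  = {:.4}", cross_entropy(&p, &[1.0, 0.0, 0.0]));
    println!("BCE = {:.4}", binary_cross_entropy(p[0], 1.0));
    println!("KL  = {:.4}", kl_divergence(&p, &q));
}
```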