AdamW (Adam with decoupled Weight Decay) optimizer
AdamW modifies the original Adam algorithm by decoupling weight decay from the gradient-based update: instead of adding an L2 penalty to the gradient, the decay is applied directly to the parameters at each step. This typically improves generalization, particularly in deep learning applications.
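A minimal sketch of one AdamW step for a flat parameter vector, showing where the decoupled decay enters the update. The names (`AdamWState`, `adamw_step`) and the stand-alone implementation are illustrative assumptions, not this crate's API.

```rust
// Illustrative single-step AdamW update (not this crate's API).
struct AdamWState {
    m: Vec<f64>, // first-moment (mean) estimates
    v: Vec<f64>, // second-moment (uncentered variance) estimates
    t: u64,      // step counter
}

#[allow(clippy::too_many_arguments)]
fn adamw_step(
    theta: &mut [f64],
    grad: &[f64],
    state: &mut AdamWState,
    lr: f64,
    beta1: f64,
    beta2: f64,
    eps: f64,
    weight_decay: f64,
) {
    state.t += 1;
    let bc1 = 1.0 - beta1.powi(state.t as i32); // bias-correction factors
    let bc2 = 1.0 - beta2.powi(state.t as i32);
    for i in 0..theta.len() {
        // Exponential moving averages of the gradient and its square.
        state.m[i] = beta1 * state.m[i] + (1.0 - beta1) * grad[i];
        state.v[i] = beta2 * state.v[i] + (1.0 - beta2) * grad[i] * grad[i];
        let m_hat = state.m[i] / bc1;
        let v_hat = state.v[i] / bc2;
        // Adam step plus *decoupled* weight decay: the decay term scales the
        // parameter directly rather than being folded into the gradient.
        theta[i] -= lr * (m_hat / (v_hat.sqrt() + eps) + weight_decay * theta[i]);
    }
}

fn main() {
    let mut theta = vec![1.0, -2.0];
    let mut state = AdamWState { m: vec![0.0; 2], v: vec![0.0; 2], t: 0 };
    // One step with a toy gradient and typical hyperparameters.
    adamw_step(&mut theta, &[0.1, -0.3], &mut state, 1e-3, 0.9, 0.999, 1e-8, 0.01);
    println!("{theta:?}");
}
```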
Structs
- AdamWOptions - Options for AdamW optimization
Functions
- minimize_adamw - AdamW optimizer implementation
- minimize_adamw_cosine_restarts - AdamW with cosine annealing and restarts (see the schedule sketch after this list)
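The cosine-annealing-with-restarts schedule referenced by `minimize_adamw_cosine_restarts` follows the SGDR formulation of Loshchilov & Hutter: within a cycle of length `t_i`, the learning rate decays from `eta_max` to `eta_min` along a half cosine, then resets at the next restart. The sketch below is a generic illustration of that schedule; the function name, parameters, and the doubling cycle length are assumptions, not this crate's API.

```rust
/// Learning rate at step `t_cur` within a restart cycle of length `t_i`:
/// eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t_cur / t_i)).
fn cosine_annealing_lr(eta_min: f64, eta_max: f64, t_cur: u64, t_i: u64) -> f64 {
    let progress = t_cur as f64 / t_i as f64;
    eta_min + 0.5 * (eta_max - eta_min) * (1.0 + (std::f64::consts::PI * progress).cos())
}

fn main() {
    let (eta_min, eta_max) = (1e-5, 1e-2);
    // A common choice doubles the cycle length after each restart (t_mult = 2).
    let mut t_i = 10u64;
    for cycle in 0..3 {
        for t_cur in 0..t_i {
            let lr = cosine_annealing_lr(eta_min, eta_max, t_cur, t_i);
            println!("cycle {cycle}, step {t_cur}: lr = {lr:.6}");
        }
        t_i *= 2; // restart with a longer cycle
    }
}
```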