AdamW (Adam with decoupled Weight Decay) optimizer
AdamW modifies the original Adam algorithm by decoupling weight decay from the gradient-based update: instead of adding an L2 penalty to the gradient, the decay is applied directly to the parameters at each step. This typically improves generalization, particularly in deep learning applications.
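A minimal sketch of one AdamW step for a flat parameter vector, showing where the decoupled decay enters the update. The names (`AdamWState`, `adamw_step`) and the stand-alone implementation are illustrative assumptions, not this crate's API.

```rust
// Illustrative single-step AdamW update (not this crate's API).
struct AdamWState {
    m: Vec<f64>, // first-moment (mean) estimates
    v: Vec<f64>, // second-moment (uncentered variance) estimates
    t: u64,      // step counter
}

#[allow(clippy::too_many_arguments)]
fn adamw_step(
    theta: &mut [f64],
    grad: &[f64],
    state: &mut AdamWState,
    lr: f64,
    beta1: f64,
    beta2: f64,
    eps: f64,
    weight_decay: f64,
) {
    state.t += 1;
    let bc1 = 1.0 - beta1.powi(state.t as i32); // bias-correction factors
    let bc2 = 1.0 - beta2.powi(state.t as i32);
    for i in 0..theta.len() {
        // Exponential moving averages of the gradient and its square.
        state.m[i] = beta1 * state.m[i] + (1.0 - beta1) * grad[i];
        state.v[i] = beta2 * state.v[i] + (1.0 - beta2) * grad[i] * grad[i];
        let m_hat = state.m[i] / bc1;
        let v_hat = state.v[i] / bc2;
        // Adam step plus *decoupled* weight decay: the decay term scales the
        // parameter directly rather than being folded into the gradient.
        theta[i] -= lr * (m_hat / (v_hat.sqrt() + eps) + weight_decay * theta[i]);
    }
}

fn main() {
    let mut theta = vec![1.0, -2.0];
    let mut state = AdamWState { m: vec![0.0; 2], v: vec![0.0; 2], t: 0 };
    // One step with a toy gradient and typical hyperparameters.
    adamw_step(&mut theta, &[0.1, -0.3], &mut state, 1e-3, 0.9, 0.999, 1e-8, 0.01);
    println!("{theta:?}");
}
```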
Structs
- AdamWOptions - Options for AdamW optimization
Functions
- minimize_adamw - AdamW optimizer implementation
- minimize_adamw_cosine_restarts - AdamW with cosine annealing and restarts (see the schedule sketch after this list)
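The cosine-annealing-with-restarts schedule referenced by `minimize_adamw_cosine_restarts` follows the SGDR formulation of Loshchilov & Hutter: within a cycle of length `t_i`, the learning rate decays from `eta_max` to `eta_min` along a half cosine, then resets at the next restart. The sketch below is a generic illustration of that schedule; the function name, parameters, and the doubling cycle length are assumptions, not this crate's API.

```rust
/// Learning rate at step `t_cur` within a restart cycle of length `t_i`:
/// eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t_cur / t_i)).
fn cosine_annealing_lr(eta_min: f64, eta_max: f64, t_cur: u64, t_i: u64) -> f64 {
    let progress = t_cur as f64 / t_i as f64;
    eta_min + 0.5 * (eta_max - eta_min) * (1.0 + (std::f64::consts::PI * progress).cos())
}

fn main() {
    let (eta_min, eta_max) = (1e-5, 1e-2);
    // A common choice doubles the cycle length after each restart (t_mult = 2).
    let mut t_i = 10u64;
    for cycle in 0..3 {
        for t_cur in 0..t_i {
            let lr = cosine_annealing_lr(eta_min, eta_max, t_cur, t_i);
            println!("cycle {cycle}, step {t_cur}: lr = {lr:.6}");
        }
        t_i *= 2; // restart with a longer cycle
    }
}
```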