Module adamw

AdamW (Adam with decoupled weight decay) optimizer

AdamW modifies the original Adam algorithm by decoupling the weight decay term from the gradient-based update, applying it directly to the parameters rather than folding it into the gradient as an L2 penalty. This decoupling typically improves generalization, especially in deep learning applications.
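The sketch below illustrates a single decoupled update step. It is a standalone illustration of the standard AdamW formulation, not this module's API; the function name `adamw_step` and the parameter names (`lr`, `beta1`, `beta2`, `eps`, `weight_decay`) are assumptions, and the actual options exposed here live in `AdamWOptions`.

```rust
/// Minimal sketch of one AdamW update step (illustrative only; this is
/// not the signature exposed by this module).
fn adamw_step(
    params: &mut [f64],
    grads: &[f64],
    m: &mut [f64], // first-moment (mean) estimates
    v: &mut [f64], // second-moment (uncentered variance) estimates
    t: u32,        // 1-based step counter used for bias correction
    lr: f64,
    beta1: f64,
    beta2: f64,
    eps: f64,
    weight_decay: f64,
) {
    let bc1 = 1.0 - beta1.powi(t as i32);
    let bc2 = 1.0 - beta2.powi(t as i32);
    for i in 0..params.len() {
        m[i] = beta1 * m[i] + (1.0 - beta1) * grads[i];
        v[i] = beta2 * v[i] + (1.0 - beta2) * grads[i] * grads[i];
        let m_hat = m[i] / bc1;
        let v_hat = v[i] / bc2;
        // Decoupled weight decay: subtracted from the parameter directly,
        // instead of being added to the gradient as an L2 penalty.
        params[i] -= lr * (m_hat / (v_hat.sqrt() + eps) + weight_decay * params[i]);
    }
}

fn main() {
    // Minimize f(x) = x^2 (gradient 2x) starting from x = 3.0.
    let (mut x, mut m, mut v) = ([3.0_f64], [0.0], [0.0]);
    for t in 1..=2000u32 {
        let g = [2.0 * x[0]];
        adamw_step(&mut x, &g, &mut m, &mut v, t, 0.05, 0.9, 0.999, 1e-8, 0.01);
    }
    println!("x after 2000 steps: {:.6}", x[0]); // approaches 0
}
```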

Structs

AdamWOptions
Options for AdamW optimization

Functions

minimize_adamw
Minimize an objective function using the AdamW optimizer
minimize_adamw_cosine_restarts
AdamW with a cosine annealing learning-rate schedule and warm restarts
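For reference, a cosine-annealing-with-restarts schedule decays the learning rate from a maximum to a minimum along a cosine curve within each cycle, then resets it at the start of the next cycle. The sketch below shows the general technique with a fixed cycle length; the function name `cosine_restart_lr` and the parameters `lr_max`, `lr_min`, and `cycle_len` are illustrative assumptions, not the options used by `minimize_adamw_cosine_restarts`.

```rust
/// Illustrative SGDR-style cosine annealing with warm restarts
/// (fixed cycle length, no cycle-length multiplier).
fn cosine_restart_lr(step: usize, lr_max: f64, lr_min: f64, cycle_len: usize) -> f64 {
    let frac = (step % cycle_len) as f64 / cycle_len as f64; // position in cycle, [0, 1)
    lr_min + 0.5 * (lr_max - lr_min) * (1.0 + (std::f64::consts::PI * frac).cos())
}

fn main() {
    // Print the schedule over two cycles of 5 steps each:
    // the rate decays within a cycle and jumps back to lr_max at each restart.
    for step in 0..10 {
        println!("step {step}: lr = {:.6}", cosine_restart_lr(step, 1e-3, 1e-5, 5));
    }
}
```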