ADAM (Adaptive Moment Estimation) optimizer
ADAM combines the advantages of two other extensions of stochastic gradient descent, AdaGrad and RMSProp. It computes an adaptive learning rate for each parameter by maintaining exponentially decaying averages of past gradients (first moment, i.e. momentum) and of past squared gradients (second moment, used for the per-parameter step size).
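As a rough illustration of the update rule, the sketch below performs one ADAM step over a parameter slice in plain Rust. The function `adam_step`, its argument names, and the hyperparameter values used in `main` are assumptions for illustration only, not this crate's API.

```rust
// Illustrative ADAM update step (a sketch, not this crate's implementation).
fn adam_step(
    params: &mut [f64],
    grads: &[f64],
    m: &mut [f64], // first-moment (momentum) estimates
    v: &mut [f64], // second-moment (squared-gradient) estimates
    t: u64,        // 1-based iteration counter
    lr: f64,
    beta1: f64,
    beta2: f64,
    eps: f64,
) {
    for i in 0..params.len() {
        // Exponentially decaying averages of gradients and squared gradients.
        m[i] = beta1 * m[i] + (1.0 - beta1) * grads[i];
        v[i] = beta2 * v[i] + (1.0 - beta2) * grads[i] * grads[i];

        // Bias correction counteracts the zero initialization of m and v.
        let m_hat = m[i] / (1.0 - beta1.powi(t as i32));
        let v_hat = v[i] / (1.0 - beta2.powi(t as i32));

        // Per-parameter adaptive step.
        params[i] -= lr * m_hat / (v_hat.sqrt() + eps);
    }
}

fn main() {
    // Minimize f(x, y) = x^2 + y^2 from (3, -4); the gradient is (2x, 2y).
    let mut params = vec![3.0, -4.0];
    let (mut m, mut v) = (vec![0.0; 2], vec![0.0; 2]);
    for t in 1..=2000 {
        let grads: Vec<f64> = params.iter().map(|p| 2.0 * p).collect();
        adam_step(&mut params, &grads, &mut m, &mut v, t, 0.01, 0.9, 0.999, 1e-8);
    }
    println!("{:?}", params); // both components approach 0
}
```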
Structs§
- AdamOptions - Options for ADAM optimization
Functions§
- minimize_adam - ADAM optimizer implementation
- minimize_adam_with_warmup - ADAM with learning rate warmup (see the sketch below)
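As a rough illustration of learning rate warmup, the sketch below ramps the step size linearly up to its base value over a fixed number of initial steps; this keeps early updates small while the moment estimates are still warming up. The names `warmup_lr` and `warmup_steps` and the linear schedule are assumptions for illustration, not this crate's API.

```rust
// Illustrative linear learning-rate warmup schedule (a sketch, not this crate's API).
fn warmup_lr(base_lr: f64, step: u64, warmup_steps: u64) -> f64 {
    if step < warmup_steps {
        // Ramp linearly up to base_lr over the warmup period.
        base_lr * (step + 1) as f64 / warmup_steps as f64
    } else {
        base_lr
    }
}

fn main() {
    // With base_lr = 0.001 and 100 warmup steps, the first iterations use a
    // much smaller step size; after step 99 the full base rate is used.
    for step in [0_u64, 50, 99, 100, 500] {
        println!("step {:>3}: lr = {:.6}", step, warmup_lr(1e-3, step, 100));
    }
}
```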