ADAM (Adaptive Moment Estimation) optimizer
ADAM combines the advantages of two other extensions of stochastic gradient descent, AdaGrad and RMSProp. It computes an adaptive learning rate for each parameter by maintaining exponentially decaying averages of past gradients (first moment, i.e. momentum) and of past squared gradients (second moment, used for the per-parameter step size).
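As a rough illustration of the update rule, the sketch below performs one ADAM step over a parameter slice in plain Rust. The function `adam_step`, its argument names, and the hyperparameter values used in `main` are assumptions for illustration only, not this crate's API.

```rust
// Illustrative ADAM update step (a sketch, not this crate's implementation).
fn adam_step(
    params: &mut [f64],
    grads: &[f64],
    m: &mut [f64], // first-moment (momentum) estimates
    v: &mut [f64], // second-moment (squared-gradient) estimates
    t: u64,        // 1-based iteration counter
    lr: f64,
    beta1: f64,
    beta2: f64,
    eps: f64,
) {
    for i in 0..params.len() {
        // Exponentially decaying averages of gradients and squared gradients.
        m[i] = beta1 * m[i] + (1.0 - beta1) * grads[i];
        v[i] = beta2 * v[i] + (1.0 - beta2) * grads[i] * grads[i];

        // Bias correction counteracts the zero initialization of m and v.
        let m_hat = m[i] / (1.0 - beta1.powi(t as i32));
        let v_hat = v[i] / (1.0 - beta2.powi(t as i32));

        // Per-parameter adaptive step.
        params[i] -= lr * m_hat / (v_hat.sqrt() + eps);
    }
}

fn main() {
    // Minimize f(x, y) = x^2 + y^2 from (3, -4); the gradient is (2x, 2y).
    let mut params = vec![3.0, -4.0];
    let (mut m, mut v) = (vec![0.0; 2], vec![0.0; 2]);
    for t in 1..=2000 {
        let grads: Vec<f64> = params.iter().map(|p| 2.0 * p).collect();
        adam_step(&mut params, &grads, &mut m, &mut v, t, 0.01, 0.9, 0.999, 1e-8);
    }
    println!("{:?}", params); // both components approach 0
}
```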
Structs§
- AdamOptions - Options for ADAM optimization
Functions§
- minimize_adam - ADAM optimizer implementation
- minimize_adam_with_warmup - ADAM with learning rate warmup (see the sketch below)
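As a rough illustration of learning rate warmup, the sketch below ramps the step size linearly up to its base value over a fixed number of initial steps; this keeps early updates small while the moment estimates are still warming up. The names `warmup_lr` and `warmup_steps` and the linear schedule are assumptions for illustration, not this crate's API.

```rust
// Illustrative linear learning-rate warmup schedule (a sketch, not this crate's API).
fn warmup_lr(base_lr: f64, step: u64, warmup_steps: u64) -> f64 {
    if step < warmup_steps {
        // Ramp linearly up to base_lr over the warmup period.
        base_lr * (step + 1) as f64 / warmup_steps as f64
    } else {
        base_lr
    }
}

fn main() {
    // With base_lr = 0.001 and 100 warmup steps, the first iterations use a
    // much smaller step size; after step 99 the full base rate is used.
    for step in [0_u64, 50, 99, 100, 500] {
        println!("step {:>3}: lr = {:.6}", step, warmup_lr(1e-3, step, 100));
    }
}
```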