pub struct Optimizer {
pub learning_rate: f32,
pub beta_momentum: f32,
pub beta_magnitude: f32,
pub epsilon: f32,
/* private fields */
}
Here is a good blog that explains various optimizers.
Currently only SGD, RMSProp, Adam, and SGD-with-momentum are implemented.
The Optimizer struct builds and holds OptimizerInstances, which hold runtime information about every parameter that's being optimized.
If beta_momentum or beta_magnitude is set to zero, the optimizer does not keep momentum or magnitude correction information for parameters. epsilon is added to denominators to avoid divide-by-zero errors.
| | no beta_momentum | beta_momentum |
|---|---|---|
| no beta_magnitude | vanilla SGD | SGD with momentum |
| beta_magnitude | RMSProp | Adam |
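The table above can be sketched as a single scalar update rule (an illustration, not this crate's internals; `State` and `step` are made-up names, and Adam's bias-correction terms are omitted):

```rust
// Sketch of the unified update rule the table describes, for one scalar
// parameter. With both betas zero it reduces to vanilla SGD; a nonzero
// beta_momentum adds a moving average of gradients, and a nonzero
// beta_magnitude divides by the root of a moving average of squared gradients.
struct State {
    momentum: f32,  // moving average of gradients
    magnitude: f32, // moving average of squared gradients
}

fn step(
    param: &mut f32,
    grad: f32,
    state: &mut State,
    lr: f32,
    beta_momentum: f32,
    beta_magnitude: f32,
    epsilon: f32,
) {
    // Numerator: the raw gradient, or its moving average.
    let update = if beta_momentum > 0.0 {
        state.momentum = beta_momentum * state.momentum + (1.0 - beta_momentum) * grad;
        state.momentum
    } else {
        grad
    };
    // Denominator: 1, or the root of the moving average of squared gradients.
    let scale = if beta_magnitude > 0.0 {
        state.magnitude =
            beta_magnitude * state.magnitude + (1.0 - beta_magnitude) * grad * grad;
        state.magnitude.sqrt() + epsilon
    } else {
        1.0
    };
    *param -= lr * update / scale;
}

fn main() {
    // Both betas zero: a plain SGD step, param moves by exactly lr * grad.
    let mut p = 1.0_f32;
    let mut s = State { momentum: 0.0, magnitude: 0.0 };
    step(&mut p, 0.5, &mut s, 0.1, 0.0, 0.0, 1e-8);
    println!("{p}"); // 1.0 - 0.1 * 0.5 = 0.95
}
```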
Fields

learning_rate: f32
beta_momentum: f32
beta_magnitude: f32
epsilon: f32
Implementations

impl Optimizer
pub fn new(
learning_rate: f32,
beta_momentum: f32,
beta_magnitude: f32,
epsilon: f32
) -> Self
pub fn sgd_default() -> Self
Vanilla stochastic gradient descent with no added fluff.
pub fn momentum_default() -> Self
SGD with a momentum component. Adds the geometric (exponentially decaying) average of past gradients to the parameter instead of the raw gradient. This averaging dampens the stochasticity of stochastic gradient descent.
pub fn rmsprop_default() -> Self
SGD with a magnitude component. Rescales gradients by dividing by the root of the geometric (exponentially decaying) average of squared past gradients. Parameters with frequent large gradients will see those gradients shrink, while parameters with sparse gradients will have their gradients grow.
pub fn adam_default() -> Self
Adam (Adaptive Moment Estimation) combines the momentum component from momentum and the magnitude component from rmsprop.
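For reference, a textbook Adam step for a single parameter looks like the following (a sketch; `adam_step` is a made-up name, and whether this crate applies the bias correction shown here is not stated in its docs):

```rust
// Textbook Adam update for one scalar parameter.
fn adam_step(
    param: &mut f32,
    grad: f32,
    m: &mut f32, // momentum: moving average of gradients
    v: &mut f32, // magnitude: moving average of squared gradients
    t: i32,      // 1-based step count, used for bias correction
    lr: f32,
    beta1: f32,
    beta2: f32,
    epsilon: f32,
) {
    *m = beta1 * *m + (1.0 - beta1) * grad;
    *v = beta2 * *v + (1.0 - beta2) * grad * grad;
    // Undo the bias toward zero from initializing m and v at 0.
    let m_hat = *m / (1.0 - beta1.powi(t));
    let v_hat = *v / (1.0 - beta2.powi(t));
    *param -= lr * m_hat / (v_hat.sqrt() + epsilon);
}

fn main() {
    let (mut p, mut m, mut v) = (1.0_f32, 0.0_f32, 0.0_f32);
    // On the first step, bias correction makes m_hat = grad and
    // v_hat = grad^2, so the parameter moves by about lr.
    adam_step(&mut p, 0.5, &mut m, &mut v, 1, 0.001, 0.9, 0.999, 1e-8);
    println!("{p}");
}
```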
pub fn register(&mut self, i: Idx, shape: &[usize])
pub fn apply_gradient(
&mut self,
i: Idx,
param: ArrayViewMutD<'_, f32>,
grad: &ArrayD<f32>
)
Applies the gradient to the parameter registered at index i.