Module neuronika::optim
Implementations of various optimization algorithms and penalty regularizations.
Some of the most commonly used methods are already supported, and the interface is simple enough that more sophisticated ones can easily be integrated in the future. The complete list can be found in the Algorithms section below.
An optimizer holds a state, in the form of a representation, for each of the parameters to optimize, and it updates them according to both their gradient and the state itself.
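To make the idea of per-parameter state concrete, here is a minimal std-only sketch of a momentum-style update. The `MomentumState` type, its fields and its `step` signature are hypothetical and are not part of neuronika; plain `Vec<f32>` buffers stand in for the tensor views the library works with.

```rust
/// Hypothetical per-parameter state for a momentum-style optimizer.
/// Std-only illustration; this is not neuronika's actual representation.
struct MomentumState {
    data: Vec<f32>,     // current parameter values
    grad: Vec<f32>,     // gradient of the loss with respect to `data`
    velocity: Vec<f32>, // optimizer state carried across steps
}

impl MomentumState {
    fn new(data: Vec<f32>) -> Self {
        let len = data.len();
        Self {
            data,
            grad: vec![0.0; len],
            velocity: vec![0.0; len],
        }
    }

    /// One update: the new value depends on both the gradient
    /// and the velocity stored from previous steps.
    fn step(&mut self, lr: f32, momentum: f32) {
        for i in 0..self.data.len() {
            self.velocity[i] = momentum * self.velocity[i] + self.grad[i];
            self.data[i] -= lr * self.velocity[i];
        }
    }
}
```

With the same gradient applied twice, the second step moves the parameter further than the first, because the stored velocity contributes to the update.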
Using an optimizer
The first step in using any optimizer is to construct it.
Constructing it
To construct an optimizer, you have to pass it a vector of Param referring to the parameters you wish to optimize. Depending on the kind of optimizer, you may also need to pass several optimizer-specific settings, such as the learning rate, the momentum, etc.
The optimization algorithms provided by neuronika are designed to work both with variables and neural networks.
use neuronika;
use neuronika::optim::{SGD, Adam, L1, L2};
let p = neuronika::rand(5).requires_grad();
let q = neuronika::rand(5).requires_grad();
let x = neuronika::rand(5);
let y = p * x + q;
let optim = SGD::new(y.parameters(), 0.01, L1::new(0.05));
let model = NeuralNetwork::new();
let model_optim = Adam::new(model.parameters(), 0.01, (0.9, 0.999), L2::new(0.01), 1e-8);
Taking an optimization step
All of neuronika’s optimizers implement a .step() method that updates the parameters.
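The loop below sketches where a call to `.step()` typically sits in a training iteration. It is a std-only toy: `ToySgd` and `fit` are hypothetical names, the gradient is computed by hand instead of by a backward pass, and only the `step` / `zero_grad` method names are taken from the `Optimizer` implementation shown later on this page.

```rust
use std::cell::RefCell;

/// Toy stand-in for an optimizer exposing `step` and `zero_grad`.
/// Hypothetical: it owns plain `f32` buffers rather than `Param` views.
struct ToySgd {
    data: RefCell<Vec<f32>>,
    grad: RefCell<Vec<f32>>,
    lr: f32,
}

impl ToySgd {
    fn new(data: Vec<f32>, lr: f32) -> Self {
        let grad = vec![0.0; data.len()];
        Self {
            data: RefCell::new(data),
            grad: RefCell::new(grad),
            lr,
        }
    }

    /// Applies `w <- w - lr * g` to every element.
    fn step(&self) {
        let grad = self.grad.borrow();
        for (w, g) in self.data.borrow_mut().iter_mut().zip(grad.iter()) {
            *w -= self.lr * g;
        }
    }

    /// Resets the accumulated gradients to zero.
    fn zero_grad(&self) {
        self.grad.borrow_mut().iter_mut().for_each(|g| *g = 0.0);
    }
}

/// Minimizes `(w - target)^2` element-wise. The gradient `2 * (w - target)`
/// is accumulated by hand here, standing in for a backward pass.
fn fit(optim: &ToySgd, target: f32, epochs: usize) {
    for _ in 0..epochs {
        {
            let data = optim.data.borrow();
            let mut grad = optim.grad.borrow_mut();
            for (g, w) in grad.iter_mut().zip(data.iter()) {
                *g += 2.0 * (w - target);
            }
        }
        optim.step(); // update the parameters
        optim.zero_grad(); // clear gradients before the next iteration
    }
}
```

Calling `zero_grad` after each step matters in this sketch because the gradients are accumulated with `+=`; without it, stale gradients would leak into the next update.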
Implementing an optimizer
Implementing an optimizer in neuronika is quick and simple. The procedure consists of 3 steps:
- Define its parameter’s representation struct and specify how to build it from Param.
- Define its struct.
- Implement the Optimizer trait.
Let’s go through them by implementing the classic version of stochastic gradient descent.
First, we define the SGD parameter’s struct and the conversion from Param.
use neuronika::Param;
use ndarray::{ArrayD, ArrayViewMutD};
struct SGDParam<'a> {
    data: ArrayViewMutD<'a, f32>,
    grad: ArrayViewMutD<'a, f32>,
}

impl<'a> From<Param<'a>> for SGDParam<'a> {
    fn from(param: Param<'a>) -> Self {
        let Param { data, grad } = param;
        Self { data, grad }
    }
}
Since SGD is a basic optimizer, the SGDParam struct only contains the gradient and data views for each of the parameters to optimize.
Note, however, that an optimizer’s parameter representation also acts as a container for any additional information, such as adaptive learning rates and moments of any kind, that the learning steps of more complex algorithms may need.
Then, we define the SGD struct.
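As an illustration of a representation that carries extra information, the following std-only sketch stores a running sum of squared gradients next to the data and gradient, in the style of Adagrad. `RawParam` and `AdagradLikeParam` are hypothetical names; `RawParam` stands in for `Param` and owns `Vec<f32>` buffers instead of ndarray views, so this is not the library's own `Adagrad` parameter type.

```rust
/// Hypothetical stand-in for `Param`: owned buffers instead of ndarray views.
struct RawParam {
    data: Vec<f32>,
    grad: Vec<f32>,
}

/// Representation for an Adagrad-style optimizer: besides data and gradient,
/// it stores the running sum of squared gradients.
struct AdagradLikeParam {
    data: Vec<f32>,
    grad: Vec<f32>,
    grad_sq: Vec<f32>,
}

impl From<RawParam> for AdagradLikeParam {
    fn from(param: RawParam) -> Self {
        let RawParam { data, grad } = param;
        // The extra state is allocated once, with the same length as the parameter.
        let grad_sq = vec![0.0; data.len()];
        Self { data, grad, grad_sq }
    }
}

impl AdagradLikeParam {
    /// Textbook Adagrad-style update: every element gets its own effective
    /// learning rate, scaled by the history of its squared gradients.
    /// `eps` guards against division by zero.
    fn update(&mut self, lr: f32, eps: f32) {
        for i in 0..self.data.len() {
            self.grad_sq[i] += self.grad[i] * self.grad[i];
            self.data[i] -= lr * self.grad[i] / (self.grad_sq[i].sqrt() + eps);
        }
    }
}
```

The conversion is the natural place to allocate such state, since it runs exactly once per parameter when the optimizer is constructed.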
use neuronika::Param;
use neuronika::optim::Penalty;
use std::cell::{Cell, RefCell};
struct SGD<'a, T> {
    params: RefCell<Vec<SGDParam<'a>>>,
    lr: Cell<f32>,
    penalty: T,
}
Lastly, we implement Optimizer for SGD.
use ndarray::Zip;
use neuronika::optim::Optimizer;
use rayon::iter::{IntoParallelRefMutIterator, ParallelIterator};
impl<'a, T: Penalty> Optimizer<'a> for SGD<'a, T> {
    type ParamRepr = SGDParam<'a>;

    fn step(&self) {
        let (lr, penalty) = (self.lr.get(), &self.penalty);
        self.params.borrow_mut().par_iter_mut().for_each(|param| {
            let (data, grad) = (&mut param.data, &param.grad);
            Zip::from(data).and(grad).for_each(|data_el, grad_el| {
                *data_el += -(grad_el + penalty.penalize(data_el)) * lr
            });
        });
    }

    fn zero_grad(&self) {
        self.params.borrow_mut().par_iter_mut().for_each(|param| {
            let grad = &mut param.grad;
            Zip::from(grad).for_each(|grad_el| *grad_el = 0.);
        });
    }

    fn get_lr(&self) -> f32 {
        self.lr.get()
    }

    fn set_lr(&self, lr: f32) {
        self.lr.set(lr)
    }
}
/// Simple constructor.
impl<'a, T: Penalty> SGD<'a, T> {
    pub fn new(parameters: Vec<Param<'a>>, lr: f32, penalty: T) -> Self {
        Self {
            params: RefCell::new(Self::build_params(parameters)),
            lr: Cell::new(lr),
            penalty,
        }
    }
}
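To see how the penalty enters the update in `step`, the following std-only sketch applies the same rule, `w += -(g + penalize(w)) * lr`, to plain slices. The trait `PenaltySketch` and the types `L1Sketch` and `L2Sketch` are hypothetical. Their formulas are the usual gradients of `lambda * |w|` and `lambda * w^2`, which is an assumption about what a penalty returns and not a statement about how neuronika's `L1` and `L2` are implemented.

```rust
/// Hypothetical mirror of the role `Penalty` plays in `step`: given a weight,
/// return the term that is added to its gradient.
trait PenaltySketch {
    fn penalize(&self, w: &f32) -> f32;
}

/// Assumed form: gradient of `lambda * |w|`, taken to be 0 at `w == 0`.
struct L1Sketch {
    lambda: f32,
}

/// Assumed form: gradient of `lambda * w^2`.
struct L2Sketch {
    lambda: f32,
}

impl PenaltySketch for L1Sketch {
    fn penalize(&self, w: &f32) -> f32 {
        if *w > 0.0 {
            self.lambda
        } else if *w < 0.0 {
            -self.lambda
        } else {
            0.0
        }
    }
}

impl PenaltySketch for L2Sketch {
    fn penalize(&self, w: &f32) -> f32 {
        2.0 * self.lambda * *w
    }
}

/// Same update rule as the `step` method above, on plain slices.
fn sgd_update<P: PenaltySketch>(data: &mut [f32], grad: &[f32], lr: f32, penalty: &P) {
    for (w, g) in data.iter_mut().zip(grad) {
        let delta = -(g + penalty.penalize(w)) * lr;
        *w += delta;
    }
}
```

Even with a zero gradient, both sketched penalties pull the weights toward zero: L1 by a constant amount per step, L2 by an amount proportional to the weight.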
Adjusting the learning rate
The lr_scheduler module provides several methods to adjust the learning rate based on the number of epochs.
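An epoch-based adjustment can be sketched with nothing more than the `get_lr` / `set_lr` pair from the `Optimizer` implementation above. The `LrHolder` and `StepDecay` types below are hypothetical and are not taken from the lr_scheduler module; they only illustrate one common policy, multiplying the learning rate by a factor every fixed number of epochs.

```rust
use std::cell::Cell;

/// Minimal stand-in exposing the `get_lr` / `set_lr` pair of an optimizer.
struct LrHolder {
    lr: Cell<f32>,
}

impl LrHolder {
    fn new(lr: f32) -> Self {
        Self { lr: Cell::new(lr) }
    }

    fn get_lr(&self) -> f32 {
        self.lr.get()
    }

    fn set_lr(&self, lr: f32) {
        self.lr.set(lr)
    }
}

/// Hypothetical step-decay scheduler: multiplies the learning rate by `gamma`
/// once every `step_size` epochs. `step_size` must be non-zero.
struct StepDecay<'a> {
    optim: &'a LrHolder,
    step_size: usize,
    gamma: f32,
    epoch: usize,
}

impl<'a> StepDecay<'a> {
    fn new(optim: &'a LrHolder, step_size: usize, gamma: f32) -> Self {
        Self {
            optim,
            step_size,
            gamma,
            epoch: 0,
        }
    }

    /// Call once at the end of every epoch.
    fn step(&mut self) {
        self.epoch += 1;
        if self.epoch % self.step_size == 0 {
            self.optim.set_lr(self.optim.get_lr() * self.gamma);
        }
    }
}
```

Because `set_lr` takes `&self`, the scheduler only needs a shared reference to the optimizer, which can therefore keep being used for `step` calls in the same loop.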
Algorithms
List of all implemented optimizers.
Modules
Learning rate schedulers.
Structs
AMSGrad optimizer.
A parameter used by the AMSGrad optimizer.
Adagrad optimizer.
A parameter used by the Adagrad optimizer.
Adam optimizer.
A parameter used by the Adam optimizer.
ElasticNet regularization, linearly combines the L1 and L2 penalties.
L1 penalty.
L2 penalty, also known as weight decay or Tikhonov regularization.
RMSProp optimizer.
The RMSProp optimizer in its centered variant.
A parameter used by the centered RMSProp optimizer.
The centered RMSProp optimizer with momentum.
A parameter used by the centered RMSProp optimizer with momentum.
A parameter used by the RMSProp optimizer.
The RMSProp optimizer with momentum.
A parameter used by the RMSProp optimizer with momentum.
Stochastic Gradient Descent optimizer.
The parameter representation used by the SGD optimizer.
The momentum variant of the Stochastic Gradient Descent optimizer.
The parameter representation used by the SGD with momentum optimizer.