Struct vikos::teacher::Nesterov

pub struct Nesterov {
    pub l0: f64,
    pub t: f64,
    pub inertia: f64,
}

Nesterov accelerated gradient descent

Like accelerated gradient descent, Nesterov accelerated gradient descent includes a momentum term. In contrast to regular accelerated gradient descent, the gradient is not evaluated at the current position, but at the estimated new one, i.e. the position reached after applying the momentum step. Source: G. Hinton's lecture 6c
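The look-ahead idea can be sketched for a single coefficient as follows. This is a minimal illustration following the formulation in Hinton's lecture, not the code vikos uses internally; the names `nesterov_step` and `minimize` and the parameter `rate` are hypothetical, and only `inertia` corresponds to a field of this struct.

```rust
/// One Nesterov step for a single coefficient (hypothetical helper).
/// `grad` returns the gradient of the cost at a given position.
/// Returns the new position and the new velocity.
fn nesterov_step<G: Fn(f64) -> f64>(
    grad: G,
    x: f64,
    v: f64,
    rate: f64,
    inertia: f64,
) -> (f64, f64) {
    // Jump ahead along the accumulated velocity first ...
    let lookahead = x + inertia * v;
    // ... and evaluate the gradient at the estimated new position,
    // not at the current one.
    let g = grad(lookahead);
    // Friction (inertia < 1) damps the old velocity; the gradient corrects it.
    let v_new = inertia * v - rate * g;
    (x + v_new, v_new)
}

/// Repeats the step `steps` times, starting at rest (hypothetical helper).
fn minimize<G: Fn(f64) -> f64>(
    grad: G,
    start: f64,
    rate: f64,
    inertia: f64,
    steps: usize,
) -> f64 {
    let (mut x, mut v) = (start, 0.0);
    for _ in 0..steps {
        let (next_x, next_v) = nesterov_step(&grad, x, v, rate, inertia);
        x = next_x;
        v = next_v;
    }
    x
}
```

For example, with the cost (x - 3)^2, whose gradient is 2 (x - 3), calling `minimize(|x| 2.0 * (x - 3.0), 0.0, 0.1, 0.9, 200)` should end up close to 3.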

Fields

l0: f64

Start learning rate

t: f64

Smaller t will decrease the learning rate faster

After t events the learning rate will be one half of l0, after 2t events it will be one third of l0, and so on.

inertia: f64

To simulate friction, select a value smaller than 1 (recommended)
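The decay described for t (one half of l0 after t events, one third after 2t events) is consistent with a schedule of the form l0 / (1 + n / t), where n is the number of events seen so far. The sketch below assumes that form; the function name `learning_rate` and the parameter `events` are hypothetical and not part of the vikos API.

```rust
/// Learning rate after `events` training events, assuming the schedule
/// l0 / (1 + events / t) implied by the field description (hypothetical helper).
fn learning_rate(l0: f64, t: f64, events: usize) -> f64 {
    l0 / (1.0 + events as f64 / t)
}
```

With `l0 = 0.1` and `t = 100.0`, this gives 0.1 before any event, 0.05 after 100 events, and roughly 0.0333 after 200 events.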

Trait Implementations

impl<M> Teacher<M> for Nesterov where M: Model

Contains state which changes during the training, but is not part of the expertise

Creates an instance holding all mutable state of the algorithm

Changes the model's coefficients so they minimize the cost function (hopefully)