Struct multistochgrad::scsg::StochasticControlledGradientDescent
pub struct StochasticControlledGradientDescent { /* fields omitted */ }
Provides Stochastic Controlled Gradient Descent optimization, based on two papers by Lei and Jordan:
- "On the adaptivity of stochastic gradient based optimization", arXiv 2019 (referred to as SCSG-1)
- "Less than a single pass: stochastically controlled stochastic gradient", arXiv 2019 (referred to as SCSG-2)
Following the notations of the first paper, one iteration j consists of:
- a large batch of size Bⱼ,
- a number mⱼ of mini batches of size bⱼ,
- a position update with step ηⱼ.
In the paper, the number of mini batches is a random variable with a geometric law. The paper establishes convergence rates depending on the ratios mⱼ/Bⱼ, bⱼ/mⱼ and ηⱼ/bⱼ and their products.
The second paper, "Less than a single pass: stochastically controlled stochastic gradient", describes a simplified version in which each mini batch consists of a single term and the number of mini batches is set to the mean of the corresponding geometric variable.
This implementation adopts a mix of the two papers: letting the mini batch size grow a little seems more stable than keeping it at 1 (in particular when the initialization of the algorithm varies), while replacing the geometric law by its mean is markedly more stable, due to the large variance of that law.
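The stability argument above rests on the geometric law having a large variance relative to its mean. A quick numeric check, using the standard formulas for a geometric variable counting failures before the first success (mean (1-p)/p, variance (1-p)/p²; this helper is illustrative, not part of the crate):

```rust
// For a geometric law counting failures before the first success,
// return (mean, standard deviation): mean = (1-p)/p, variance = (1-p)/p^2.
fn geometric_mean_and_std(p: f64) -> (f64, f64) {
    let mean = (1.0 - p) / p;
    let var = (1.0 - p) / (p * p);
    (mean, var.sqrt())
}

fn main() {
    // A mean of ~99 mini batches comes with a standard deviation of ~99.5:
    // the sampled count fluctuates on the same scale as its mean, which is
    // why replacing the sample by its mean stabilizes the algorithm.
    let (mean, std) = geometric_mean_and_std(0.01);
    println!("mean = {:.1}, std = {:.1}", mean, std);
}
```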
If nbterms is the number of terms of the function to minimize and j the iteration number:
- Bⱼ evolves as : large_batch_size_init * nbterms * alfa^(2j)
- mⱼ evolves as : m_zero * nbterms * alfa^(3j/2)
- bⱼ evolves as : b₀ * alfa^j
- ηⱼ evolves as : η₀ / alfa^(j/2)
where alfa is computed to be slightly greater than 1. More precisely, alfa is chosen so that B₀ * alfa^(2*nbiter) = 1, with B₀ the fraction large_batch_size_init, so that the (uncapped) large batch size reaches nbterms at the last iteration.
The evolution of Bⱼ is bounded above by nbterms/10 and that of bⱼ by nbterms/100.
The mini batch size must stay small, so b₀ must be small (typically 1 is a good choice).
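The schedules above can be sketched in plain Rust. This is a hypothetical helper, not the crate's actual code; the parameter names mirror the description above, and alfa is solved from large_batch_size_init * alfa^(2*nbiter) = 1:

```rust
/// Sketch of the SCSG schedules: for each iteration j, returns
/// (B_j, m_j, b_j, eta_j) following the evolution rules documented above.
/// Hypothetical helper for illustration only.
fn scsg_schedule(
    nbterms: usize,
    nbiter: usize,
    large_batch_size_init: f64, // fraction of nbterms, e.g. 0.015
    m_zero: f64,                // e.g. 0.2 * large_batch_size_init
    b_zero: usize,              // base mini batch size, typically 1
    eta_zero: f64,              // initial step, e.g. 0.5
) -> Vec<(usize, usize, usize, f64)> {
    // alfa slightly > 1, from large_batch_size_init * alfa^(2*nbiter) = 1
    let alfa = (1.0 / large_batch_size_init).powf(1.0 / (2.0 * nbiter as f64));
    let n = nbterms as f64;
    (0..nbiter)
        .map(|j| {
            let jf = j as f64;
            // B_j grows as alfa^(2j), capped at nbterms/10
            let big_j = (large_batch_size_init * n * alfa.powf(2.0 * jf)).min(n / 10.0);
            // m_j grows as alfa^(3j/2)
            let m_j = m_zero * n * alfa.powf(1.5 * jf);
            // b_j grows as alfa^j, capped at nbterms/100
            let b_j = (b_zero as f64 * alfa.powf(jf)).min(n / 100.0);
            // eta_j decreases as 1/alfa^(j/2)
            let eta_j = eta_zero / alfa.powf(jf / 2.0);
            (
                (big_j as usize).max(1),
                (m_j as usize).max(1),
                (b_j as usize).max(1),
                eta_j,
            )
        })
        .collect()
}

fn main() {
    // Example values only: 1000 terms, 10 iterations, defaults as suggested below.
    for (j, step) in scsg_schedule(1000, 10, 0.015, 0.003, 1, 0.5).iter().enumerate() {
        println!("j={}: B={}, m={}, b={}, eta={:.3}", j, step.0, step.1, step.2, step.3);
    }
}
```

With these example values the large batch grows geometrically from 15 toward the nbterms/10 cap while the step size shrinks, matching the behavior described above.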
Implementations
impl StochasticControlledGradientDescent

pub fn new(
    eta_zero: f64,
    m_zero: f64,
    mini_batch_size_init: usize,
    large_batch_size_init: f64
) -> StochasticControlledGradientDescent
Arguments:
- eta_zero : initial value of the step along the gradient; 0.5 is a good default choice.
- m_zero : a good value is 0.2 * large_batch_size_init, so that mⱼ << Bⱼ.
- mini_batch_size_init : base value for the size of mini batches; 1 is a good default choice.
- large_batch_size_init : fraction of nbterms used to initialize the large batch size; a good default is between 0.01 and 0.02, so the large batch begins at 0.01 * nbterms to 0.02 * nbterms.
(see examples)
Trait Implementations
impl<D: Dimension, F: SummationC1<D>> Minimizer<D, F, usize> for StochasticControlledGradientDescent
Auto Trait Implementations
impl RefUnwindSafe for StochasticControlledGradientDescent
impl Send for StochasticControlledGradientDescent
impl Sync for StochasticControlledGradientDescent
impl Unpin for StochasticControlledGradientDescent
impl UnwindSafe for StochasticControlledGradientDescent
Blanket Implementations
impl<T> BorrowMut<T> for T where
    T: ?Sized,

pub fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> Pointable for T
impl<V, T> VZip<V> for T where
    V: MultiLane<T>,