pub enum Decay {
    WeightDecay(f64),
    DecoupledWeightDecay(f64),
}
Method of weight decay to use
Variants
WeightDecay(f64)
Weight decay regularisation to penalise large weights
The gradient is transformed as $$ g_{t} \gets g_{t} + \lambda \theta_{t-1}$$ where $\lambda$ is the contained decay coefficient.
This is equivalent to adding an L2 regularisation term $\frac{\lambda}{2}\|\theta\|_{2}^{2}$ to the loss, but avoids automatic differentiation of the L2 term.
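A minimal sketch of the `WeightDecay` transform on plain `f64` slices, with a central finite-difference check that the added term $\lambda \theta$ is the gradient of $\frac{\lambda}{2}\|\theta\|_{2}^{2}$. The helpers `apply_weight_decay` and `l2_penalty` are hypothetical illustrations, not part of this crate's API.

```rust
/// Hypothetical helper: apply g_t <- g_t + lambda * theta_{t-1} in place.
fn apply_weight_decay(grads: &mut [f64], params: &[f64], lambda: f64) {
    for (g, &theta) in grads.iter_mut().zip(params) {
        *g += lambda * theta;
    }
}

/// Hypothetical helper: the L2 penalty (lambda / 2) * ||theta||_2^2 that the transform mimics.
fn l2_penalty(params: &[f64], lambda: f64) -> f64 {
    0.5 * lambda * params.iter().map(|t| t * t).sum::<f64>()
}

fn main() {
    let lambda = 0.1;
    let params = vec![1.0, -2.0, 3.0];
    // Pretend the loss gradient (without any regularisation) is 0.5 everywhere.
    let mut grads = vec![0.5, 0.5, 0.5];
    apply_weight_decay(&mut grads, &params, lambda);

    // The transformed gradient should match the loss gradient plus a
    // central finite-difference derivative of the explicit L2 penalty.
    let h = 1e-6;
    for i in 0..params.len() {
        let mut plus = params.clone();
        let mut minus = params.clone();
        plus[i] += h;
        minus[i] -= h;
        let fd = (l2_penalty(&plus, lambda) - l2_penalty(&minus, lambda)) / (2.0 * h);
        assert!((grads[i] - (0.5 + fd)).abs() < 1e-6);
    }
    println!("{:?}", grads);
}
```

Folding $\lambda \theta$ into the gradient directly means no extra term has to be added to the loss or traced by the autodiff graph.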
DecoupledWeightDecay(f64)
Decoupled weight decay as described in "Decoupled Weight Decay Regularization".
This directly decays the weights as
$$ \theta_{t} \gets (1 - \eta \lambda) \theta_{t-1}$$
where $\eta$ is the learning rate. This is equivalent to L2 regularisation (the WeightDecay variant) only for plain SGD without momentum; for adaptive gradient methods such as Adam, the two differ.
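A minimal sketch of how a plain SGD step (no momentum) might dispatch on the two variants, and a check that they coincide there: with L2-style decay, $\theta \gets \theta - \eta (g + \lambda \theta) = (1 - \eta \lambda)\theta - \eta g$, which is exactly the decoupled shrink followed by a step on the raw gradient. The enum is copied locally (its derives are an assumption) and `sgd_step` is a hypothetical function, not this crate's optimiser.

```rust
/// Local copy of the enum documented above, so the sketch is self-contained.
#[derive(Clone, Copy, Debug)]
enum Decay {
    WeightDecay(f64),
    DecoupledWeightDecay(f64),
}

/// Hypothetical plain SGD step (no momentum) with learning rate `lr`.
fn sgd_step(params: &mut [f64], grads: &[f64], lr: f64, decay: Decay) {
    match decay {
        // L2-style: fold lambda * theta into the gradient, then step.
        Decay::WeightDecay(lambda) => {
            for (theta, &g) in params.iter_mut().zip(grads) {
                let t = *theta;
                *theta = t - lr * (g + lambda * t);
            }
        }
        // Decoupled: shrink by (1 - lr * lambda), then step on the raw gradient.
        Decay::DecoupledWeightDecay(lambda) => {
            for (theta, &g) in params.iter_mut().zip(grads) {
                *theta *= 1.0 - lr * lambda;
                *theta -= lr * g;
            }
        }
    }
}

fn main() {
    let grads = [0.3, -0.7, 1.2];
    let mut coupled = [1.0, -2.0, 0.5];
    let mut decoupled = coupled;
    sgd_step(&mut coupled, &grads, 0.01, Decay::WeightDecay(0.1));
    sgd_step(&mut decoupled, &grads, 0.01, Decay::DecoupledWeightDecay(0.1));
    // For plain SGD both updates agree up to floating-point rounding.
    for (a, b) in coupled.iter().zip(&decoupled) {
        assert!((a - b).abs() < 1e-12);
    }
    println!("{:?}\n{:?}", coupled, decoupled);
}
```

The equivalence breaks once the gradient is rescaled or accumulated before the step (momentum buffers, Adam's per-parameter normalisation), because the L2 term is then transformed along with $g$ while the decoupled shrink is not.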
Trait Implementations
impl PartialOrd for Decay
impl Copy for Decay
impl StructuralPartialEq for Decay
Auto Trait Implementations
impl Freeze for Decay
impl RefUnwindSafe for Decay
impl Send for Decay
impl Sync for Decay
impl Unpin for Decay
impl UnwindSafe for Decay
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.