adaptive_gradient_clipping

Function adaptive_gradient_clipping 

pub fn adaptive_gradient_clipping<'a, A, D>(
    gradients: &'a mut Array<A, D>,
    parameters: &Array<A, D>,
    max_ratio: A,
) -> Result<&'a mut Array<A, D>>
where
    A: Float + ScalarOperand,
    D: Dimension,

Adaptive gradient clipping

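Clips gradients in place based on the ratio of the gradient norm to the parameter norm, bounded by `max_ratio`. On success, the same mutable reference to `gradients` is returned. This is particularly useful for transformer models.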
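The following is a minimal sketch of the idea, not this crate's implementation. It assumes whole-tensor L2 norms, rescaling by `max_ratio / ratio` when the ratio exceeds `max_ratio`, and leaving the gradients untouched when either norm is zero. The names `agc_sketch` and `l2_norm` are hypothetical, and plain `f64` slices stand in for `Array<A, D>` so the example needs only the standard library.

```rust
/// Hypothetical helper: L2 (Euclidean) norm of a slice.
fn l2_norm(values: &[f64]) -> f64 {
    values.iter().map(|v| v * v).sum::<f64>().sqrt()
}

/// Hypothetical stand-in for `adaptive_gradient_clipping` on plain slices.
/// Assumption: norms are taken over the whole tensor, and the gradient is
/// rescaled in place so that grad_norm / param_norm does not exceed `max_ratio`.
fn agc_sketch(
    gradients: &mut [f64],
    parameters: &[f64],
    max_ratio: f64,
) -> Result<(), String> {
    if gradients.len() != parameters.len() {
        return Err(format!(
            "shape mismatch: {} gradients vs {} parameters",
            gradients.len(),
            parameters.len()
        ));
    }

    let grad_norm = l2_norm(gradients);
    let param_norm = l2_norm(parameters);

    // Assumption of this sketch: with a zero norm there is no meaningful
    // ratio, so the gradients are left unchanged.
    if grad_norm == 0.0 || param_norm == 0.0 {
        return Ok(());
    }

    let ratio = grad_norm / param_norm;
    if ratio > max_ratio {
        // Scale so the new ratio equals max_ratio.
        let scale = max_ratio / ratio;
        for g in gradients.iter_mut() {
            *g *= scale;
        }
    }
    Ok(())
}
```

With gradients `[3.0, 4.0]` (norm 5), parameters `[0.6, 0.8]` (norm 1), and `max_ratio = 2.0`, the ratio of 5 exceeds the bound, so the sketch scales the gradients by 0.4, giving a gradient norm of about 2. A gradient whose ratio is already below `max_ratio` passes through unchanged.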