pub fn adaptive_gradient_clipping<'a, A, D>(
    gradients: &'a mut Array<A, D>,
    parameters: &Array<A, D>,
    max_ratio: A,
) -> Result<&'a mut Array<A, D>>
Adaptive gradient clipping
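Clips `gradients` in place based on the ratio of the gradient norm to the norm of `parameters`, with `max_ratio` as the upper bound on that ratio, and returns the same mutable reference. This is particularly useful for transformer models.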
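The idea can be sketched over plain slices without the array types. This is a minimal illustration, not this crate's implementation: the helper names `l2_norm` and `clip_by_norm_ratio`, the use of a single global L2 norm, and the `1e-3` floor on the parameter norm are all assumptions made for the example.

```rust
/// Global L2 norm of a slice.
fn l2_norm(xs: &[f64]) -> f64 {
    xs.iter().map(|x| x * x).sum::<f64>().sqrt()
}

/// Hypothetical stand-in for the clipping step, written over plain slices.
/// Assumption: one global norm over all elements; the real function may
/// compute norms differently (for example per unit or per layer).
fn clip_by_norm_ratio(
    gradients: &mut [f64],
    parameters: &[f64],
    max_ratio: f64,
) -> Result<(), String> {
    if gradients.len() != parameters.len() {
        return Err(format!(
            "shape mismatch: {} gradients vs {} parameters",
            gradients.len(),
            parameters.len()
        ));
    }

    let grad_norm = l2_norm(gradients);
    // Assumed floor so that near-zero parameters do not force gradients to zero.
    let param_norm = l2_norm(parameters).max(1e-3);
    let max_norm = max_ratio * param_norm;

    // Rescale only when the gradient norm exceeds the allowed ratio.
    if grad_norm > max_norm {
        let scale = max_norm / grad_norm;
        for g in gradients.iter_mut() {
            *g *= scale;
        }
    }
    Ok(())
}
```

In this sketch, gradients whose norm is already within `max_ratio` times the parameter norm are left untouched; otherwise they are scaled uniformly so the direction is preserved and only the magnitude shrinks.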