Function dfdx::tensor_ops::accurate_gelu

pub fn accurate_gelu<S: Shape, E: Dtype, D: UnaryKernel<AccurateGeLUKernelOp, E>, T: Tape<E, D>>(
    t: Tensor<S, E, D, T>
) -> Tensor<S, E, D, T>

Accurate Gaussian Error Linear Unit (GeLU). This is defined as x * Phi(x), where Phi(x) is the cumulative distribution function of the standard normal distribution. It can be computed via the error function erf(x) as

0.5 * x * (1.0 + erf(x / 2.0.sqrt()))
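
For reference, here is a minimal scalar sketch of this erf-based form. It assumes the external libm crate for the error function and is only an illustration of the formula, not dfdx's implementation:

// Scalar version of the erf-based GeLU above.
// Assumes the `libm` crate (https://crates.io/crates/libm) for `erf`.
fn accurate_gelu_scalar(x: f64) -> f64 {
    0.5 * x * (1.0 + libm::erf(x / 2.0_f64.sqrt()))
}

fn main() {
    for x in [-1.0, 0.0, 1.0, 2.0] {
        println!("gelu({x}) = {}", accurate_gelu_scalar(x));
    }
}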

Because evaluating the error function accurately is computationally expensive, the GeLU can instead be approximated with a hyperbolic tangent (tanh):

GeLU(x) ≈ 0.5 * x * (1.0 + tanh(sqrt(2.0 / π) * (x + 0.044715 * x^3)))

See fast_gelu to use this approximation.
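
For comparison, a minimal scalar sketch of the tanh approximation, using only the standard library. This is an illustration of the formula above, not dfdx's fast_gelu implementation:

// Scalar version of the tanh-based GeLU approximation above.
fn fast_gelu_scalar(x: f64) -> f64 {
    let inner = (2.0 / std::f64::consts::PI).sqrt() * (x + 0.044715 * x.powi(3));
    0.5 * x * (1.0 + inner.tanh())
}

fn main() {
    for x in [-1.0, 0.0, 1.0, 2.0] {
        println!("fast_gelu({x}) ≈ {}", fast_gelu_scalar(x));
    }
}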

Examples:

use dfdx::prelude::*;
let dev: Cpu = Default::default();
let t = dev.tensor([-1.0, 0.0, 1.0, 2.0]);
let r = t.accurate_gelu();