Trait ActivationOps
pub trait ActivationOps<R>
where
    R: Runtime,
{
    // Provided methods
    fn relu(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn sigmoid(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn silu(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn gelu(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn leaky_relu(&self, a: &Tensor<R>, negative_slope: f64) -> Result<Tensor<R>, Error> { ... }
    fn elu(&self, a: &Tensor<R>, alpha: f64) -> Result<Tensor<R>, Error> { ... }
    fn softmax(&self, a: &Tensor<R>, dim: isize) -> Result<Tensor<R>, Error> { ... }
    fn log_softmax(&self, a: &Tensor<R>, dim: isize) -> Result<Tensor<R>, Error> { ... }
    fn softmax_bwd(&self, grad: &Tensor<R>, output: &Tensor<R>, dim: isize) -> Result<Tensor<R>, Error> { ... }
    fn softplus(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn silu_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn gelu_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn relu_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn sigmoid_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn silu_mul_bwd(&self, grad: &Tensor<R>, a: &Tensor<R>, b: &Tensor<R>) -> Result<(Tensor<R>, Tensor<R>), Error> { ... }
    fn gelu_mul_bwd(&self, grad: &Tensor<R>, a: &Tensor<R>, b: &Tensor<R>) -> Result<(Tensor<R>, Tensor<R>), Error> { ... }
    fn relu_mul_bwd(&self, grad: &Tensor<R>, a: &Tensor<R>, b: &Tensor<R>) -> Result<(Tensor<R>, Tensor<R>), Error> { ... }
    fn sigmoid_mul_bwd(&self, grad: &Tensor<R>, a: &Tensor<R>, b: &Tensor<R>) -> Result<(Tensor<R>, Tensor<R>), Error> { ... }
    fn dropout(&self, a: &Tensor<R>, p: f64, training: bool) -> Result<Tensor<R>, Error> { ... }
}

Activation operations

Provided Methods


fn relu(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error>

Rectified linear unit: max(0, a)


fn sigmoid(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error>

Sigmoid: 1 / (1 + e^(-a))


fn silu(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error>

SiLU (Swish): a * sigmoid(a) = a / (1 + e^(-a))

Used in LLaMA, Mistral, and other modern transformer architectures.
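A minimal scalar sketch of the SiLU formula (illustrative only; the trait method applies it element-wise over a `Tensor<R>`):

```rust
// Scalar SiLU: x * sigmoid(x). The helper names here are hypothetical.
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn silu(x: f64) -> f64 {
    x * sigmoid(x)
}

fn main() {
    // Near zero SiLU behaves like x/2; large positive inputs pass through
    // almost unchanged, while large negative inputs are squashed toward 0.
    println!("silu(0.0)  = {}", silu(0.0));
    println!("silu(10.0) = {}", silu(10.0));
    println!("silu(-10.0) = {}", silu(-10.0));
}
```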


fn gelu(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error>

GELU (Gaussian Error Linear Unit): 0.5 * a * (1 + tanh(sqrt(2/pi) * (a + 0.044715 * a^3)))

Uses the tanh approximation. Used in GPT, BERT, and other transformer architectures.
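The tanh approximation above can be sketched as a scalar function (a hypothetical helper, not the trait method itself):

```rust
// Tanh-approximate GELU: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
fn gelu_tanh(x: f64) -> f64 {
    let c = (2.0 / std::f64::consts::PI).sqrt();
    0.5 * x * (1.0 + (c * (x + 0.044715 * x.powi(3))).tanh())
}

fn main() {
    println!("gelu(0.0) = {}", gelu_tanh(0.0)); // exactly 0
    println!("gelu(1.0) = {}", gelu_tanh(1.0)); // ~0.8412 under the tanh approximation
}
```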


fn leaky_relu(&self, a: &Tensor<R>, negative_slope: f64) -> Result<Tensor<R>, Error>

Leaky ReLU: max(negative_slope * a, a)

Allows small gradients for negative inputs, helping prevent the “dying ReLU” problem. The default negative_slope is typically 0.01.


fn elu(&self, a: &Tensor<R>, alpha: f64) -> Result<Tensor<R>, Error>

ELU (Exponential Linear Unit): a if a > 0, else alpha * (exp(a) - 1)

A smooth approximation to ReLU whose negative values saturate to -alpha. The default alpha is typically 1.0.


fn softmax(&self, a: &Tensor<R>, dim: isize) -> Result<Tensor<R>, Error>

Softmax along a dimension: exp(x_i) / sum_j exp(x_j), producing values that are non-negative and sum to 1 along dim.


fn log_softmax(&self, a: &Tensor<R>, dim: isize) -> Result<Tensor<R>, Error>

Log-softmax along a dimension: log(softmax(x, dim))

Computed as x - logsumexp(x, dim) for numerical stability. Used in log-probability calculations, Bayesian inference, categorical distributions, and information theory.
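A 1-D sketch of the stable x - logsumexp(x) computation (illustrative only; the trait method works over a tensor dimension):

```rust
// Stable log-softmax: subtract the max inside the exp so large inputs
// cannot overflow, then subtract logsumexp from each element.
fn log_softmax(x: &[f64]) -> Vec<f64> {
    let m = x.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let lse = m + x.iter().map(|&v| (v - m).exp()).sum::<f64>().ln();
    x.iter().map(|&v| v - lse).collect()
}

fn main() {
    // A naive log(softmax(x)) would overflow on inputs this large;
    // the stable form does not.
    let y = log_softmax(&[1000.0, 1001.0]);
    let total: f64 = y.iter().map(|v| v.exp()).sum();
    println!("log-probs: {y:?}, exp-sum = {total}");
}
```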


fn softmax_bwd(&self, grad: &Tensor<R>, output: &Tensor<R>, dim: isize) -> Result<Tensor<R>, Error>

Softmax backward pass: computes gradient w.r.t. input given output gradient and softmax output.

Formula: d_input = output * (grad - sum(grad * output, dim, keepdim=true))

This is the Jacobian-vector product for softmax, used in training backward passes.

Arguments
  • grad - Upstream gradient (same shape as output)
  • output - The softmax output from the forward pass
  • dim - The dimension along which softmax was computed
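The formula can be sketched in one dimension (a hypothetical scalar-slice version; the trait operates along a tensor dimension). A useful sanity property: because softmax outputs sum to 1, the input gradients always sum to zero.

```rust
// Stable softmax over a slice (for the forward pass of the check).
fn softmax(x: &[f64]) -> Vec<f64> {
    let m = x.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let e: Vec<f64> = x.iter().map(|&v| (v - m).exp()).collect();
    let s: f64 = e.iter().sum();
    e.into_iter().map(|v| v / s).collect()
}

// d_input = output * (grad - sum(grad * output)).
fn softmax_bwd(grad: &[f64], output: &[f64]) -> Vec<f64> {
    let dot: f64 = grad.iter().zip(output).map(|(g, o)| g * o).sum();
    grad.iter().zip(output).map(|(g, o)| o * (g - dot)).collect()
}

fn main() {
    let out = softmax(&[1.0, 2.0, 3.0]);
    let d = softmax_bwd(&[0.1, -0.2, 0.3], &out);
    // Gradients w.r.t. softmax inputs sum to zero.
    println!("d_input = {d:?}, sum = {}", d.iter().sum::<f64>());
}
```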

fn softplus(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error>

Softplus: log(1 + exp(a))

A smooth approximation to ReLU that is always positive and differentiable. Used in Mamba2 for dt (step size) processing via softplus(dt_proj(x)) + dt_bias.

Gradient: sigmoid(a)
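The gradient claim can be checked numerically in a scalar sketch (hypothetical helpers, comparing the analytic derivative against a central finite difference):

```rust
// Softplus and its claimed gradient, sigmoid.
fn softplus(x: f64) -> f64 {
    (1.0 + x.exp()).ln()
}

fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn main() {
    let (x, h) = (0.3, 1e-5);
    // Central finite difference approximates d/dx softplus(x).
    let fd = (softplus(x + h) - softplus(x - h)) / (2.0 * h);
    println!("finite diff = {fd}, sigmoid(x) = {}", sigmoid(x));
}
```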


fn silu_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error>

Fused SiLU-Mul: silu(a) * b in a single pass.

Computes (a / (1 + exp(-a))) * b element-wise with one memory pass instead of two (activation + multiply). Used in SwiGLU and similar gated architectures.


fn gelu_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error>

Fused GELU-Mul: gelu(a) * b in a single pass.

Computes (0.5 * a * (1 + tanh(sqrt(2/pi) * (a + 0.044715*a^3)))) * b element-wise. Used in GeGLU gated architectures.


fn relu_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error>

Fused ReLU-Mul: relu(a) * b in a single pass.

Computes max(0, a) * b element-wise. Used in ReGLU gated architectures.


fn sigmoid_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error>

Fused Sigmoid-Mul: sigmoid(a) * b in a single pass.

Computes (1 / (1 + exp(-a))) * b element-wise. Used in SiGLU gated architectures.


fn silu_mul_bwd(&self, grad: &Tensor<R>, a: &Tensor<R>, b: &Tensor<R>) -> Result<(Tensor<R>, Tensor<R>), Error>

Fused SiLU-Mul backward: computes gradients for output = silu(a) * b.

Returns (d_a, d_b) where:

  • d_a = grad * b * silu'(a) with silu'(x) = sigmoid(x) * (1 + x - silu(x))
  • d_b = grad * silu(a)

Backends may implement this as a single fused kernel for better performance.
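The silu'(x) identity above can be verified numerically in a scalar sketch (hypothetical helpers, checked against a central finite difference):

```rust
// silu'(x) = sigmoid(x) * (1 + x - silu(x)), derived from the product rule
// on x * sigmoid(x).
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn silu(x: f64) -> f64 {
    x * sigmoid(x)
}

fn silu_prime(x: f64) -> f64 {
    sigmoid(x) * (1.0 + x - silu(x))
}

fn main() {
    let (x, h) = (0.7, 1e-5);
    let fd = (silu(x + h) - silu(x - h)) / (2.0 * h);
    println!("analytic = {}, finite diff = {fd}", silu_prime(x));
}
```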


fn gelu_mul_bwd(&self, grad: &Tensor<R>, a: &Tensor<R>, b: &Tensor<R>) -> Result<(Tensor<R>, Tensor<R>), Error>

Fused GELU-Mul backward: computes gradients for output = gelu(a) * b.

Returns (d_a, d_b) where:

  • d_a = grad * b * gelu'(a)
  • d_b = grad * gelu(a)

fn relu_mul_bwd(&self, grad: &Tensor<R>, a: &Tensor<R>, b: &Tensor<R>) -> Result<(Tensor<R>, Tensor<R>), Error>

Fused ReLU-Mul backward: computes gradients for output = relu(a) * b.

Returns (d_a, d_b) where:

  • d_a = grad * b * relu'(a) with relu'(x) = 1 if x > 0, else 0
  • d_b = grad * relu(a)

fn sigmoid_mul_bwd(&self, grad: &Tensor<R>, a: &Tensor<R>, b: &Tensor<R>) -> Result<(Tensor<R>, Tensor<R>), Error>

Fused Sigmoid-Mul backward: computes gradients for output = sigmoid(a) * b.

Returns (d_a, d_b) where:

  • d_a = grad * b * sigmoid'(a) with sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
  • d_b = grad * sigmoid(a)

fn dropout(&self, a: &Tensor<R>, p: f64, training: bool) -> Result<Tensor<R>, Error>

Dropout: randomly zero elements with probability p during training.

When training is true, each element is independently zeroed with probability p, and remaining elements are scaled by 1/(1-p) to maintain expected values. When training is false, returns the input unchanged.

Used in regularization, Monte Carlo dropout, and Bayesian approximation.
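A minimal inverted-dropout sketch over a slice, using a toy xorshift PRNG to stay dependency-free (illustrative only; this trait does not expose a seed, and the backend's actual RNG is unspecified):

```rust
// Inverted dropout: zero each element with probability p, scale survivors
// by 1/(1-p) so the expected value is unchanged; identity in eval mode.
fn dropout(input: &[f64], p: f64, training: bool, seed: u64) -> Vec<f64> {
    if !training || p <= 0.0 {
        return input.to_vec();
    }
    let scale = 1.0 / (1.0 - p);
    let mut state = seed.max(1); // xorshift state must be nonzero
    input
        .iter()
        .map(|&v| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            let u = (state >> 11) as f64 / (1u64 << 53) as f64; // uniform in [0, 1)
            if u < p { 0.0 } else { v * scale }
        })
        .collect()
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0];
    println!("eval:  {:?}", dropout(&x, 0.5, false, 42)); // input unchanged
    println!("train: {:?}", dropout(&x, 0.5, true, 42));  // each element 0 or 2x
}
```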

Implementors


impl ActivationOps<CpuRuntime> for CpuClient

ActivationOps implementation for CPU runtime.