pub trait ActivationOps<R>
where
    R: Runtime,
{
// Provided methods
    fn relu(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn sigmoid(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn silu(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn gelu(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn leaky_relu(
        &self,
        a: &Tensor<R>,
        negative_slope: f64,
    ) -> Result<Tensor<R>, Error> { ... }
    fn elu(&self, a: &Tensor<R>, alpha: f64) -> Result<Tensor<R>, Error> { ... }
    fn softmax(&self, a: &Tensor<R>, dim: isize) -> Result<Tensor<R>, Error> { ... }
    fn log_softmax(&self, a: &Tensor<R>, dim: isize) -> Result<Tensor<R>, Error> { ... }
    fn softmax_bwd(
        &self,
        grad: &Tensor<R>,
        output: &Tensor<R>,
        dim: isize,
    ) -> Result<Tensor<R>, Error> { ... }
    fn softplus(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn silu_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn gelu_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn relu_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error> { ... }
    fn sigmoid_mul(
        &self,
        a: &Tensor<R>,
        b: &Tensor<R>,
    ) -> Result<Tensor<R>, Error> { ... }
    fn silu_mul_bwd(
        &self,
        grad: &Tensor<R>,
        a: &Tensor<R>,
        b: &Tensor<R>,
    ) -> Result<(Tensor<R>, Tensor<R>), Error> { ... }
    fn gelu_mul_bwd(
        &self,
        grad: &Tensor<R>,
        a: &Tensor<R>,
        b: &Tensor<R>,
    ) -> Result<(Tensor<R>, Tensor<R>), Error> { ... }
    fn relu_mul_bwd(
        &self,
        grad: &Tensor<R>,
        a: &Tensor<R>,
        b: &Tensor<R>,
    ) -> Result<(Tensor<R>, Tensor<R>), Error> { ... }
    fn sigmoid_mul_bwd(
        &self,
        grad: &Tensor<R>,
        a: &Tensor<R>,
        b: &Tensor<R>,
    ) -> Result<(Tensor<R>, Tensor<R>), Error> { ... }
    fn dropout(
        &self,
        a: &Tensor<R>,
        p: f64,
        training: bool,
    ) -> Result<Tensor<R>, Error> { ... }
}
Activation operations
Provided Methods
fn silu(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error>
SiLU (Swish): a * sigmoid(a) = a / (1 + e^(-a))
Used in LLaMA, Mistral, and other modern transformer architectures.
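The formula can be checked with a scalar sketch in plain Rust (a standalone illustration, independent of this trait's Tensor types and kernels):

```rust
// Scalar sketch of SiLU: x * sigmoid(x) = x / (1 + e^(-x)).
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn silu(x: f64) -> f64 {
    x * sigmoid(x)
}
```

For large positive inputs silu(x) approaches x, and for large negative inputs it approaches 0, which is what makes it a smooth gate.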
fn gelu(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error>
GELU (Gaussian Error Linear Unit): 0.5 * a * (1 + tanh(sqrt(2/pi) * (a + 0.044715 * a^3)))
Uses the tanh approximation. Used in GPT, BERT, and other transformer architectures.
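The tanh approximation above can be written out as a scalar sketch (illustrative only, not this trait's kernel):

```rust
// Scalar sketch of the tanh-approximate GELU:
// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
fn gelu_tanh(x: f64) -> f64 {
    let c = (2.0 / std::f64::consts::PI).sqrt(); // sqrt(2/pi)
    0.5 * x * (1.0 + (c * (x + 0.044715 * x.powi(3))).tanh())
}
```

A useful sanity check on the formula is the identity gelu(x) - gelu(-x) = x, which holds exactly for the tanh form.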
fn leaky_relu(&self, a: &Tensor<R>, negative_slope: f64) -> Result<Tensor<R>, Error>
Leaky ReLU: max(negative_slope * a, a)
Allows small gradients for negative inputs, helping prevent the “dying ReLU” problem. The default negative_slope is typically 0.01.
fn elu(&self, a: &Tensor<R>, alpha: f64) -> Result<Tensor<R>, Error>
ELU (Exponential Linear Unit): a if a > 0, else alpha * (exp(a) - 1)
Smooth approximation to ReLU, with negative values saturating to -alpha. The default alpha is typically 1.0.
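As a scalar sketch of the piecewise formula (standalone, not this trait's kernel):

```rust
// Scalar sketch of ELU: x if x > 0, else alpha * (exp(x) - 1).
// Negative inputs saturate toward -alpha as x -> -inf.
fn elu(x: f64, alpha: f64) -> f64 {
    if x > 0.0 { x } else { alpha * (x.exp() - 1.0) }
}
```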
fn softmax(&self, a: &Tensor<R>, dim: isize) -> Result<Tensor<R>, Error>
Softmax along a dimension
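Implementations typically subtract the per-slice maximum before exponentiating so that large inputs do not overflow; a 1-D sketch in plain Rust (illustrative, not this trait's kernel):

```rust
// Numerically stable softmax over a 1-D slice:
// softmax(x)_i = exp(x_i - max(x)) / sum_j exp(x_j - max(x))
fn softmax(xs: &[f64]) -> Vec<f64> {
    let m = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = xs.iter().map(|&x| (x - m).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```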
fn log_softmax(&self, a: &Tensor<R>, dim: isize) -> Result<Tensor<R>, Error>
Log-softmax along a dimension: log(softmax(x, dim))
Computed as x - logsumexp(x, dim) for numerical stability.
Used in log-probability calculations, Bayesian inference,
categorical distributions, and information theory.
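The x - logsumexp(x, dim) form can be sketched over a 1-D slice (a standalone illustration, not this trait's kernel):

```rust
// 1-D sketch of log-softmax as x - logsumexp(x), with the maximum
// subtracted inside the exp so large inputs cannot overflow.
fn log_softmax(xs: &[f64]) -> Vec<f64> {
    let m = xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let lse = m + xs.iter().map(|&x| (x - m).exp()).sum::<f64>().ln();
    xs.iter().map(|&x| x - lse).collect()
}
```

Exponentiating the outputs recovers softmax, so they always sum to 1 in log-space terms.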
fn softmax_bwd(&self, grad: &Tensor<R>, output: &Tensor<R>, dim: isize) -> Result<Tensor<R>, Error>
Softmax backward pass: computes gradient w.r.t. input given output gradient and softmax output.
Formula: d_input = output * (grad - sum(grad * output, dim, keepdim=true))
This is the Jacobian-vector product for softmax, used in training backward passes.
Arguments
- grad - Upstream gradient (same shape as output)
- output - The softmax output from the forward pass
- dim - The dimension along which softmax was computed
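The formula can be checked over a single 1-D slice (an illustration of the math, not this trait's kernel):

```rust
// Softmax backward over one 1-D slice, following
// d_input = output * (grad - sum(grad * output)).
fn softmax_bwd(grad: &[f64], output: &[f64]) -> Vec<f64> {
    let dot: f64 = grad.iter().zip(output).map(|(g, o)| g * o).sum();
    grad.iter().zip(output).map(|(g, o)| o * (g - dot)).collect()
}
```

Because softmax outputs sum to 1, the returned gradient components always sum to 0, which is a handy invariant to test against.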
fn softplus(&self, a: &Tensor<R>) -> Result<Tensor<R>, Error>
Softplus: log(1 + exp(a))
A smooth approximation to ReLU that is always positive and differentiable.
Used in Mamba2 for dt (step size) processing via softplus(dt_proj(x)) + dt_bias.
Gradient: sigmoid(a)
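The naive log(1 + exp(a)) overflows for large a; a mathematically identical rewrite commonly used in practice (an assumption about implementations, shown here as a scalar sketch) is max(a, 0) + ln(1 + e^(-|a|)):

```rust
// Overflow-safe scalar softplus:
// ln(1 + e^x) = max(x, 0) + ln(1 + e^(-|x|))
fn softplus(x: f64) -> f64 {
    x.max(0.0) + (-x.abs()).exp().ln_1p()
}
```

A finite-difference check at 0 recovers the stated gradient sigmoid(0) = 0.5.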
fn silu_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error>
Fused SiLU-Mul: silu(a) * b in a single pass.
Computes (a / (1 + exp(-a))) * b element-wise with one memory pass
instead of two (activation + multiply). Used in SwiGLU and similar gated architectures.
fn gelu_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error>
Fused GELU-Mul: gelu(a) * b in a single pass.
Computes (0.5 * a * (1 + tanh(sqrt(2/pi) * (a + 0.044715*a^3)))) * b element-wise.
Used in GeGLU gated architectures.
fn relu_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error>
Fused ReLU-Mul: relu(a) * b in a single pass.
Computes max(0, a) * b element-wise. Used in ReGLU gated architectures.
fn sigmoid_mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, Error>
Fused Sigmoid-Mul: sigmoid(a) * b in a single pass.
Computes (1 / (1 + exp(-a))) * b element-wise. Used in SiGLU gated architectures.
fn silu_mul_bwd(&self, grad: &Tensor<R>, a: &Tensor<R>, b: &Tensor<R>) -> Result<(Tensor<R>, Tensor<R>), Error>
Fused SiLU-Mul backward: computes gradients for output = silu(a) * b.
Returns (d_a, d_b) where:
- d_a = grad * b * silu'(a) with silu'(x) = sigmoid(x) * (1 + x - silu(x))
- d_b = grad * silu(a)
Backends may implement this as a single fused kernel for better performance.
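The gradient formulas can be verified against a finite-difference check with a scalar sketch (standalone, not a fused kernel):

```rust
// Scalar reference for the gradients of output = silu(a) * b:
// d_a = grad * b * silu'(a), with silu'(x) = sigmoid(x) * (1 + x - silu(x))
// d_b = grad * silu(a)
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

fn silu(x: f64) -> f64 {
    x * sigmoid(x)
}

fn silu_mul_bwd(grad: f64, a: f64, b: f64) -> (f64, f64) {
    let d_silu = sigmoid(a) * (1.0 + a - silu(a)); // silu'(a)
    (grad * b * d_silu, grad * silu(a))
}
```

Note that silu'(x) = sigmoid(x) * (1 + x * (1 - sigmoid(x))) is the same expression after substituting silu(x) = x * sigmoid(x).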
fn gelu_mul_bwd(&self, grad: &Tensor<R>, a: &Tensor<R>, b: &Tensor<R>) -> Result<(Tensor<R>, Tensor<R>), Error>
Fused GELU-Mul backward: computes gradients for output = gelu(a) * b.
Returns (d_a, d_b) where:
- d_a = grad * b * gelu'(a)
- d_b = grad * gelu(a)
fn relu_mul_bwd(&self, grad: &Tensor<R>, a: &Tensor<R>, b: &Tensor<R>) -> Result<(Tensor<R>, Tensor<R>), Error>
Fused ReLU-Mul backward: computes gradients for output = relu(a) * b.
Returns (d_a, d_b) where:
- d_a = grad * b * relu'(a) with relu'(x) = 1 if x > 0, else 0
- d_b = grad * relu(a)
fn sigmoid_mul_bwd(&self, grad: &Tensor<R>, a: &Tensor<R>, b: &Tensor<R>) -> Result<(Tensor<R>, Tensor<R>), Error>
Fused Sigmoid-Mul backward: computes gradients for output = sigmoid(a) * b.
Returns (d_a, d_b) where:
- d_a = grad * b * sigmoid'(a) with sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
- d_b = grad * sigmoid(a)
fn dropout(&self, a: &Tensor<R>, p: f64, training: bool) -> Result<Tensor<R>, Error>
Dropout: randomly zero elements with probability p during training.
When training is true, each element is independently zeroed with probability p,
and remaining elements are scaled by 1/(1-p) to maintain expected values.
When training is false, returns the input unchanged.
Used in regularization, Monte Carlo dropout, and Bayesian approximation.
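The inverted-dropout scheme described above can be sketched over a slice; the tiny hand-rolled LCG below is only to keep the example dependency-free and is not a suitable RNG for real training:

```rust
// Minimal linear congruential generator (Knuth's MMIX constants),
// used here only so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Top 53 bits mapped into [0, 1).
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

// Inverted dropout: zero each element with probability p during
// training and scale survivors by 1/(1-p); identity in eval mode.
fn dropout(xs: &[f64], p: f64, training: bool, rng: &mut Lcg) -> Vec<f64> {
    if !training {
        return xs.to_vec();
    }
    let scale = 1.0 / (1.0 - p);
    xs.iter()
        .map(|&x| if rng.next_f64() < p { 0.0 } else { x * scale })
        .collect()
}
```

The 1/(1-p) scaling keeps the expected value of each element unchanged, so no rescaling is needed at inference time.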
Implementors
impl ActivationOps<CpuRuntime> for CpuClient
ActivationOps implementation for CPU runtime.