Module elementwise

Elementwise GPU operations for OxiCUDA BLAS.

This module provides unary and binary elementwise operations over device buffers, including activation functions (ReLU, GELU, sigmoid, SiLU, tanh), scaling, and fused operations (add+relu, scale+add).

Each function generates PTX on the fly via oxicuda_ptx::templates::elementwise::ElementwiseTemplate (from the oxicuda-ptx crate), loads the resulting module, and launches the kernel on the handle’s stream.
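To make the elementwise contract concrete, here is a minimal CPU sketch of what the unary, binary, and fused operations compute per element. It is not the OxiCUDA API: the helper names (unary_ref, binary_ref, fused_add_relu_ref, fused_scale_add_ref), the use of f32, and host slices in place of device buffers are all assumptions made for illustration. The formulas for the fused operations are taken from the function summaries on this page.

```rust
/// CPU reference for a unary elementwise op: out[i] = f(input[i]).
/// Hypothetical helper, not part of OxiCUDA.
fn unary_ref(input: &[f32], f: impl Fn(f32) -> f32) -> Vec<f32> {
    input.iter().map(|&x| f(x)).collect()
}

/// CPU reference for a binary elementwise op: c[i] = f(a[i], b[i]).
/// Hypothetical helper, not part of OxiCUDA.
fn binary_ref(a: &[f32], b: &[f32], f: impl Fn(f32, f32) -> f32) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "inputs must have the same length");
    a.iter().zip(b).map(|(&x, &y)| f(x, y)).collect()
}

/// Fused Add + ReLU: c[i] = max(0, a[i] + b[i]).
fn fused_add_relu_ref(a: &[f32], b: &[f32]) -> Vec<f32> {
    binary_ref(a, b, |x, y| (x + y).max(0.0))
}

/// Fused Scale-Add: c[i] = alpha * a[i] + beta * b[i].
fn fused_scale_add_ref(alpha: f32, a: &[f32], beta: f32, b: &[f32]) -> Vec<f32> {
    binary_ref(a, b, |x, y| alpha * x + beta * y)
}

fn main() {
    let a = [1.0_f32, -2.0, 3.0];
    let b = [0.5_f32, 1.0, -4.0];

    // Unary scaling: out[i] = alpha * input[i] with alpha = 2.
    println!("{:?}", unary_ref(&a, |x| 2.0 * x));
    println!("{:?}", fused_add_relu_ref(&a, &b));
    println!("{:?}", fused_scale_add_ref(2.0, &a, 0.5, &b));
}
```

A reference like this can serve as a host-side oracle when testing GPU results: each output element depends only on the input elements at the same index, so the fused variants must agree with applying the individual operations one after another.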

Enums

ElementwiseOp
Elementwise operation types supported by the BLAS elementwise module.

Functions

abs_val
Computes the absolute value element-wise: output[i] = |input[i]|.
add
Element-wise addition: C[i] = A[i] + B[i].
broadcast_axes
Broadcasts src (a reduced tensor) back to dst (the full original shape) by replicating values along every axis listed in reduced_axes.
ceil
Computes the ceiling element-wise: output[i] = ceil(input[i]).
cmp_eq
Comparison equal: C[i] = (A[i] == B[i]) ? 1.0 : 0.0.
cmp_ge
Comparison greater-or-equal: C[i] = (A[i] >= B[i]) ? 1.0 : 0.0.
cmp_gt
Comparison greater-than: C[i] = (A[i] > B[i]) ? 1.0 : 0.0.
cmp_le
Comparison less-or-equal: C[i] = (A[i] <= B[i]) ? 1.0 : 0.0.
cmp_lt
Comparison less-than: C[i] = (A[i] < B[i]) ? 1.0 : 0.0.
cmp_ne
Comparison not-equal: C[i] = (A[i] != B[i]) ? 1.0 : 0.0.
div
Element-wise division: C[i] = A[i] / B[i].
exp
Computes the exponential element-wise: output[i] = exp(input[i]).
fill
Fills every element of dst[0..n] with value on the GPU.
floor
Computes the floor element-wise: output[i] = floor(input[i]).
fused_add_relu
Fused Add + ReLU: C[i] = max(0, A[i] + B[i]).
fused_scale_add
Fused Scale-Add: C[i] = alpha * A[i] + beta * B[i].
gelu
Applies the GELU activation element-wise (tanh approximation).
hard_sigmoid
Applies hard sigmoid element-wise: output[i] = max(0, min(1, 0.2*input[i] + 0.5)).
hard_swish
Applies hard swish element-wise: output[i] = input[i] * max(0, min(6, input[i]+3)) / 6.
leaky_relu
Applies leaky ReLU element-wise with a fixed negative slope alpha = 0.01: output[i] = input[i] >= 0 ? input[i] : 0.01 * input[i].
log
Computes the natural logarithm element-wise: output[i] = ln(input[i]).
max
Element-wise maximum: C[i] = max(A[i], B[i]).
min
Element-wise minimum: C[i] = min(A[i], B[i]).
mul
Element-wise multiplication (Hadamard product): C[i] = A[i] * B[i].
nand
Fuzzy NAND: C[i] = 1 - A[i]*B[i].
neg
Negates every element: output[i] = -input[i].
nor
Fuzzy NOR: C[i] = 1 - (A[i] + B[i] - A[i]*B[i]).
one_minus
Applies one-minus element-wise: output[i] = 1 - input[i].
or_max
Fuzzy OR via max: C[i] = max(A[i], B[i]).
or_prob_sum
Probabilistic OR: C[i] = A[i] + B[i] - A[i]*B[i].
pow
Element-wise power: C[i] = A[i]^B[i].
relu
Applies the ReLU activation element-wise.
rsqrt
Computes the reciprocal square root element-wise: output[i] = 1 / sqrt(input[i]).
scale
Scales every element by a scalar: output[i] = alpha * input[i].
sigmoid
Applies the sigmoid activation element-wise.
silu
Applies the SiLU (Swish) activation element-wise.
softplus
Applies softplus element-wise: output[i] = ln(1 + exp(input[i])).
sqrt
Computes the square root element-wise: output[i] = sqrt(input[i]).
sub
Element-wise subtraction: C[i] = A[i] - B[i].
tanh_activation
Applies the hyperbolic tangent activation element-wise.
xor
Fuzzy XOR: C[i] = A[i] + B[i] - 2*A[i]*B[i].
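The per-element formulas above can be sketched as scalar CPU functions, which is useful for checking GPU output on small inputs. This is a sketch under stated assumptions, not the library's implementation: the function names are hypothetical, f32 is assumed, and the summaries for relu, sigmoid, silu, and gelu give no formula, so the standard definitions are used here. In particular, the GELU coefficient 0.044715 is the one commonly used in the tanh approximation and is an assumption about this library.

```rust
// Scalar CPU references for the formulas listed above.
// All names are hypothetical; none of this is the OxiCUDA API.

fn relu(x: f32) -> f32 { x.max(0.0) }

// Standard logistic sigmoid (assumed; the summary gives no formula).
fn sigmoid(x: f32) -> f32 { 1.0 / (1.0 + (-x).exp()) }

// SiLU (Swish): x * sigmoid(x) (standard definition, assumed).
fn silu(x: f32) -> f32 { x * sigmoid(x) }

// GELU, tanh approximation; the 0.044715 coefficient is assumed.
fn gelu_tanh(x: f32) -> f32 {
    let k = (2.0 / std::f32::consts::PI).sqrt();
    0.5 * x * (1.0 + (k * (x + 0.044715 * x * x * x)).tanh())
}

// max(0, min(1, 0.2*x + 0.5))
fn hard_sigmoid(x: f32) -> f32 { (0.2 * x + 0.5).min(1.0).max(0.0) }

// x * max(0, min(6, x + 3)) / 6
fn hard_swish(x: f32) -> f32 { x * (x + 3.0).min(6.0).max(0.0) / 6.0 }

// x >= 0 ? x : 0.01 * x
fn leaky_relu(x: f32) -> f32 { if x >= 0.0 { x } else { 0.01 * x } }

// ln(1 + exp(x))
fn softplus(x: f32) -> f32 { (1.0 + x.exp()).ln() }

// Fuzzy-logic operators on truth values in [0, 1].
fn fuzzy_nand(a: f32, b: f32) -> f32 { 1.0 - a * b }
fn or_prob_sum(a: f32, b: f32) -> f32 { a + b - a * b }
fn fuzzy_nor(a: f32, b: f32) -> f32 { 1.0 - or_prob_sum(a, b) }
fn fuzzy_xor(a: f32, b: f32) -> f32 { a + b - 2.0 * a * b }

// Comparisons produce a 0.0 / 1.0 mask rather than a bool.
fn cmp_gt(a: f32, b: f32) -> f32 { if a > b { 1.0 } else { 0.0 } }

fn main() {
    for &x in &[-3.0_f32, 0.0, 1.0, 3.0] {
        println!(
            "x={x}: relu={} silu={} gelu={} hard_sigmoid={} hard_swish={} leaky_relu={} softplus={}",
            relu(x), silu(x), gelu_tanh(x), hard_sigmoid(x),
            hard_swish(x), leaky_relu(x), softplus(x)
        );
    }
    // With crisp inputs (0.0 or 1.0) the fuzzy operators reduce to
    // the ordinary Boolean truth tables.
    for &(a, b) in &[(0.0_f32, 0.0_f32), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)] {
        println!(
            "a={a} b={b}: nand={} nor={} or={} xor={} gt={}",
            fuzzy_nand(a, b), fuzzy_nor(a, b), or_prob_sum(a, b),
            fuzzy_xor(a, b), cmp_gt(a, b)
        );
    }
}
```

Because the comparison functions return 0.0 or 1.0, their output can be fed directly into the fuzzy operators or used as a multiplicative mask with mul.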