#[non_exhaustive]
#[repr(u16)]
pub enum UnaryKind {
Neg = 0,
Abs = 1,
Sign = 2,
Reciprocal = 3,
Square = 4,
Cube = 5,
Sqrt = 10,
Rsqrt = 11,
Cbrt = 12,
Exp = 20,
Exp2 = 21,
Expm1 = 22,
Log = 23,
Log2 = 24,
Log10 = 25,
Log1p = 26,
Sin = 30,
Cos = 31,
Tan = 32,
Asin = 33,
Acos = 34,
Atan = 35,
Sinh = 40,
Cosh = 41,
Tanh = 42,
Asinh = 43,
Acosh = 44,
Atanh = 45,
Floor = 50,
Ceil = 51,
Round = 52,
Trunc = 53,
Frac = 54,
Erf = 60,
Erfc = 61,
Erfinv = 62,
Lgamma = 63,
Digamma = 64,
BitwiseNot = 70,
Popcount = 71,
Clz = 72,
Ctz = 73,
Relu = 100,
Gelu = 101,
GeluTanh = 102,
Silu = 103,
Mish = 104,
Sigmoid = 105,
Logit = 106,
Softplus = 107,
Softsign = 108,
Tanhshrink = 109,
Relu6 = 110,
Hardswish = 111,
Hardsigmoid = 112,
Hardtanh = 113,
Selu = 114,
LeakyRelu = 115,
Elu = 116,
Hardshrink = 117,
Softshrink = 118,
Threshold = 119,
PReLU = 120,
PowI = 121,
Step = 122,
Cast = 130,
Affine = 131,
}
Unary elementwise op discriminant.
Stored as u16 in crate::KernelSku::op when
category == OpCategory::UnaryElementwise. Variants correspond to
the union of PyTorch (torch.<op> / torch.Tensor.<op>) and JAX
(jax.numpy.<op> / jax.lax.<op>) unary elementwise ops, plus the
activation family from PyTorch nn.functional.
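The storage scheme described above can be illustrated with a small sketch. `UnaryKindSketch`, `to_raw` and `from_raw` are hypothetical names that do not exist in this crate; only the discriminant values are taken from the listing. The sketch assumes that decoding should reject raw values that fall into the gaps of the numbering.

```rust
/// Hypothetical, trimmed-down mirror of the enum (four variants only).
#[non_exhaustive]
#[repr(u16)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum UnaryKindSketch {
    Neg = 0,
    Abs = 1,
    Sqrt = 10,
    Relu = 100,
}

impl UnaryKindSketch {
    /// Encode as the raw `u16` that a descriptor field would hold.
    pub fn to_raw(self) -> u16 {
        self as u16
    }

    /// Decode a raw `u16`; unassigned values (the gaps) yield `None`.
    pub fn from_raw(raw: u16) -> Option<Self> {
        match raw {
            0 => Some(Self::Neg),
            1 => Some(Self::Abs),
            10 => Some(Self::Sqrt),
            100 => Some(Self::Relu),
            _ => None,
        }
    }
}
```

Because the enum is `#[non_exhaustive]`, code outside the defining crate that matches on it needs a wildcard arm, which is why a fallible decoder of this shape is a natural fit.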
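Self::Neg was the first variant wired, as the Phase 3 unary
trailblazer SKU. The remaining discriminants were reserved for the
fanout sessions that ship the math (abs / sqrt / exp / log / sin / …)
and activation (relu / gelu / silu / …) families; the per-variant
notes below record which of them have since been wired (for example
Self::PReLU, Self::PowI, Self::Step, Self::Cast and Self::Affine).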
Ops that return a different dtype than the input (isnan, isinf,
isfinite, logical_not) are reserved here but will route through
a future UnaryToBoolPlan (or similar) with a distinct output type
— not through this enum’s UnaryPlan<T, N>.
Parameterized activations (leaky_relu(α), elu(α), threshold(t, v),
hardshrink(λ), softshrink(λ)) carry their parameters via a
UnaryParams field on the descriptor — landed when the first
parameterized op ships, omitted for the trailblazer.
Variants (Non-exhaustive)
This enum is marked as non-exhaustive
Neg = 0
y = -x — elementwise negation. Trailblazer SKU.
Abs = 1
y = |x| — elementwise absolute value.
Sign = 2
y = sign(x) — -1 / 0 / +1 per the input’s sign.
Reciprocal = 3
y = 1 / x — elementwise reciprocal.
Square = 4
y = x * x — elementwise square.
Cube = 5
y = x * x * x — elementwise cube.
Sqrt = 10
y = sqrt(x).
Rsqrt = 11
y = 1 / sqrt(x) — reciprocal square root.
Cbrt = 12
y = cbrt(x) — cube root.
Exp = 20
y = exp(x).
Exp2 = 21
y = 2^x.
Expm1 = 22
y = exp(x) - 1.
Log = 23
y = ln(x) — natural log.
Log2 = 24
y = log_2(x).
Log10 = 25
y = log_10(x).
Log1p = 26
y = ln(1 + x).
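A short host-side sketch of why Expm1 and Log1p are usually treated as ops of their own rather than composed from Exp and Log: for tiny |x| the composed forms lose most of their significant digits. The function names are illustrative and use only the Rust standard library; nothing here describes how the kernels are implemented.

```rust
/// Composed form `exp(x) - 1`: cancels badly when |x| is tiny.
fn expm1_naive(x: f64) -> f64 {
    x.exp() - 1.0
}

/// Dedicated form, using the standard library's `exp_m1`.
fn expm1(x: f64) -> f64 {
    x.exp_m1()
}

/// Composed form `ln(1 + x)`: `1 + x` already rounds away most of x.
fn log1p_naive(x: f64) -> f64 {
    (1.0 + x).ln()
}

/// Dedicated form, using the standard library's `ln_1p`.
fn log1p(x: f64) -> f64 {
    x.ln_1p()
}

/// Relative error of `got` against a reference value `want`.
fn rel_err(got: f64, want: f64) -> f64 {
    ((got - want) / want).abs()
}
```

For `x = 1e-10`, the Taylor values `x + x*x/2` and `x - x*x/2` serve as references: the dedicated forms agree with them to near machine precision, while the composed forms are off by roughly one part in ten million.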
Sin = 30
y = sin(x).
Cos = 31
y = cos(x).
Tan = 32
y = tan(x).
Asin = 33
y = asin(x).
Acos = 34
y = acos(x).
Atan = 35
y = atan(x).
Sinh = 40
y = sinh(x).
Cosh = 41
y = cosh(x).
Tanh = 42
y = tanh(x).
Asinh = 43
y = asinh(x).
Acosh = 44
y = acosh(x).
Atanh = 45
y = atanh(x).
Floor = 50
y = floor(x).
Ceil = 51
y = ceil(x).
Round = 52
y = round(x) — round-half-to-even (PyTorch convention).
Trunc = 53
y = trunc(x) — truncate toward zero.
Frac = 54
y = x - trunc(x) — fractional part with sign of x.
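The rounding conventions for Round and Frac can be pinned down with a host-side reference. This is a sketch with illustrative names, not the kernel code. Note that Rust's own `f64::round` rounds ties away from zero, so it does not match the round-half-to-even convention documented for Round.

```rust
/// Round to nearest, ties to even (the convention documented for `Round`).
fn round_half_even(x: f64) -> f64 {
    if (x - x.trunc()).abs() == 0.5 {
        // Exact tie: halving, rounding and doubling lands on the even neighbour.
        2.0 * (x / 2.0).round()
    } else {
        // No tie: ordinary round-to-nearest is unambiguous.
        x.round()
    }
}

/// Fractional part carrying the sign of `x`, as documented for `Frac`.
fn frac(x: f64) -> f64 {
    x - x.trunc()
}
```

With these definitions `round_half_even(2.5)` is `2.0` while `(2.5f64).round()` is `3.0`, and `frac(-1.25)` is `-0.25`.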
Erf = 60
y = erf(x).
Erfc = 61
y = erfc(x) = 1 - erf(x).
Erfinv = 62
y = erfinv(x).
Lgamma = 63
y = lgamma(x) = ln(|Γ(x)|).
Digamma = 64
y = digamma(x) = Γ'(x) / Γ(x).
BitwiseNot = 70
y = ~x — bitwise NOT (integer dtypes).
Popcount = 71
y = popcount(x) — population count of set bits (integer).
Clz = 72
y = clz(x) — count leading zeros (integer).
Ctz = 73
y = ctz(x) — count trailing zeros (integer).
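For reference, the four integer ops (BitwiseNot, Popcount, Clz, Ctz) map directly onto Rust integer methods. The sketch below fixes the element type to `u32` as an assumption; the wrapper names are illustrative. Rust returns 32 for `clz(0)` and `ctz(0)`; whether the kernels do the same for a zero input is not stated in this documentation.

```rust
/// Bitwise NOT of every bit.
fn bitwise_not(x: u32) -> u32 {
    !x
}

/// Number of set bits.
fn popcount(x: u32) -> u32 {
    x.count_ones()
}

/// Number of leading zero bits (32 for x == 0 in Rust).
fn clz(x: u32) -> u32 {
    x.leading_zeros()
}

/// Number of trailing zero bits (32 for x == 0 in Rust).
fn ctz(x: u32) -> u32 {
    x.trailing_zeros()
}
```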
Relu = 100
y = relu(x) = max(x, 0).
Gelu = 101
y = gelu(x) — ERF-EXACT Gaussian Error Linear Unit,
0.5·x·(1+erf(x/√2)) — NOT the tanh approximation (that’s
Self::GeluTanh). The sys-level unary_gelu_erf_* symbols
are a bit-identical alias of the unary_gelu_* symbols this
variant dispatches to.
GeluTanh = 102
y = gelu_tanh(x) — tanh APPROXIMATION of gelu,
0.5·x·(1+tanh(√(2/π)·(x+0.044715·x³))). Diverges from the
erf-exact Self::Gelu by up to roughly 5e-4 in absolute value.
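The two GELU formulas can be compared numerically with a host-side sketch. Rust's standard library has no `erf`, so this sketch substitutes the Abramowitz and Stegun approximation 7.1.26 (absolute error around 1.5e-7), which is precise enough to expose the gap between the two variants. All function names are illustrative, and this is not the kernel implementation.

```rust
use std::f64::consts::{PI, SQRT_2};

/// erf via Abramowitz & Stegun 7.1.26 (a stand-in; absolute error ~1.5e-7).
fn erf_approx(x: f64) -> f64 {
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    // Horner evaluation of the degree-5 polynomial in t.
    let poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
        - 0.284496736) * t
        + 0.254829592)
        * t;
    sign * (1.0 - poly * (-x * x).exp())
}

/// Erf form: 0.5 * x * (1 + erf(x / sqrt(2))).
fn gelu_erf(x: f64) -> f64 {
    0.5 * x * (1.0 + erf_approx(x / SQRT_2))
}

/// Tanh form: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
fn gelu_tanh(x: f64) -> f64 {
    let k = (2.0 / PI).sqrt();
    0.5 * x * (1.0 + (k * (x + 0.044715 * x * x * x)).tanh())
}

/// Absolute difference between the two forms at `x`.
fn gelu_gap(x: f64) -> f64 {
    (gelu_erf(x) - gelu_tanh(x)).abs()
}
```

Both forms return 0 at `x = 0`, and `gelu_gap(2.5)` comes out at a few times 1e-4, which is far above the error of the `erf` stand-in.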
Silu = 103
y = silu(x) = x · sigmoid(x). Also known as Swish-1.
Mish = 104
y = mish(x) = x · tanh(softplus(x)).
Sigmoid = 105
y = sigmoid(x) = 1 / (1 + exp(-x)).
Logit = 106
y = logit(x) = log(x / (1 - x)). Inverse of sigmoid.
Softplus = 107
y = softplus(x) = ln(1 + exp(x)).
Softsign = 108
y = softsign(x) = x / (1 + |x|).
Tanhshrink = 109
y = tanhshrink(x) = x - tanh(x).
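The smooth activations from Silu through Tanhshrink translate almost literally into scalar code. The following is a reference sketch in `f64` using only the formulas given above; the function names are illustrative, and it makes no attempt at the numerical hardening (for example against overflow of `exp` for large inputs) that a production kernel may need.

```rust
/// sigmoid(x) = 1 / (1 + exp(-x)).
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

/// logit(p) = ln(p / (1 - p)), the inverse of sigmoid on (0, 1).
fn logit(p: f64) -> f64 {
    (p / (1.0 - p)).ln()
}

/// softplus(x) = ln(1 + exp(x)).
fn softplus(x: f64) -> f64 {
    x.exp().ln_1p()
}

/// silu(x) = x * sigmoid(x).
fn silu(x: f64) -> f64 {
    x * sigmoid(x)
}

/// mish(x) = x * tanh(softplus(x)).
fn mish(x: f64) -> f64 {
    x * softplus(x).tanh()
}

/// softsign(x) = x / (1 + |x|).
fn softsign(x: f64) -> f64 {
    x / (1.0 + x.abs())
}

/// tanhshrink(x) = x - tanh(x).
fn tanhshrink(x: f64) -> f64 {
    x - x.tanh()
}
```

A quick sanity check of the "inverse of sigmoid" claim is that `logit(sigmoid(1.3))` returns `1.3` up to rounding.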
Relu6 = 110
y = relu6(x) = min(max(x, 0), 6).
Hardswish = 111
y = hardswish(x) — piecewise-linear approximation of swish.
Hardsigmoid = 112
y = hardsigmoid(x) — piecewise-linear approximation of sigmoid.
Hardtanh = 113
y = hardtanh(x, -1, +1) — piecewise-linear clamp.
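The descriptions of Hardswish and Hardsigmoid above do not spell out the piecewise-linear formulas. The sketch below uses the PyTorch definitions, `hardsigmoid(x) = relu6(x + 3) / 6` and `hardswish(x) = x · relu6(x + 3) / 6`, on the assumption that the kernels follow them, since the enum mirrors PyTorch's `nn.functional` family. Function names are illustrative.

```rust
/// relu6(x) = min(max(x, 0), 6).
fn relu6(x: f32) -> f32 {
    x.clamp(0.0, 6.0)
}

/// hardsigmoid(x) = relu6(x + 3) / 6 (PyTorch definition, assumed here).
fn hardsigmoid(x: f32) -> f32 {
    relu6(x + 3.0) / 6.0
}

/// hardswish(x) = x * relu6(x + 3) / 6 (PyTorch definition, assumed here).
fn hardswish(x: f32) -> f32 {
    x * relu6(x + 3.0) / 6.0
}

/// hardtanh(x, -1, +1): clamp to the interval [-1, 1].
fn hardtanh(x: f32) -> f32 {
    x.clamp(-1.0, 1.0)
}
```

Under these definitions `hardswish` is exactly 0 for `x <= -3` and exactly `x` for `x >= 3`, with a quadratic blend in between.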
Selu = 114
y = selu(x) — scaled exponential linear unit.
LeakyRelu = 115
y = leaky_relu(x) = x if x > 0 else α·x. Hardcoded α = 0.01 in
the current bespoke kernel; will re-emit as a fanout from a
parameterized-unary plan once that infrastructure lands.
Elu = 116
y = elu(x) = x if x > 0 else α·(exp(x) - 1). Hardcoded α = 1.0
in the current bespoke kernel; same parameterization story as
LeakyRelu.
Hardshrink = 117
y = hardshrink(x) = x if |x| > λ else 0. Hardcoded λ = 0.5 in
the current bespoke kernel; same parameterization story as
LeakyRelu.
Softshrink = 118
y = softshrink(x) = x - λ if x > λ; x + λ if x < -λ; else 0.
Hardcoded λ = 0.5 in the current bespoke kernel; same
parameterization story as LeakyRelu.
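The four activations with hardcoded parameters (LeakyRelu, Elu, Hardshrink, Softshrink) can be written as scalar reference functions that take the parameter explicitly, with the documented defaults as constants. The constant and function names are hypothetical; only the formulas and default values come from the descriptions above.

```rust
/// Defaults documented for the current bespoke kernels.
const LEAKY_RELU_ALPHA: f32 = 0.01;
const ELU_ALPHA: f32 = 1.0;
const SHRINK_LAMBDA: f32 = 0.5;

/// x if x > 0 else alpha * x.
fn leaky_relu(x: f32, alpha: f32) -> f32 {
    if x > 0.0 { x } else { alpha * x }
}

/// x if x > 0 else alpha * (exp(x) - 1); `exp_m1` computes exp(x) - 1.
fn elu(x: f32, alpha: f32) -> f32 {
    if x > 0.0 { x } else { alpha * x.exp_m1() }
}

/// x if |x| > lambda else 0.
fn hardshrink(x: f32, lambda: f32) -> f32 {
    if x.abs() > lambda { x } else { 0.0 }
}

/// x - lambda if x > lambda; x + lambda if x < -lambda; else 0.
fn softshrink(x: f32, lambda: f32) -> f32 {
    if x > lambda {
        x - lambda
    } else if x < -lambda {
        x + lambda
    } else {
        0.0
    }
}
```

Note the strict inequalities: an input exactly equal to λ maps to 0 for both shrink functions.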
Threshold = 119
Reserved — threshold(x; t, v) = x if x > t else v. Needs the
parameterized-unary plan (two scalar parameters); not wired yet.
PReLU = 120
prelu(x; α) = x if x > 0 else α·x with per-channel learnable α
vector (or single scalar α). Uses a distinct plan shape
(PReluPlan / PReluBackwardPlan) because α is a tensor operand,
not a scalar parameter. Wired in Milestone 5.3.
PowI = 121
powi(x; n) = x^n for a fixed runtime integer exponent n.
Distinct from the generic BinaryKind::Pow (which takes an
f32 exponent tensor) because the integer-only path can use
power-by-squaring — faster than __expf(n · __logf(x)) and
also well-defined for negative x (real pow(-1.5, 2) = 2.25,
no NaN). The exponent is threaded via the params: [f32; 2]
slot 0 with a host-side cast (n as f32); slot 1 is unused.
Any exponent with |n| ≤ 2^24 round-trips through f32 exactly.
Phase 12.1 wires {f32, f16, bf16, f64} through UnaryParamPlan.
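The power-by-squaring idea and the `f32` round-trip constraint can both be shown in a few lines of host-side Rust. This is a sketch with illustrative names, not the kernel source; in particular, handling negative exponents by taking the reciprocal of the positive power is an assumption.

```rust
/// x^n by repeated squaring: O(log |n|) multiplications, no exp/log.
fn powi_by_squaring(x: f64, n: i32) -> f64 {
    let mut base = x;
    let mut e = n.unsigned_abs(); // safe even for i32::MIN
    let mut acc = 1.0;
    while e > 0 {
        if e & 1 == 1 {
            acc *= base; // this bit of the exponent is set
        }
        base *= base; // advance to the next power of two
        e >>= 1;
    }
    if n < 0 { 1.0 / acc } else { acc }
}

/// The exp/log formulation, which yields NaN for negative x.
fn pow_via_exp_log(x: f64, n: i32) -> f64 {
    (f64::from(n) * x.ln()).exp()
}

/// True when `n` survives the host-side `n as f32` cast unchanged.
fn exponent_roundtrips(n: i32) -> bool {
    (n as f32) as i32 == n
}
```

For example, `powi_by_squaring(-1.5, 2)` gives `2.25`, while `pow_via_exp_log(-1.5, 2)` is NaN because the logarithm of a negative number is NaN. The exponent `1 << 24` round-trips through `f32`, and `(1 << 24) + 1` does not.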
Step = 122
y = step(x) = 1 if x > 0 else 0 — Heaviside step function.
step(0) = 0 and step(-0.0) = 0 (x > 0 is false at both
zeros); NaN → 0 (NaN > 0 is false), matching PyTorch’s
heaviside(x, values=0) for the > branch. Wires the Phase 31
unary_step_* kernels.
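The edge cases listed for Step follow directly from IEEE 754 comparison rules, as this one-line reference sketch shows (the function name is illustrative, and this is a host-side model rather than the kernel itself):

```rust
/// Heaviside step with step(0) = 0: 1 if x > 0, else 0.
fn step(x: f32) -> f32 {
    // `x > 0.0` is false for +0.0, -0.0 and NaN, so all three map to 0.
    if x > 0.0 { 1.0 } else { 0.0 }
}
```

Any strictly positive input, including the smallest subnormal, maps to 1.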
Cast = 130
y = (TOut) x — dtype conversion. Heterogeneous input / output
element types, so it goes through its own CastPlan (not the
same-dtype UnaryPlan<T, N>). The discriminant lives here for
telemetry / SKU-tagging consistency with the rest of the unary
family. Wired from fuel-cuda-kernels/cast.cu.
Affine = 131
y = a * x + b — fused affine (multiply-add) with scalar
parameters a / b. Same-dtype input/output but carries two
scalar parameters, so it gets its own AffinePlan (the unified
UnaryPlan<T, N> doesn’t carry kernel parameters). Wired from
fuel-cuda-kernels/affine.cu.
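A host-side model of the affine op is a single multiply-add per element. The sketch below uses `f32::mul_add`, which rounds once; whether the CUDA kernel uses a hardware fused multiply-add or a separate multiply and add is not stated here, so treat that choice as an assumption. Function names are illustrative.

```rust
/// y = a * x + b for one element, with a single rounding step.
fn affine(x: f32, a: f32, b: f32) -> f32 {
    x.mul_add(a, b)
}

/// Apply the same scalar `a` and `b` to every element of `xs`.
fn affine_slice(xs: &[f32], a: f32, b: f32) -> Vec<f32> {
    xs.iter().map(|&x| affine(x, a, b)).collect()
}
```

For instance, `affine_slice(&[0.0, 1.0, -1.0], 2.0, 0.5)` yields `[0.5, 2.5, -1.5]`.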