pub trait Element: KernelDtype + Sealed {
    type Scalar: ScalarType;
}
Element types supported by the kernel facade.
Sealed to prevent downstream impls — adding a new dtype requires
shipping a new kernel instantiation in the corresponding *-kernels-sys
crate.
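The sealing described above is usually achieved with a private supertrait. The following is a minimal sketch of that pattern, not this crate's actual source: the `private` module, the `NAME` constant, and the `dtype_name` helper are hypothetical names introduced for illustration.

```rust
// Hypothetical sketch of the sealed-trait pattern; module and item names
// are illustrative and do not reflect the crate's real layout.
mod private {
    // Public trait inside a private module: downstream crates cannot name
    // it, so they cannot implement it (or any trait that requires it).
    pub trait Sealed {}
    impl Sealed for f32 {}
    impl Sealed for f64 {}
}

/// Stand-in for the real `KernelDtype` supertrait.
pub trait KernelDtype {
    const NAME: &'static str;
}
impl KernelDtype for f32 {
    const NAME: &'static str = "f32";
}
impl KernelDtype for f64 {
    const NAME: &'static str = "f64";
}

/// Only types that are `Sealed` can implement `Element`.
pub trait Element: KernelDtype + private::Sealed {}
impl Element for f32 {}
impl Element for f64 {}

/// Accepts only element types the defining crate has opted in.
pub fn dtype_name<T: Element>() -> &'static str {
    T::NAME
}
```

Under this arrangement, a downstream `impl Element for MyType {}` fails to compile because `MyType` cannot satisfy the `Sealed` bound.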
The trait spans three families that share the <T: Element>-
parameterized plan shape but route through distinct kernel SKUs:
- Floating-point: f16, bf16, f32, F32Strict, f64. f32 reduces through TF32 tensor cores (10-bit mantissa); F32Strict uses SIMT CUDA cores at full IEEE 754 binary32 with bit-stable results. The Scalar projection is f32 for the 16-bit / 32-bit float members and f64 for f64.
- Integer: i32, i64. Used for elementwise integer arithmetic (bitwise ops, integer comparison). The Scalar projection is f32; these types don't participate in α/β-scaled epilogues, so the projection is nominal. Note: S8 / U8 / S4 / U4 are GEMM-only operand types and live on the separate IntElement trait; they don't implement Element.
- Boolean: Bool (1-byte storage, 0/non-zero truthiness). Used for logical ops and as the output type of comparison ops. The Scalar projection is f32 (also nominal).
Sibling traits IntElement, FpElement, BinElement, and
BiasElement cover GEMM-only / FP8 / packed-bit / bias-broadcast
types respectively; those have their own kernel families and don’t
route through <T: Element>-parameterized elementwise plans. The
umbrella KernelDtype supertrait covers the union of Element +
IntElement + FpElement + BinElement.
KIND lookup
Element does NOT redeclare const KIND; the const is inherited
from the KernelDtype supertrait. This keeps T::KIND unambiguous
at every call site under <T: Element> bounds. Pre-Phase-28 code
using the fully-qualified form <T as Element>::KIND must update
to <T as KernelDtype>::KIND (or just plain T::KIND which works
regardless of which trait bound is in scope).
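A minimal sketch of why the inherited constant stays unambiguous. The `ElementKind` variants and the two helper functions are hypothetical stand-ins, and the `Sealed` bound is omitted for brevity.

```rust
// Hypothetical stand-ins; the real `ElementKind` and trait definitions
// live in the crate and may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElementKind {
    F32,
    I32,
}

pub trait KernelDtype {
    const KIND: ElementKind;
}

// `Element` declares no `KIND` of its own, so `T::KIND` has exactly one
// candidate: the constant inherited from `KernelDtype`.
pub trait Element: KernelDtype {}

impl KernelDtype for f32 {
    const KIND: ElementKind = ElementKind::F32;
}
impl KernelDtype for i32 {
    const KIND: ElementKind = ElementKind::I32;
}
impl Element for f32 {}
impl Element for i32 {}

/// Short form: resolves through the supertrait.
pub fn kind_short<T: Element>() -> ElementKind {
    T::KIND
}

/// Fully-qualified form naming the trait that owns the constant.
pub fn kind_qualified<T: Element>() -> ElementKind {
    <T as KernelDtype>::KIND
}
```

Both helpers return the same value for every `T`; writing `<T as Element>::KIND` in this sketch would not compile, because `Element` has no such item.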
Required Associated Types
type Scalar: ScalarType
Scalar type used for the kernel’s alpha / beta parameters (and
the epilogue compute type). f32 for f16/bf16/f32/F32Strict
— the epilogue runs at f32 to match the F32 accumulator. f64
for f64 — the DGEMM path uses an F64 accumulator and
f64 alpha/beta. For integer / Bool elements the projection
is nominally f32 (no α/β-scaled epilogue applies).
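To illustrate the projection, the sketch below shows how a generic type can pick up the width of alpha / beta from `T::Scalar`. The `Scaling` struct and `scalar_size` helper are hypothetical, and the traits are simplified stand-ins without the `KernelDtype` and `Sealed` bounds.

```rust
// Hypothetical stand-ins for the crate's traits, reduced to the parts
// needed to show the associated-type projection.
pub trait ScalarType: Copy {}
impl ScalarType for f32 {}
impl ScalarType for f64 {}

pub trait Element: Copy {
    type Scalar: ScalarType;
}
impl Element for f32 {
    type Scalar = f32;
}
impl Element for f64 {
    type Scalar = f64;
}
impl Element for i32 {
    type Scalar = f32; // nominal: no alpha/beta-scaled epilogue applies
}

/// Hypothetical alpha/beta pair whose width follows the element type.
pub struct Scaling<T: Element> {
    pub alpha: T::Scalar,
    pub beta: T::Scalar,
}

/// Size in bytes of one alpha/beta scalar for element type `T`.
pub fn scalar_size<T: Element>() -> usize {
    std::mem::size_of::<T::Scalar>()
}
```

With these definitions, `Scaling::<f64>` stores two `f64` values, while `Scaling::<f32>` and `Scaling::<i32>` both store `f32` values.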
Dyn Compatibility
This trait is not dyn compatible.
In older versions of Rust, dyn compatibility was called "object safety".
Implementations on Foreign Types
impl Element for f32
f32 GEMM routes through TF32 tensor cores — see
crate::PrecisionGuarantee::math_precision (returns
MathPrecision::Tf32). Inputs are full F32; the math instruction
reduces to TF32 (10-bit mantissa) and accumulates into F32. Use
F32Strict instead when bit-stable, full-precision IEEE 754
binary32 math is required.
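The effect of a 10-bit mantissa can be approximated on the host. The sketch below clears the low 13 of the 23 stored mantissa bits of an `f32`; `to_tf32_truncated` is a hypothetical helper, and it uses plain truncation as a simplifying assumption, so the rounding the hardware actually applies may differ.

```rust
/// Approximates TF32 operand precision by clearing the low 13 mantissa
/// bits of an f32 (23 stored mantissa bits -> 10). Sign and exponent are
/// untouched. Truncation is an assumption made for simplicity here.
pub fn to_tf32_truncated(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0xFFFF_E000)
}
```

For example, `1.0 + 1.0 / 1024.0` survives unchanged because its only fractional bit sits at position 10, whereas `1.0 + 1.0 / 2048.0` collapses to `1.0`. Inputs that differ only below the tenth mantissa bit become indistinguishable in this model.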
impl Element for f64
f64 GEMM via Ampere FP64 tensor cores (DGEMM). Full IEEE 754
binary64 inputs, accumulator, and scalars. Analogous to cuBLAS’s
CUBLAS_COMPUTE_64F.
impl Element for i32
i32 as an elementwise kernel input element. Used by the integer
arithmetic kernels (bitwise and / or / xor / shift, integer
comparison, integer scans). Distinct from ElementKind::I32’s
historical use as an accumulator-only marker for integer GEMMs —
here i32 is a first-class kernel input type with an Element
impl, so the same BinaryPlan<T, N> / UnaryPlan<T, N> shapes
extend to integer arithmetic.
The Scalar projection is f32 (nominal — integer kernels don’t
use α/β-scaled epilogues today).
impl Element for i64
i64 as an elementwise kernel input element. Sibling of the i32
impl above for 64-bit integer arithmetic (PyTorch’s default integer
tensor dtype). Same kernel families, twice the storage width.
Implementors
impl Element for Bool
Boolean as an elementwise kernel input element. Used by the logical
op family (logical_and / logical_or / logical_xor) and as the
output type of comparison ops. Storage is 1 byte per element via the
Bool wrapper.
The Scalar projection is f32 (nominal).
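A minimal sketch of a 1-byte boolean wrapper with 0/non-zero truthiness. This is an assumed layout for illustration only: the field, the `is_true` method, and the host-side `logical_and` function are hypothetical and are not taken from the crate.

```rust
/// Hypothetical stand-in for the crate's `Bool` wrapper: one byte of
/// storage, where 0 is false and any non-zero value is true.
#[repr(transparent)]
#[derive(Clone, Copy, Debug)]
pub struct Bool(pub u8);

impl Bool {
    /// Non-zero truthiness.
    pub fn is_true(self) -> bool {
        self.0 != 0
    }
}

impl From<bool> for Bool {
    /// Canonical encoding: false -> 0, true -> 1.
    fn from(b: bool) -> Self {
        Bool(b as u8)
    }
}

/// Host-side reference for a logical AND over the wrapper; the result is
/// always written in the canonical 0/1 encoding.
pub fn logical_and(a: Bool, b: Bool) -> Bool {
    Bool::from(a.is_true() && b.is_true())
}
```

Note that a logical op treats `Bool(2)` and `Bool(7)` as both true, so their logical AND is `Bool(1)`, whereas a bitwise AND of the raw bytes would give 2.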
impl Element for Complex32
Single-precision complex (interleaved real/imag pair of f32) as an
elementwise kernel input element. Used by the FFT family (fft,
ifft, rfft output / irfft input, etc.) for spectrum-domain
tensors. The Scalar projection is f32 (matches the real width).
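The interleaved layout can be pictured as follows. This is a sketch under the assumption of a `#[repr(C)]` pair of `f32` fields; the field names `re` / `im` and the `to_interleaved` helper are hypothetical and may not match the crate's `Complex32`.

```rust
/// Hypothetical stand-in for `Complex32`: real part first, imaginary part
/// second, each an f32, with C layout so the pair occupies 8 bytes.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex32 {
    pub re: f32,
    pub im: f32,
}

/// Flattens complex values into the interleaved [re0, im0, re1, im1, ...]
/// order that the struct layout implies.
pub fn to_interleaved(xs: &[Complex32]) -> Vec<f32> {
    xs.iter().flat_map(|c| [c.re, c.im]).collect()
}
```

A buffer of `n` such values therefore holds `2 * n` consecutive `f32` words, which is consistent with an `f32` Scalar projection matching the real width.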