Trait Element 

pub trait Element: KernelDtype + Sealed {
    type Scalar: ScalarType;
}

Element types supported by the kernel facade.

Sealed to prevent downstream impls — adding a new dtype requires shipping a new kernel instantiation in the corresponding *-kernels-sys crate.
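Sealing like this is usually done with the sealed-trait pattern. A public trait takes as its supertrait a trait that lives in a private module. The sketch below assumes that pattern. `sealed::Sealed`, `DemoElement`, and `name_of` are hypothetical names, not items from this crate. The seal only blocks impls from other crates. Inside the defining crate, as in this single-file sketch, new impls can still be added.

```rust
// Hypothetical sketch of the sealed-trait pattern; names are illustrative.
mod sealed {
    // The module is private, so code in other crates cannot name
    // `sealed::Sealed` and therefore cannot implement it.
    pub trait Sealed {}
    impl Sealed for f32 {}
    impl Sealed for f64 {}
}

/// Only types that implement the private `Sealed` marker can implement this.
pub trait DemoElement: sealed::Sealed {
    const NAME: &'static str;
}

impl DemoElement for f32 {
    const NAME: &'static str = "f32";
}

impl DemoElement for f64 {
    const NAME: &'static str = "f64";
}

/// Generic code can rely on the closed set of implementors.
pub fn name_of<T: DemoElement>() -> &'static str {
    T::NAME
}
```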

The trait spans three families that share the <T: Element>-parameterized plan shape but route through distinct kernel SKUs:

  • Floating-point: f16, bf16, f32, F32Strict, f64. f32 GEMM math runs on TF32 tensor cores (10-bit mantissa, F32 accumulation); F32Strict uses SIMT CUDA cores at full IEEE 754 binary32 precision with bit-stable results. The Scalar projection is f32 for the 16-bit and 32-bit float members and f64 for f64.
  • Integer: i32, i64. Used for elementwise integer arithmetic (bitwise ops, integer comparison). The Scalar projection is f32 — these types don’t participate in α/β-scaled epilogues, so the projection is nominal. Note: S8 / U8 / S4 / U4 are GEMM-only operand types and live on the separate IntElement trait — they don’t implement Element.
  • Boolean: Bool (1-byte storage, 0/non-zero truthiness). Used for logical ops and as the output type of comparison ops. The Scalar projection is f32 (also nominal).
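The sketch below shows how the per-family Scalar projections could be written as associated-type impls. `DemoElement` and `DemoBool` are hypothetical stand-ins. f16 and bf16 are left out because they are not standard-library types. The real trait also carries more bounds, such as `ScalarType`, which this sketch omits.

```rust
use std::any::TypeId;

/// Hypothetical mirror of the Scalar projection (not the crate's trait).
pub trait DemoElement {
    /// Type used for alpha/beta and epilogue math.
    type Scalar: Copy + 'static;
}

// Floating-point family: f32 projects to f32, f64 projects to f64.
impl DemoElement for f32 { type Scalar = f32; }
impl DemoElement for f64 { type Scalar = f64; }

// Integer family: nominal f32 projection (no alpha/beta epilogue applies).
impl DemoElement for i32 { type Scalar = f32; }
impl DemoElement for i64 { type Scalar = f32; }

/// Hypothetical stand-in for the crate's 1-byte Bool wrapper.
#[derive(Clone, Copy)]
pub struct DemoBool(pub u8);
impl DemoElement for DemoBool { type Scalar = f32; }

/// Returns true when `T`'s Scalar projection is exactly `S`.
pub fn scalar_is<T: DemoElement, S: 'static>() -> bool {
    TypeId::of::<T::Scalar>() == TypeId::of::<S>()
}
```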

Sibling traits IntElement, FpElement, BinElement, and BiasElement cover GEMM-only / FP8 / packed-bit / bias-broadcast types respectively; those have their own kernel families and don’t route through <T: Element>-parameterized elementwise plans. The umbrella KernelDtype supertrait covers the union Element + IntElement + FpElement + BinElement.

KIND lookup

Element does NOT redeclare const KIND; the const is inherited from the KernelDtype supertrait. This keeps T::KIND unambiguous at every call site under <T: Element> bounds. Pre-Phase-28 code using the fully-qualified form <T as Element>::KIND must update to <T as KernelDtype>::KIND (or just plain T::KIND which works regardless of which trait bound is in scope).
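The sketch below models the KIND arrangement. `DemoKernelDtype`, `DemoElement`, and `DemoKind` are hypothetical stand-ins for the crate's KernelDtype, Element, and ElementKind. Only the supertrait declares the const. Under an Element-style bound, the short form and the supertrait-qualified form therefore resolve to the same item.

```rust
/// Hypothetical stand-in for ElementKind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DemoKind { F32, F64, I32 }

/// Supertrait: the single place KIND is declared.
pub trait DemoKernelDtype {
    const KIND: DemoKind;
}

/// Element-like subtrait: adds Scalar but does NOT redeclare KIND.
pub trait DemoElement: DemoKernelDtype {
    type Scalar;
}

impl DemoKernelDtype for f32 { const KIND: DemoKind = DemoKind::F32; }
impl DemoKernelDtype for f64 { const KIND: DemoKind = DemoKind::F64; }
impl DemoKernelDtype for i32 { const KIND: DemoKind = DemoKind::I32; }

impl DemoElement for f32 { type Scalar = f32; }
impl DemoElement for f64 { type Scalar = f64; }
impl DemoElement for i32 { type Scalar = f32; }

/// Plain `T::KIND` resolves through the supertrait bound.
pub fn kind_short<T: DemoElement>() -> DemoKind {
    T::KIND
}

/// Fully qualified form names the trait that actually declares KIND.
pub fn kind_qualified<T: DemoElement>() -> DemoKind {
    <T as DemoKernelDtype>::KIND
}
```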

Required Associated Types

type Scalar: ScalarType

Scalar type used for the kernel’s alpha / beta parameters (and the epilogue compute type). f32 for f16/bf16/f32/F32Strict — the epilogue runs at f32 to match the F32 accumulator. f64 for f64 — the DGEMM path uses an F64 accumulator and f64 alpha/beta. For integer / Bool elements the projection is nominally f32 (no α/β-scaled epilogue applies).
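The sketch below shows where Scalar enters: an α/β epilogue `d = alpha * acc + beta * c` computed at the Scalar type. `DemoScalar` and `epilogue` are hypothetical. The sketch assumes the accumulator already holds values at Scalar precision: f32 for the 16/32-bit floats, f64 for f64. It does not model how the kernel reads or writes C.

```rust
use std::ops::{Add, Mul};

/// Hypothetical minimal bound for an epilogue compute type.
pub trait DemoScalar: Copy + Add<Output = Self> + Mul<Output = Self> {}
impl DemoScalar for f32 {}
impl DemoScalar for f64 {}

/// Applies d[i] = alpha * acc[i] + beta * c[i], entirely in type S.
pub fn epilogue<S: DemoScalar>(alpha: S, beta: S, acc: &[S], c: &[S]) -> Vec<S> {
    assert_eq!(acc.len(), c.len(), "acc and c must have the same length");
    acc.iter()
        .zip(c)
        .map(|(&a, &ci)| alpha * a + beta * ci)
        .collect()
}
```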

Dyn Compatibility

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety".

Implementations on Foreign Types

impl Element for bf16

impl Element for f16

impl Element for f32

f32 GEMM routes through TF32 tensor cores — see crate::PrecisionGuarantee::math_precision (returns MathPrecision::Tf32). Inputs are full F32; the math instruction reduces to TF32 (10-bit mantissa) and accumulates into F32. Use F32Strict instead when bit-stable, full-precision IEEE 754 binary32 math is required.
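You can emulate TF32's precision loss on the CPU by keeping only the top 10 of f32's 23 explicit mantissa bits. The sketch below assumes round-to-nearest with ties away from zero. The kernel's actual input conversion is not specified here and could differ; it might truncate, for example. `round_to_tf32` is a hypothetical helper.

```rust
/// Emulates rounding an f32 to TF32 precision (10 explicit mantissa bits).
/// Assumption: round-to-nearest, ties away from zero.
pub fn round_to_tf32(x: f32) -> f32 {
    if !x.is_finite() {
        return x; // leave NaN and infinities untouched
    }
    // f32 stores 23 mantissa bits; TF32 keeps the top 10, dropping 13.
    const DROPPED: u32 = 13;
    const HALF: u32 = 1 << (DROPPED - 1); // half of one TF32 ulp
    const MASK: u32 = !((1u32 << DROPPED) - 1); // clears the low 13 bits
    // f32 is sign-magnitude, so adding half an ulp to the raw bits rounds
    // the magnitude to nearest (ties away from zero); a carry into the
    // exponent field is the correct result of rounding up.
    f32::from_bits((x.to_bits() + HALF) & MASK)
}
```

Near 1.0, TF32 values are spaced 2^-10 apart. So 1 + 2^-12 rounds back to 1.0, while 1 + 2^-10 survives unchanged. This is the kind of rounding F32Strict is meant to avoid.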

impl Element for f64

f64 GEMM via Ampere FP64 tensor cores (DGEMM). Full IEEE 754 binary64 inputs, accumulator, and scalars. Analogous to cuBLAS’s CUBLAS_COMPUTE_64F.
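To check DGEMM results, a plain CPU reference of C = alpha * A * B + beta * C in f64 can serve as an oracle. The sketch below assumes row-major, densely packed A (m x k), B (k x n), and C (m x n). `dgemm_ref` is hypothetical. It does not model how the tensor-core kernel tiles or orders its sums, so its results may differ from the GPU path in the last bits.

```rust
/// Naive row-major reference for C = alpha * A * B + beta * C in f64.
#[allow(clippy::too_many_arguments)]
pub fn dgemm_ref(
    m: usize, n: usize, k: usize,
    alpha: f64, a: &[f64], b: &[f64],
    beta: f64, c: &mut [f64],
) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    for i in 0..m {
        for j in 0..n {
            // Dot product of row i of A and column j of B, accumulated in f64.
            let mut acc = 0.0f64;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            // f64 alpha/beta epilogue.
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```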

impl Element for i32

i32 as an elementwise kernel input element. Used by the integer arithmetic kernels (bitwise and / or / xor / shift, integer comparison, integer scans). Distinct from ElementKind::I32’s historical use as an accumulator-only marker for integer GEMMs — here i32 is a first-class kernel input type with an Element impl, so the same BinaryPlan<T, N> / UnaryPlan<T, N> shapes extend to integer arithmetic.

The Scalar projection is f32 (nominal — integer kernels don’t use α/β-scaled epilogues today).
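The sketch below illustrates how one plan shape extends to integer arithmetic: a single generic elementwise helper instantiated for i32, i64, and f32. `binary_map` is a hypothetical stand-in for a BinaryPlan<T, N>-style op, with N as a compile-time length. It is not the crate's API.

```rust
/// Hypothetical BinaryPlan<T, N>-shaped helper: applies `op` lane by lane.
/// The same generic shape serves float and integer element types.
pub fn binary_map<T: Copy, const N: usize>(
    a: [T; N],
    b: [T; N],
    op: impl Fn(T, T) -> T,
) -> [T; N] {
    std::array::from_fn(|i| op(a[i], b[i]))
}
```

The usage below shows the same shape for an i32 xor, an i64 arithmetic shift, and an f32 add. Only the element type and the closure change.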

impl Element for i64

i64 as an elementwise kernel input element. Sibling of the i32 impl above for 64-bit integer arithmetic (PyTorch’s default integer tensor dtype). Same kernel families, twice the storage width.

Implementors

impl Element for Bool

Boolean as an elementwise kernel input element. Used by the logical op family (logical_and / logical_or / logical_xor) and as the output type of comparison ops. Storage is 1 byte per element via the Bool wrapper.

The Scalar projection is f32 (nominal).
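The sketch below models the 1-byte Bool storage with 0/non-zero truthiness. A comparison op produces it as output, and a logical op consumes it as input. `DemoBool`, `lt`, and `logical_xor` are hypothetical. The real wrapper's methods are not documented here.

```rust
/// Hypothetical 1-byte boolean wrapper: 0 is false, any non-zero byte is true.
#[repr(transparent)]
#[derive(Clone, Copy, Debug)]
pub struct DemoBool(pub u8);

impl DemoBool {
    pub fn is_true(self) -> bool {
        self.0 != 0
    }
    pub fn from_bool(b: bool) -> Self {
        DemoBool(b as u8)
    }
}

/// Comparison op with Bool output (elementwise less-than).
pub fn lt(a: &[f32], b: &[f32]) -> Vec<DemoBool> {
    a.iter().zip(b).map(|(x, y)| DemoBool::from_bool(x < y)).collect()
}

/// Logical op with Bool inputs, using 0/non-zero truthiness.
pub fn logical_xor(a: &[DemoBool], b: &[DemoBool]) -> Vec<DemoBool> {
    a.iter()
        .zip(b)
        .map(|(x, y)| DemoBool::from_bool(x.is_true() != y.is_true()))
        .collect()
}
```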

impl Element for Complex32

Single-precision complex (interleaved real/imag pair of f32) as an elementwise kernel input element. Used by the FFT family (fft, ifft, rfft output / irfft input, etc.) for spectrum-domain tensors. The Scalar projection is f32 (matches the real width).

impl Element for Complex64

Double-precision complex (interleaved real/imag pair of f64) as an elementwise kernel input element. Sibling to Complex32; the Scalar projection is f64.
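The sketch below models the interleaved layouts. It assumes `#[repr(C)]` structs with the real part stored first, and N complex values occupy 2N contiguous reals. `DemoComplex32`, `DemoComplex64`, and `as_interleaved` are hypothetical names. The field order is an assumption read from "real/imag pair".

```rust
/// Hypothetical single-precision complex: [re, im] as two f32s.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DemoComplex32 {
    pub re: f32,
    pub im: f32,
}

/// Hypothetical double-precision complex: [re, im] as two f64s.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct DemoComplex64 {
    pub re: f64,
    pub im: f64,
}

/// Views a complex32 slice as its interleaved f32 storage: [re0, im0, re1, im1, ...].
pub fn as_interleaved(z: &[DemoComplex32]) -> &[f32] {
    // SAFETY: #[repr(C)] with two f32 fields has size 8, alignment 4 and no
    // padding, so `z.len()` elements occupy exactly `2 * z.len()` f32 values.
    unsafe { std::slice::from_raw_parts(z.as_ptr() as *const f32, z.len() * 2) }
}
```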

impl Element for F32Strict