Enum ElementKind

pub enum ElementKind {
    F16,
    Bf16,
    F32,
    F32Strict,
    F64,
    S8,
    U8,
    I32,
    I64,
    Bool,
    Fp8E4M3,
    Fp8E5M2,
    S4,
    U4,
    Bin,
    Complex32,
    Complex64,
}

Runtime tag for an Element or IntElement.

Unified across the float and integer kernel families so that a single kernel-SKU descriptor can describe any baracuda kernel.
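
As a rough sketch of how a single runtime tag can describe every element family, the block below defines a local mirror of the variant list (not the crate's own type) and two hypothetical helpers, `storage_bits` and `packed_bytes`. Neither helper is part of the documented API; the widths are read off the variant descriptions on this page.

```rust
// Local mirror of the variant list on this page, for illustration only.
// Real code would use the crate's own `ElementKind`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ElementKind {
    F16, Bf16, F32, F32Strict, F64, S8, U8, I32, I64, Bool,
    Fp8E4M3, Fp8E5M2, S4, U4, Bin, Complex32, Complex64,
}

/// Hypothetical helper: storage width of one logical element, in bits.
pub fn storage_bits(kind: ElementKind) -> u32 {
    use ElementKind::*;
    match kind {
        Bin => 1,
        S4 | U4 => 4,
        S8 | U8 | Bool | Fp8E4M3 | Fp8E5M2 => 8,
        F16 | Bf16 => 16,
        F32 | F32Strict | I32 => 32,
        F64 | I64 | Complex32 => 64,
        Complex64 => 128,
    }
}

/// Hypothetical helper: bytes needed for `n` elements, rounding the
/// sub-byte kinds (S4, U4, Bin) up to a whole number of bytes.
pub fn packed_bytes(kind: ElementKind, n: usize) -> usize {
    (n * storage_bits(kind) as usize + 7) / 8
}
```

Because the match is exhaustive, adding a variant to the mirror forces every such helper to be revisited at compile time.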

Variants

F16

IEEE 754 binary16.

Bf16

Brain-float 16.

F32

IEEE 754 binary32 inputs reduced through TF32 tensor cores (10-bit mantissa). Maps to the f32 Rust type.

F32Strict

IEEE 754 binary32 inputs reduced through SIMT CUDA cores at full f32 precision. Maps to the F32Strict wrapper type. Bit-stable on the same hardware.
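
To make the difference between the F32 and F32Strict paths concrete, here is a host-side sketch that emulates the 10-bit TF32 mantissa by clearing the low 13 mantissa bits of each input. The function names are hypothetical, and truncation is an assumption: real hardware conversion may round instead.

```rust
/// Keep the sign bit, the 8 exponent bits, and the top 10 mantissa bits of
/// an f32. Truncation (rather than rounding) is an assumption of this sketch.
pub fn truncate_to_tf32(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0xFFFF_E000)
}

/// Dot product with inputs reduced to TF32 precision and f32 accumulation.
pub fn dot_tf32(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| truncate_to_tf32(x) * truncate_to_tf32(y))
        .sum()
}

/// Dot product at full f32 input precision, as on the strict path.
pub fn dot_strict(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(&x, &y)| x * y).sum()
}
```

An input such as 1 + 2^-11 survives on the strict path but collapses to 1.0 under the TF32 emulation, which is the kind of discrepancy the two variants let a caller choose between.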

F64

IEEE 754 binary64. Maps to the f64 Rust type.

S8

Signed 8-bit integer. Maps to the S8 wrapper type. Routed through Ampere int8 tensor cores (mma.sync m16n8k32 integer variant) with int32 accumulation; floating-point alpha / beta scaling lets the kernel dequantize in its epilogue.

U8

Unsigned 8-bit integer. Maps to the U8 wrapper type. Same kernel family as S8 with unsigned operands.
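
The int8 convention described above (int32 accumulation followed by a float alpha / beta epilogue) can be sketched as a scalar reference. The function name, the row-major layouts, and the f32 output type are assumptions made for illustration; they are not taken from the crate.

```rust
/// Scalar reference for D = alpha * (A x B) + beta * C.
/// A is m x k, B is k x n, C and D are m x n, all row-major (an assumption).
pub fn gemm_s8_ref(
    m: usize,
    n: usize,
    k: usize,
    a: &[i8],
    b: &[i8],
    alpha: f32,
    beta: f32,
    c: &[f32],
) -> Vec<f32> {
    let mut d = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            // Products are widened to i32 before accumulation.
            let mut acc: i32 = 0;
            for p in 0..k {
                acc += a[i * k + p] as i32 * b[p * n + j] as i32;
            }
            // Epilogue: scaling the integer sum by a float alpha dequantizes it.
            d[i * n + j] = alpha * acc as f32 + beta * c[i * n + j];
        }
    }
    d
}
```

Setting alpha to the product of the two quantization scales turns the integer accumulator directly into a dequantized float result.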

I32

Signed 32-bit integer. Maps to the i32 Rust type via the Element impl. Two roles:

Accumulator marker for integer GEMM SKUs (reported by crate::PrecisionGuarantee::accumulator).
Input element for elementwise integer arithmetic (bitwise / comparison / scan ops). The same plan shapes used for floating-point inputs extend to i32 via the Element impl.

I64

Signed 64-bit integer. Maps to the i64 Rust type via the Element impl. Used as an input element for the elementwise integer arithmetic family (bitwise / comparison / scan ops). PyTorch’s default integer tensor dtype.

Bool

Boolean (1-byte storage). Maps to the Bool wrapper type via the Element impl. Used as the input element for the logical-op family (logical_and / logical_or / logical_xor) and as the output element for the comparison-op family (eq / ne / gt / ge / lt / le). Truthiness convention follows PyTorch: 0 = false, any non-zero byte = true.
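
The truthiness rule can be illustrated with a small stand-in type. The `Bool(pub u8)` layout below and the choice to write results as 0 or 1 are assumptions of this sketch, not the crate's definition.

```rust
/// Hypothetical one-byte boolean wrapper used only for this illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct Bool(pub u8);

impl Bool {
    /// 0 is false; any non-zero byte is true.
    pub fn is_true(self) -> bool {
        self.0 != 0
    }
}

/// Logical XOR on truthiness, not on raw bits: 2 XOR 5 is false.
pub fn logical_xor(a: Bool, b: Bool) -> Bool {
    Bool((a.is_true() ^ b.is_true()) as u8)
}

/// A comparison op producing a Bool output element.
pub fn gt(a: i32, b: i32) -> Bool {
    Bool((a > b) as u8)
}
```

Normalizing inputs through `is_true` before combining them is what separates the logical-op family from bitwise ops on the same bytes.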

Fp8E4M3

8-bit floating-point, E4M3 encoding (1 sign + 4 exponent + 3 mantissa, bias 7, max-finite 448, no infinities). Maps to the Fp8E4M3 wrapper type. Routed through Ada / Hopper FP8 tensor cores (mma.sync m16n8k32 FP8 variant) with F32 accumulation.

Fp8E5M2

8-bit floating-point, E5M2 encoding (1 sign + 5 exponent + 2 mantissa, bias 15, IEEE-754-compatible inf / NaN). Maps to the Fp8E5M2 wrapper type. Same FP8 tensor-core path as Fp8E4M3 with the alternate operand tag (.e5m2.e5m2.f32).
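
The two FP8 encodings can be checked against their stated parameters with a pair of host-side decoders. This is a sketch with hypothetical function names; it assumes the common FP8 convention in which E4M3 reserves only the all-ones exponent-and-mantissa pattern for NaN, which is consistent with the "no infinities, max-finite 448" description above.

```rust
/// Decode an E4M3 byte (1 sign, 4 exponent, 3 mantissa bits, bias 7).
pub fn decode_e4m3(bits: u8) -> f32 {
    let sign = if bits & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 3) & 0x0F) as i32;
    let man = bits & 0x07;
    if exp == 0x0F && man == 0x07 {
        return f32::NAN; // the only NaN pattern; there are no infinities
    }
    if exp == 0 {
        // Subnormal: no implicit leading one.
        sign * (man as f32 / 8.0) * 2.0f32.powi(-6)
    } else {
        sign * (1.0 + man as f32 / 8.0) * 2.0f32.powi(exp - 7)
    }
}

/// Decode an E5M2 byte (1 sign, 5 exponent, 2 mantissa bits, bias 15).
pub fn decode_e5m2(bits: u8) -> f32 {
    let sign = if bits & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 2) & 0x1F) as i32;
    let man = bits & 0x03;
    match (exp, man) {
        (0x1F, 0) => sign * f32::INFINITY, // IEEE-style infinity
        (0x1F, _) => f32::NAN,
        (0, _) => sign * (man as f32 / 4.0) * 2.0f32.powi(-14),
        _ => sign * (1.0 + man as f32 / 4.0) * 2.0f32.powi(exp - 15),
    }
}
```

The largest finite E4M3 byte, 0x7E, decodes to 1.75 × 2^8 = 448, matching the max-finite value quoted above.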

S4

Signed 4-bit integer, packed-pair storage. Maps to the S4 wrapper type. Routed through Ada Lovelace int4 tensor cores (mma.sync.aligned.m16n8k64.row.col.satfinite.s32.s4.s4.s32) with int32 accumulation; floating-point alpha / beta scaling lets the kernel dequantize in its epilogue (same convention as the int8 family).

U4

Unsigned 4-bit integer — packed-pair storage. Maps to the U4 wrapper type. Same kernel family as S4 with the alternate operand tag (.u4.u4.s32).
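
Packed-pair storage means two 4-bit values share one byte. The sketch below shows one way to pack and unpack them; the helper names are hypothetical, and placing the first element in the low nibble is an assumption, since this page does not state the nibble order.

```rust
/// Sign-extend the low 4 bits of `nibble` to an i8 in the range -8..=7.
fn sext4(nibble: u8) -> i8 {
    (((nibble & 0x0F) << 4) as i8) >> 4
}

/// Unpack two signed 4-bit values; low nibble first (assumed order).
pub fn unpack_s4(byte: u8) -> (i8, i8) {
    (sext4(byte), sext4(byte >> 4))
}

/// Unpack two unsigned 4-bit values; low nibble first (assumed order).
pub fn unpack_u4(byte: u8) -> (u8, u8) {
    (byte & 0x0F, byte >> 4)
}

/// Pack two signed 4-bit values, keeping only the low 4 bits of each.
pub fn pack_s4(lo: i8, hi: i8) -> u8 {
    (lo as u8 & 0x0F) | ((hi as u8 & 0x0F) << 4)
}
```

The same byte yields different values under the two interpretations: 0xF7 unpacks to (7, -1) as S4 and to (7, 15) as U4.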

Bin

1-bit binary — packed-byte storage (8 bits per byte, LSB = lowest K index). Maps to the Bin wrapper type. Routed through Ampere+ binary tensor cores (mma.sync.aligned.m16n8k256.row.col.s32.b1.b1.s32.xor.popc). Distinct programming model: the output is the raw popcount accumulator (s32), not a re-quantized b1.
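
The packing rule and the raw popcount output can be sketched on the host as follows. The function names are hypothetical; the bit order follows the description above (LSB = lowest K index), and the result is the XOR popcount as a signed 32-bit integer.

```rust
/// Pack booleans along K into bytes, 8 per byte, LSB = lowest K index.
pub fn pack_bits(bits: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (bits.len() + 7) / 8];
    for (k, &b) in bits.iter().enumerate() {
        if b {
            out[k / 8] |= 1u8 << (k % 8);
        }
    }
    out
}

/// XOR the operands and accumulate the popcount into an i32.
/// The result counts the positions where the two bit vectors differ.
pub fn xor_popc_dot(a: &[u8], b: &[u8]) -> i32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| (x ^ y).count_ones() as i32)
        .sum()
}
```

If the bits are read as ±1 values, a conventional dot product over K elements can be recovered by the caller as K minus twice this count, which is why the accumulator is returned raw rather than re-quantized.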

Complex32

Single-precision complex — interleaved real/imag pair of f32 (#[repr(C)]). Maps to the Complex32 wrapper type. Used by the FFT op family (Milestone 6.4) for spectrum-domain tensors. ABI-compatible with cuFFT’s cufftComplex, NumPy’s complex64, and PyTorch’s torch.complex64.

Complex64

Double-precision complex — interleaved real/imag pair of f64. Maps to the Complex64 wrapper type. ABI-compatible with cuFFT’s cufftDoubleComplex, NumPy’s complex128, and PyTorch’s torch.complex128.
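
The interleaved layout behind these ABI-compatibility claims can be sketched with a stand-in struct. The field names `re` and `im` and the helper `to_interleaved` are assumptions for illustration; only the `#[repr(C)]` pair-of-f32 layout comes from the description above.

```rust
/// Hypothetical stand-in for the single-precision complex wrapper:
/// real part first, imaginary part second, C layout.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex32 {
    pub re: f32,
    pub im: f32,
}

/// Flatten complex values into the interleaved [re0, im0, re1, im1, ...]
/// sequence that the in-memory layout corresponds to.
pub fn to_interleaved(values: &[Complex32]) -> Vec<f32> {
    let mut out = Vec::with_capacity(values.len() * 2);
    for v in values {
        out.push(v.re);
        out.push(v.im);
    }
    out
}
```

With `#[repr(C)]` and two f32 fields the struct occupies 8 bytes with 4-byte alignment and no padding, so a slice of such values has the same byte layout as the flattened f32 sequence.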

Trait Implementations

impl Clone for ElementKind

fn clone(&self) -> ElementKind

fn clone_from(&mut self, source: &Self)

impl Copy for ElementKind

impl Debug for ElementKind

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

impl Eq for ElementKind

impl Hash for ElementKind

fn hash<__H>(&self, state: &mut __H) where __H: Hasher,

fn hash_slice<H>(data: &[Self], state: &mut H) where H: Hasher, Self: Sized,

impl PartialEq for ElementKind

fn eq(&self, other: &ElementKind) -> bool

fn ne(&self, other: &Rhs) -> bool

impl StructuralPartialEq for ElementKind
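
Taken together, Copy, Eq, and Hash make the tag usable as a map key. The sketch below uses an abbreviated local mirror of the enum and a hypothetical helper `count_by_kind`; neither is the crate's own code.

```rust
use std::collections::HashMap;

// Abbreviated local mirror with the same derives as listed above,
// for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ElementKind {
    F16,
    Bf16,
    F32,
    S8,
}

/// Hypothetical helper: count how many entries carry each element kind.
pub fn count_by_kind(kinds: &[ElementKind]) -> HashMap<ElementKind, usize> {
    let mut counts = HashMap::new();
    for &kind in kinds {
        // `Copy` lets the tag be taken by value; `Eq + Hash` make it a valid key.
        *counts.entry(kind).or_insert(0) += 1;
    }
    counts
}
```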

Auto Trait Implementations

impl Freeze for ElementKind

impl RefUnwindSafe for ElementKind

impl Send for ElementKind

impl Sync for ElementKind

impl Unpin for ElementKind

impl UnsafeUnpin for ElementKind

impl UnwindSafe for ElementKind

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>