pub enum ElementKind {
F16,
Bf16,
F32,
F32Strict,
F64,
S8,
U8,
I32,
I64,
Bool,
Fp8E4M3,
Fp8E5M2,
S4,
U4,
Bin,
Complex32,
Complex64,
}
Runtime tag for an Element or IntElement.
Unified across the float and integer kernel families so that a single kernel-SKU descriptor can describe any baracuda kernel.
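A runtime tag of this kind is usually consumed through a `match`. The sketch below mirrors the variant list in a local enum so that it compiles on its own, and derives a storage width from the per-variant descriptions on this page. `storage_bits` and `packed_bytes` are hypothetical helpers introduced for illustration, not functions the crate is known to provide.

```rust
// Local mirror of the variant list, for illustration only; real code
// would import the enum from the crate instead of redefining it.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ElementKind {
    F16, Bf16, F32, F32Strict, F64, S8, U8, I32, I64, Bool,
    Fp8E4M3, Fp8E5M2, S4, U4, Bin, Complex32, Complex64,
}

/// Hypothetical helper: storage width of one logical element in bits,
/// following the widths stated in the variant descriptions.
pub fn storage_bits(kind: ElementKind) -> u32 {
    match kind {
        ElementKind::Bin => 1,
        ElementKind::S4 | ElementKind::U4 => 4,
        ElementKind::S8
        | ElementKind::U8
        | ElementKind::Bool
        | ElementKind::Fp8E4M3
        | ElementKind::Fp8E5M2 => 8,
        ElementKind::F16 | ElementKind::Bf16 => 16,
        ElementKind::F32 | ElementKind::F32Strict | ElementKind::I32 => 32,
        ElementKind::F64 | ElementKind::I64 | ElementKind::Complex32 => 64,
        ElementKind::Complex64 => 128,
    }
}

/// Hypothetical helper: bytes needed to store `n` elements, rounding
/// sub-byte kinds (S4, U4, Bin) up to a whole byte.
pub fn packed_bytes(kind: ElementKind, n: usize) -> usize {
    (n * storage_bits(kind) as usize + 7) / 8
}
```

Because the `match` has no wildcard arm, adding a variant to the enum forces every such helper to be revisited at compile time.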
Variants§
F16
IEEE 754 binary16.
Bf16
Brain-float 16.
F32
IEEE 754 binary32 inputs reduced through TF32 tensor cores
(10-bit mantissa). Maps to the f32 Rust type.
F32Strict
IEEE 754 binary32 inputs reduced through SIMT CUDA cores at full
f32 precision. Maps to the F32Strict wrapper type. Bit-stable
on the same hardware.
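To make the precision difference between F32 and F32Strict concrete, the following host-side sketch emulates a 10-bit mantissa by zeroing the low 13 mantissa bits of each operand. It assumes plain truncation, which is a simplification: the rounding applied by real tensor cores may differ. All function names are hypothetical.

```rust
/// Keeps the sign, the 8 exponent bits and the top 10 mantissa bits of an
/// f32, zeroing the low 13 mantissa bits.
/// Assumption: plain truncation; hardware rounding may differ.
pub fn truncate_to_tf32(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0xFFFF_E000)
}

/// Dot product with both operands truncated before multiplying and the
/// sum kept in f32 (a rough model of the `F32` path).
pub fn dot_tf32(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| truncate_to_tf32(x) * truncate_to_tf32(y))
        .sum()
}

/// Dot product at full f32 precision (a rough model of `F32Strict`).
pub fn dot_f32(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(&x, &y)| x * y).sum()
}
```

An input such as `1.0 + 2^-11` is representable in f32 but collapses to `1.0` under the truncation, so the two dot products can disagree on the same data.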
F64
IEEE 754 binary64. Maps to the f64 Rust type.
S8
Signed 8-bit integer. Maps to the S8 wrapper type. Routed
through Ampere int8 tensor cores (mma.sync m16n8k32 integer
variant) with int32 accumulation; float alpha / beta let
the kernel act as a dequantize-in-epilogue.
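The dequantize-in-epilogue idea can be illustrated with a scalar reference for a single output element. This is a sketch under the assumption that the epilogue has the usual GEMM form `alpha * acc + beta * c`; the function name and signature are hypothetical and do not reflect the crate's API.

```rust
/// Scalar reference for one output element of an int8 GEMM:
/// d = alpha * (sum over k of a[k] * b[k]) + beta * c.
/// Products accumulate exactly in i32; scaling to float happens afterwards.
pub fn int8_dot_epilogue(a: &[i8], b: &[i8], alpha: f32, beta: f32, c: f32) -> f32 {
    let acc: i32 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| i32::from(x) * i32::from(y))
        .sum();
    alpha * acc as f32 + beta * c
}
```

With `alpha` set to the product of the two quantization scales, the integer accumulator is mapped back to real-valued units in the same pass that writes the output.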
U8
Unsigned 8-bit integer. Maps to the U8 wrapper type. Same
kernel family as S8 with unsigned operands.
I32
Signed 32-bit integer. Maps to the i32 Rust type via the
Element impl. Two roles:
- Accumulator marker for integer GEMM SKUs (reported by
  crate::PrecisionGuarantee::accumulator).
- Input element for elementwise integer arithmetic
  (bitwise / comparison / scan ops). The same plan shapes used
  for floating-point inputs extend to i32 via the Element impl.
I64
Signed 64-bit integer. Maps to the i64 Rust type via the
Element impl. Used as an input element for the elementwise
integer arithmetic family (bitwise / comparison / scan ops).
This matches PyTorch’s default integer tensor dtype.
Bool
Boolean (1-byte storage). Maps to the Bool wrapper type via
the Element impl. Used as the input element for the logical-
op family (logical_and / logical_or / logical_xor) and as
the output element for the comparison-op family
(eq / ne / gt / ge / lt / le). Truthiness convention
follows PyTorch: 0 = false, any non-zero byte = true.
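The truthiness convention can be sketched on raw bytes as follows. The function names are hypothetical, and normalizing outputs to 0 or 1 is an assumption: the description above only fixes how input bytes are read.

```rust
/// Reads one stored byte under the stated convention:
/// 0 is false, any non-zero byte is true.
pub fn is_truthy(byte: u8) -> bool {
    byte != 0
}

/// Logical AND over raw bytes.
/// Assumption: results are written as canonical 0 / 1 bytes.
pub fn logical_and(a: &[u8], b: &[u8]) -> Vec<u8> {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| u8::from(is_truthy(x) && is_truthy(y)))
        .collect()
}

/// Elementwise `gt` comparison over i32 inputs, producing Bool-style bytes.
pub fn gt_i32(a: &[i32], b: &[i32]) -> Vec<u8> {
    a.iter().zip(b).map(|(&x, &y)| u8::from(x > y)).collect()
}
```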
Fp8E4M3
8-bit floating-point, E4M3 encoding (1 sign + 4 exponent + 3
mantissa, bias 7, max-finite 448, no infinities). Maps to the
Fp8E4M3 wrapper type. Routed through Ada / Hopper FP8 tensor
cores (mma.sync m16n8k32 FP8 variant) with F32 accumulation.
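The encoding parameters above (bias 7, max-finite 448, no infinities) can be checked with a small host-side decoder. This is a sketch, not the crate's conversion routine: the function name is hypothetical, and it assumes the common convention that the all-ones pattern (exponent 1111, mantissa 111) is the only NaN.

```rust
/// Decodes one E4M3 byte to f32.
/// Assumption: exponent 0b1111 with mantissa 0b111 is NaN; every other
/// pattern is finite, so the largest finite value is 1.75 * 2^8 = 448.
pub fn fp8_e4m3_to_f32(bits: u8) -> f32 {
    let sign = if bits & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 3) & 0x0F) as i32;
    let man = bits & 0x07;
    if exp == 0x0F && man == 0x07 {
        return f32::NAN;
    }
    let magnitude = if exp == 0 {
        // Subnormal: no implicit leading 1, fixed exponent of 1 - bias.
        (man as f32 / 8.0) * 2f32.powi(-6)
    } else {
        // Normal: implicit leading 1, exponent biased by 7.
        (1.0 + man as f32 / 8.0) * 2f32.powi(exp - 7)
    };
    sign * magnitude
}
```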
Fp8E5M2
8-bit floating-point, E5M2 encoding (1 sign + 5 exponent + 2
mantissa, bias 15, IEEE-754-compatible inf / NaN). Maps to the
Fp8E5M2 wrapper type. Same FP8 tensor-core path as
Fp8E4M3 with the alternate operand tag
(.e5m2.e5m2.f32).
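For comparison, a decoder for E5M2 under the parameters stated above (bias 15, IEEE-style inf / NaN) might look like the sketch below. The function name is hypothetical; the field layout assumed is 1 sign bit, 5 exponent bits, 2 mantissa bits, from most to least significant.

```rust
/// Decodes one E5M2 byte to f32.
/// Exponent 0b11111 encodes infinity (mantissa 0) or NaN (mantissa != 0),
/// mirroring IEEE 754 binary16 with the mantissa cut to 2 bits.
pub fn fp8_e5m2_to_f32(bits: u8) -> f32 {
    let sign = if bits & 0x80 != 0 { -1.0f32 } else { 1.0 };
    let exp = ((bits >> 2) & 0x1F) as i32;
    let man = bits & 0x03;
    let magnitude = match (exp, man) {
        (0x1F, 0) => f32::INFINITY,
        (0x1F, _) => return f32::NAN,
        // Subnormal: no implicit leading 1, fixed exponent of 1 - bias.
        (0, m) => (m as f32 / 4.0) * 2f32.powi(-14),
        // Normal: implicit leading 1, exponent biased by 15.
        (e, m) => (1.0 + m as f32 / 4.0) * 2f32.powi(e - 15),
    };
    sign * magnitude
}
```

The trade-off against E4M3 is visible in the constants: one more exponent bit extends the range to 57344, at the cost of one mantissa bit.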
S4
Signed 4-bit integer — packed-pair storage. Maps to the S4
wrapper type. Routed through Ada Lovelace int4 tensor cores
(mma.sync.aligned.m16n8k64.row.col.satfinite.s32.s4.s4.s32)
with int32 accumulation; float alpha / beta let the kernel
act as a dequantize-in-epilogue (same convention as the int8
family).
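Packed-pair storage means two 4-bit values share one byte. The sketch below shows one way to pack and unpack a signed pair; it assumes the even-indexed element occupies the low nibble, which the description above does not state, and both function names are hypothetical.

```rust
/// Packs two signed 4-bit values (each in -8..=7) into one byte.
/// Assumption: the even-indexed element goes in the low nibble.
pub fn pack_s4_pair(lo: i8, hi: i8) -> u8 {
    debug_assert!((-8..=7).contains(&lo) && (-8..=7).contains(&hi));
    ((lo as u8) & 0x0F) | (((hi as u8) & 0x0F) << 4)
}

/// Unpacks one byte into its two signed 4-bit values (low nibble first).
pub fn unpack_s4_pair(byte: u8) -> (i8, i8) {
    // Move the low nibble to the top of an i8, then arithmetic-shift it
    // back down so the nibble's sign bit is extended.
    let lo = ((byte << 4) as i8) >> 4;
    // The high nibble is already at the top; one arithmetic shift suffices.
    let hi = (byte as i8) >> 4;
    (lo, hi)
}
```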
U4
Unsigned 4-bit integer — packed-pair storage. Maps to the U4
wrapper type. Same kernel family as S4 with the alternate
operand tag (.u4.u4.s32).
Bin
1-bit binary — packed-byte storage (8 bits per byte, LSB =
lowest K index). Maps to the Bin wrapper type. Routed
through Ampere+ binary tensor cores
(mma.sync.aligned.m16n8k256.row.col.s32.b1.b1.s32.xor.popc).
Distinct programming model: the output is the raw popcount
accumulator (s32), not a re-quantized b1.
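The storage layout and the XOR + popcount reduction can be sketched on the host as below. This is an illustrative scalar reference with hypothetical function names, not the tensor-core kernel itself; it follows the stated layout in which the least significant bit of each byte holds the lowest K index.

```rust
/// Packs bits so that bit `k % 8` of byte `k / 8` holds element `k`
/// (LSB = lowest K index within each byte).
pub fn pack_bits(bits: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (bits.len() + 7) / 8];
    for (k, &b) in bits.iter().enumerate() {
        if b {
            out[k / 8] |= 1 << (k % 8);
        }
    }
    out
}

/// XOR + popcount reduction over packed operands: the number of K
/// positions where `a` and `b` differ, returned as a raw i32 accumulator.
pub fn xor_popc(a: &[u8], b: &[u8]) -> i32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| (x ^ y).count_ones() as i32)
        .sum()
}
```

The result is a count, not a 1-bit value, which is why the output element differs from the input element for this kind.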
Complex32
Single-precision complex — interleaved real/imag pair of f32
(#[repr(C)]). Maps to the Complex32 wrapper type. Used by
the FFT op family (Milestone 6.4) for spectrum-domain tensors.
ABI-compatible with cuFFT’s cufftComplex, NumPy’s complex64,
and PyTorch’s torch.complex64.
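The interleaved layout can be illustrated with a stand-in struct. The type below is a local sketch, not the crate's wrapper: the field names `re` and `im` are assumptions, and only the `#[repr(C)]` pair-of-f32 layout is taken from the description above.

```rust
/// Hypothetical stand-in for the `Complex32` wrapper: real part first,
/// imaginary part second, C layout (8 bytes, 4-byte alignment).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex32 {
    pub re: f32,
    pub im: f32,
}

/// Copies a flat interleaved buffer [re0, im0, re1, im1, ...] into
/// complex values. A trailing unpaired float, if any, is ignored.
pub fn from_interleaved(data: &[f32]) -> Vec<Complex32> {
    data.chunks_exact(2)
        .map(|p| Complex32 { re: p[0], im: p[1] })
        .collect()
}
```

With this layout, a slice of `n` complex values occupies the same bytes as a slice of `2 * n` floats, which is the property that makes the type interchangeable with the listed external formats.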
Complex64
Double-precision complex — interleaved real/imag pair of f64.
Maps to the Complex64 wrapper type. ABI-compatible with
cuFFT’s cufftDoubleComplex, NumPy’s complex128, and
PyTorch’s torch.complex128.
Trait Implementations§
impl Clone for ElementKind
fn clone(&self) -> ElementKind
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Copy for ElementKind
impl Debug for ElementKind
impl Eq for ElementKind
impl Hash for ElementKind
impl PartialEq for ElementKind
fn eq(&self, other: &ElementKind) -> bool
Tests for self and other values to be equal, and is used by ==.
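Taken together, the Copy, Eq and Hash implementations listed above mean the tag can be passed by value and used directly as a map key. The sketch below demonstrates this on a local mirror of the enum so that it compiles on its own; `count_by_kind` is a hypothetical helper, not part of the crate.

```rust
use std::collections::HashMap;

// Local mirror with the same derived traits, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ElementKind {
    F16, Bf16, F32, F32Strict, F64, S8, U8, I32, I64, Bool,
    Fp8E4M3, Fp8E5M2, S4, U4, Bin, Complex32, Complex64,
}

/// Hypothetical helper: counts how often each kind occurs in a list.
/// Relies on `Copy` (iterating by value) and on `Eq` + `Hash` (map key).
pub fn count_by_kind(kinds: &[ElementKind]) -> HashMap<ElementKind, usize> {
    let mut counts = HashMap::new();
    for &kind in kinds {
        *counts.entry(kind).or_insert(0) += 1;
    }
    counts
}
```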