Enum MathPrecision

pub enum MathPrecision {
    F16,
    Bf16,
    Tf32,
    F32,
    F64,
    Int8,
    Fp8E4M3,
    Fp8E5M2,
    Int4,
    Binary,
}

Math precision used by the FMA / tensor-core instruction.

Distinct from the input element type because tensor cores can take inputs at one precision and reduce through an instruction at a different precision (most notably TF32: F32 inputs, 10-bit-mantissa math).
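The split between storage precision and math precision can be illustrated with a small sketch. The enum below is a local stand-in that mirrors the declaration on this page, and `explicit_mantissa_bits` is a hypothetical helper, not part of the crate; the bit counts are the standard explicit-mantissa widths of each floating-point format.

```rust
/// Local stand-in mirroring the enum documented on this page.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum MathPrecision {
    F16,
    Bf16,
    Tf32,
    F32,
    F64,
    Int8,
    Fp8E4M3,
    Fp8E5M2,
    Int4,
    Binary,
}

/// Hypothetical helper: explicit mantissa bits used by the math instruction,
/// or `None` for the integer and binary variants.
pub fn explicit_mantissa_bits(p: MathPrecision) -> Option<u32> {
    match p {
        MathPrecision::F16 => Some(10),
        MathPrecision::Bf16 => Some(7),
        MathPrecision::Tf32 => Some(10),
        MathPrecision::F32 => Some(23),
        MathPrecision::F64 => Some(52),
        MathPrecision::Fp8E4M3 => Some(3),
        MathPrecision::Fp8E5M2 => Some(2),
        MathPrecision::Int8 | MathPrecision::Int4 | MathPrecision::Binary => None,
    }
}
```

Under these assumptions, a kernel whose inputs are stored as F32 (23 mantissa bits) but whose math precision is `Tf32` multiplies with only 10 mantissa bits, which is why the two properties are tracked separately.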

Variants§

F16

IEEE 754 binary16 multiply-add.

Bf16

Brain-float 16 multiply-add.

Tf32

TensorFloat-32 (10-bit mantissa) multiply-add. Inputs are stored as F32 but reduced through TF32 tensor cores.
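As a rough illustration of what a 10-bit mantissa means for an F32 input, the sketch below clears the low 13 of the 23 stored mantissa bits. The function name is hypothetical, and plain truncation is an assumption made for simplicity; the rounding behavior of real hardware may differ.

```rust
/// Hypothetical helper: keep the sign bit, the 8 exponent bits, and the top
/// 10 mantissa bits of an f32, clearing the low 13 mantissa bits.
/// This models TF32 precision by truncation only (an assumption).
pub fn truncate_to_tf32(x: f32) -> f32 {
    const TF32_MASK: u32 = 0xFFFF_E000; // low 13 bits cleared
    f32::from_bits(x.to_bits() & TF32_MASK)
}
```

With this model, `1.0 + 2^-10` survives unchanged because it fits in 10 mantissa bits, while `1.0 + 2^-11` collapses to `1.0`.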

F32

IEEE 754 binary32 multiply-add (CUDA cores, no tensor cores).

F64

IEEE 754 binary64 multiply-add via Ampere FP64 tensor cores (DGEMM).

Int8

8-bit integer multiply-add (mma.sync m16n8k32 integer variant) with int32 accumulation. Used by both signed (s8) and unsigned (u8) integer GEMM SKUs; the multiply operands are 8-bit, the accumulator is 32-bit, and the multiply-add uses the OpMultiplyAddSaturate operator (clamps the accumulator on overflow rather than wrapping).
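The difference between clamping and wrapping can be seen in a scalar reference model. This is a sketch only: the function name is hypothetical, and it saturates after every product, whereas the hardware instruction may apply the clamp at a different granularity.

```rust
/// Hypothetical scalar model of an int8 dot product with int32 accumulation
/// that clamps to the i32 range on overflow instead of wrapping.
pub fn dot_i8_saturating(a: &[i8], b: &[i8], acc: i32) -> i32 {
    a.iter().zip(b).fold(acc, |acc, (&x, &y)| {
        // An i8 * i8 product always fits in i32, so only the add can overflow.
        acc.saturating_add(i32::from(x) * i32::from(y))
    })
}
```

Starting from an accumulator of `i32::MAX`, adding a positive product leaves the result at `i32::MAX`; a wrapping add would instead produce a large negative value.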

Fp8E4M3

FP8 E4M3 multiply-add (mma.sync m16n8k32 FP8 variant) with F32 accumulation. Inputs are E4M3 (8-bit), the accumulator is F32, and the epilogue cast saturates to the E4M3 max-finite (±448).
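The saturating epilogue cast can be sketched as a range clamp on the F32 accumulator. The names below are hypothetical, and the sketch models only the clamp to ±448; it does not round the value to the 3-bit E4M3 mantissa.

```rust
/// Largest finite magnitude representable in E4M3.
pub const E4M3_MAX_FINITE: f32 = 448.0;

/// Hypothetical helper: clamp an f32 accumulator value into the finite E4M3
/// range. Mantissa rounding to E4M3 is deliberately not modeled here.
pub fn saturate_to_e4m3_range(x: f32) -> f32 {
    x.clamp(-E4M3_MAX_FINITE, E4M3_MAX_FINITE)
}
```

Values already inside the range pass through unchanged, while out-of-range values (including infinities) land on ±448.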

Fp8E5M2

FP8 E5M2 multiply-add. Same instruction family as Fp8E4M3 but with the E5M2 encoding (wider exponent, narrower mantissa).

Int4

4-bit integer multiply-add (mma.sync m16n8k64 int4 variant) with int32 accumulation. Used by both signed (s4) and unsigned (u4) integer GEMM SKUs; the multiply operands are 4-bit (packed two per byte in memory), the accumulator is 32-bit, and the multiply-add uses the satfinite operator (clamps the accumulator on overflow rather than wrapping). Requires sm_89+.
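Packed 4-bit storage means each byte carries two operands. The sketch below unpacks one byte into its two values; the function names are hypothetical, and placing the first element in the low nibble is an assumption about the layout, not something this page specifies.

```rust
/// Hypothetical helper: unpack two signed 4-bit values from one byte,
/// assuming the first element sits in the low nibble.
pub fn unpack_s4_pair(byte: u8) -> (i8, i8) {
    // Move each nibble into the top of an i8, then use an arithmetic
    // right shift to sign-extend it.
    let lo = ((byte << 4) as i8) >> 4;
    let hi = (byte as i8) >> 4;
    (lo, hi)
}

/// Hypothetical helper: unpack two unsigned 4-bit values from one byte,
/// with the same low-nibble-first assumption.
pub fn unpack_u4_pair(byte: u8) -> (u8, u8) {
    (byte & 0x0F, byte >> 4)
}
```

For example, the byte `0xF7` holds the signed pair `(7, -1)` and the unsigned pair `(7, 15)` under this layout.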

Binary

1-bit binary xor.popc multiply-add (mma.sync m16n8k256 b1 variant) with int32 accumulation. The “multiply” is a per-bit XOR and the “add” is a popcount. Used by the binary GEMM SKU: operands are 1-bit (packed 8 per byte in memory), the accumulator is 32-bit, and the output is the raw popcount accumulator, with no re-quantization back to b1. Requires sm_80+.
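The XOR-then-popcount reduction can be written as a scalar reference over packed bytes. This is a sketch with a hypothetical function name; it models the arithmetic only, not the tile shape or thread layout of the instruction.

```rust
/// Hypothetical scalar model of a binary "multiply-add": XOR the packed
/// operand bytes, count the set bits, and add the count to an i32 accumulator.
pub fn xor_popc_accumulate(a: &[u8], b: &[u8], acc: i32) -> i32 {
    a.iter()
        .zip(b)
        .fold(acc, |acc, (&x, &y)| acc + (x ^ y).count_ones() as i32)
}
```

Two bytes that differ in every bit contribute 8 to the accumulator, and identical bytes contribute 0, so the result counts mismatching bit positions.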

Trait Implementations§

impl Clone for MathPrecision

fn clone(&self) -> MathPrecision

fn clone_from(&mut self, source: &Self)

impl Copy for MathPrecision

impl Debug for MathPrecision

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

impl Eq for MathPrecision

impl Hash for MathPrecision

fn hash<__H>(&self, state: &mut __H) where __H: Hasher,

fn hash_slice<H>(data: &[Self], state: &mut H) where H: Hasher, Self: Sized,

impl PartialEq for MathPrecision

fn eq(&self, other: &MathPrecision) -> bool

fn ne(&self, other: &Rhs) -> bool

impl StructuralPartialEq for MathPrecision
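Because the enum implements `Copy`, `Eq`, and `Hash`, it can serve directly as a map key. The sketch below uses a local stand-in for the enum with the derives listed above, and `count_by_precision` is a hypothetical helper rather than part of the crate.

```rust
use std::collections::HashMap;

/// Local stand-in mirroring the enum and the derives listed on this page.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum MathPrecision {
    F16,
    Bf16,
    Tf32,
    F32,
    F64,
    Int8,
    Fp8E4M3,
    Fp8E5M2,
    Int4,
    Binary,
}

/// Hypothetical helper: count how many entries use each math precision.
pub fn count_by_precision(items: &[MathPrecision]) -> HashMap<MathPrecision, usize> {
    let mut counts = HashMap::new();
    for &p in items {
        // `Copy` lets us take the key by value; `Eq + Hash` make it a valid key.
        *counts.entry(p).or_insert(0) += 1;
    }
    counts
}
```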

Auto Trait Implementations§

impl Freeze for MathPrecision

impl RefUnwindSafe for MathPrecision

impl Send for MathPrecision

impl Sync for MathPrecision

impl Unpin for MathPrecision

impl UnsafeUnpin for MathPrecision

impl UnwindSafe for MathPrecision

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
