Enum MathPrecision 

pub enum MathPrecision {
    F16,
    Bf16,
    Tf32,
    F32,
    F64,
    Int8,
    Fp8E4M3,
    Fp8E5M2,
    Int4,
    Binary,
}
Math precision used by the FMA / tensor-core instruction.

Distinct from the input element type because tensor cores can take inputs at one precision and perform the multiply-add at a different precision. The most notable case is TF32: inputs are stored as F32, but the math uses a 10-bit mantissa.
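
The distinction between storage type and math precision can be sketched as follows. This is a minimal illustration, not the crate's API: `ElementType`, `KernelDesc`, and `math_differs_from_storage` are hypothetical names, and `MathPrecision` is repeated here only so the snippet compiles on its own.

```rust
/// Mirror of the enum documented on this page, repeated so the sketch is self-contained.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum MathPrecision {
    F16, Bf16, Tf32, F32, F64, Int8, Fp8E4M3, Fp8E5M2, Int4, Binary,
}

/// Hypothetical storage type of a GEMM operand (an assumption, not part of the documented API).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ElementType {
    F16,
    F32,
}

/// Hypothetical kernel descriptor that records both the storage type and the math precision.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct KernelDesc {
    pub element: ElementType,
    pub math: MathPrecision,
}

/// Returns true when the instruction computes at a precision other than the stored one.
pub fn math_differs_from_storage(desc: KernelDesc) -> bool {
    !matches!(
        (desc.element, desc.math),
        (ElementType::F16, MathPrecision::F16) | (ElementType::F32, MathPrecision::F32)
    )
}
```

With these definitions, a descriptor holding `ElementType::F32` and `MathPrecision::Tf32` reports a mismatch, while F32 storage with F32 math does not.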

Variants

F16

IEEE 754 binary16 multiply-add.

Bf16

Brain-float 16 multiply-add.

Tf32

TensorFloat-32 (10-bit mantissa) multiply-add. Inputs are stored as F32 but reduced through TF32 tensor cores.
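
To get a feel for what a 10-bit mantissa means for an F32 input, the sketch below clears the low 13 of the 23 mantissa bits. The function name `truncate_to_tf32` is hypothetical, and plain truncation is an assumption made for simplicity; the rounding applied by real hardware may differ.

```rust
/// Approximates TF32 operand precision by keeping the sign bit, the 8 exponent
/// bits, and the top 10 of the 23 mantissa bits of an f32.
/// Assumption: plain truncation; hardware rounding may differ.
pub fn truncate_to_tf32(x: f32) -> f32 {
    const DROPPED_BITS: u32 = 23 - 10; // the 13 low mantissa bits are cleared
    let mask = !((1u32 << DROPPED_BITS) - 1);
    f32::from_bits(x.to_bits() & mask)
}
```

A value such as `1.0 + 1.0 / 1024.0` survives unchanged because it fits in 10 mantissa bits, whereas `1.0 + 1.0 / 2048.0` collapses to `1.0`.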

F32

IEEE 754 binary32 multiply-add (CUDA cores, no tensor cores).

F64

IEEE 754 binary64 multiply-add via Ampere FP64 tensor cores (DGEMM).

Int8

8-bit integer multiply-add (mma.sync m16n8k32 integer variant) with int32 accumulation. Used by both signed (s8) and unsigned (u8) integer GEMM SKUs. The multiply operands are 8-bit, the accumulator is 32-bit, and the multiply-add uses the OpMultiplyAddSaturate operator, which clamps the accumulator on overflow rather than wrapping.
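
The difference between clamping and wrapping can be shown with a scalar host-side sketch. The function names are hypothetical, only signed (s8) operands are modeled, and the code makes no claim about the granularity at which the hardware instruction applies saturation.

```rust
/// Dot product of two i8 slices into an i32 accumulator that clamps on overflow.
pub fn dot_i8_saturating(a: &[i8], b: &[i8], init: i32) -> i32 {
    a.iter().zip(b).fold(init, |acc, (&x, &y)| {
        // Widen to i32 before multiplying so the product itself cannot overflow.
        acc.saturating_add(i32::from(x) * i32::from(y))
    })
}

/// Same dot product with a wrapping accumulator, shown only for contrast.
pub fn dot_i8_wrapping(a: &[i8], b: &[i8], init: i32) -> i32 {
    a.iter().zip(b).fold(init, |acc, (&x, &y)| {
        acc.wrapping_add(i32::from(x) * i32::from(y))
    })
}
```

Starting from an accumulator near `i32::MAX`, the saturating version stays at `i32::MAX`, while the wrapping version flips to a negative value.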

Fp8E4M3

FP8 E4M3 multiply-add (mma.sync m16n8k32 FP8 variant) with F32 accumulation. Inputs are E4M3 (8-bit), the accumulator is F32, and the epilogue cast saturates to the E4M3 max-finite value (±448).
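
The saturating part of the epilogue cast can be sketched as a clamp on the F32 accumulator. `E4M3_MAX_FINITE` and `saturate_to_e4m3_range` are hypothetical names; the sketch covers only the range clamp and does not model rounding to the 3-bit mantissa or the actual 8-bit encoding.

```rust
/// Largest finite magnitude representable in FP8 E4M3, as stated in the variant description.
pub const E4M3_MAX_FINITE: f32 = 448.0;

/// Clamps an F32 accumulator value into the E4M3 finite range before a narrowing cast.
/// NaN inputs are returned as NaN; mantissa rounding is not modeled here.
pub fn saturate_to_e4m3_range(x: f32) -> f32 {
    x.clamp(-E4M3_MAX_FINITE, E4M3_MAX_FINITE)
}
```

Values already inside the range pass through unchanged, and infinities map to ±448.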

Fp8E5M2

FP8 E5M2 multiply-add. Same instruction family as Fp8E4M3 but with the E5M2 encoding (wider exponent, narrower mantissa).

Int4

4-bit integer multiply-add (mma.sync m16n8k64 int4 variant) with int32 accumulation. Used by both signed (s4) and unsigned (u4) integer GEMM SKUs. The multiply operands are 4-bit (packed-pair storage in memory), the accumulator is 32-bit, and the multiply-add uses the satfinite operator, which clamps the accumulator on overflow rather than wrapping. Requires sm_89+.
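
Packed-pair storage means two 4-bit operands share one byte. The sketch below shows one way to pack and unpack signed (s4) values on the host. The function names are hypothetical, and the nibble order (low nibble first) is an assumption that the description above does not specify.

```rust
/// Splits one byte into two signed 4-bit values (s4), each sign-extended to i8.
/// Assumption: the low nibble holds the first element of the pair.
pub fn unpack_s4_pair(byte: u8) -> (i8, i8) {
    let lo = ((byte << 4) as i8) >> 4; // move low nibble up, then arithmetic shift sign-extends
    let hi = (byte as i8) >> 4; // arithmetic shift sign-extends the high nibble
    (lo, hi)
}

/// Packs two values in the range -8..=7 into one byte, low nibble first.
pub fn pack_s4_pair(lo: i8, hi: i8) -> u8 {
    ((lo as u8) & 0x0F) | ((hi as u8) << 4)
}
```

Packing `(-8, 7)` yields `0x78`, and unpacking that byte returns the original pair.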

Binary

1-bit binary xor.popc multiply-add (mma.sync m16n8k256 b1 variant) with int32 accumulation. The “multiply” is per-bit XOR and the “add” is popcount. Used by the binary GEMM SKU. Operands are 1-bit (packed 8-per-byte in memory), the accumulator is 32-bit, and the output is the raw popcount accumulator, with no re-quantization back to b1. Requires sm_80+.
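
The XOR-then-popcount reduction is easy to reproduce on the host for bit-packed operands. This is a scalar sketch with a hypothetical function name; it illustrates the arithmetic only and says nothing about how the tensor-core instruction tiles its operands.

```rust
/// Binary "dot product" over bit-packed operands: XOR each pair of bytes,
/// then add the number of set bits to an i32 accumulator.
pub fn xor_popc_dot(a: &[u8], b: &[u8], init: i32) -> i32 {
    a.iter()
        .zip(b)
        .fold(init, |acc, (&x, &y)| acc + (x ^ y).count_ones() as i32)
}
```

Two bytes that differ in every bit contribute 8 to the accumulator, and identical bytes contribute 0.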

Trait Implementations

impl Clone for MathPrecision

fn clone(&self) -> MathPrecision

Returns a duplicate of the value.

1.0.0 (const: unstable)

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Copy for MathPrecision

impl Debug for MathPrecision

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.

impl Eq for MathPrecision

impl Hash for MathPrecision

fn hash<__H>(&self, state: &mut __H)
where __H: Hasher,

Feeds this value into the given Hasher.

1.3.0

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher.

impl PartialEq for MathPrecision

fn eq(&self, other: &MathPrecision) -> bool

Tests for self and other values to be equal, and is used by ==.

1.0.0 (const: unstable)

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl StructuralPartialEq for MathPrecision
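
Because the enum implements Copy, Eq, Hash, and Debug, it can be used directly as a map key and printed without extra work. The sketch below repeats the enum so that it compiles on its own; `count_by_precision` is a hypothetical helper, not part of the documented API.

```rust
use std::collections::HashMap;

/// Mirror of the enum documented on this page, with the same derived traits.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum MathPrecision {
    F16, Bf16, Tf32, F32, F64, Int8, Fp8E4M3, Fp8E5M2, Int4, Binary,
}

/// Counts how many entries in a kernel list use each math precision.
pub fn count_by_precision(kernels: &[MathPrecision]) -> HashMap<MathPrecision, usize> {
    let mut counts = HashMap::new();
    for &precision in kernels {
        // Copy lets the value be taken out of the slice; Eq + Hash make it a valid key.
        *counts.entry(precision).or_insert(0) += 1;
    }
    counts
}
```

The derived Debug implementation prints the variant name, so `format!("{:?}", MathPrecision::Fp8E4M3)` produces `Fp8E4M3`.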

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.