pub enum MathPrecision {
    F16,
    Bf16,
    Tf32,
    F32,
    F64,
    Int8,
    Fp8E4M3,
    Fp8E5M2,
    Int4,
    Binary,
}
Math precision used by the FMA / tensor-core instruction.
Distinct from the input element type because tensor cores can take inputs at one precision and reduce through an instruction at a different precision (most notably TF32: F32 inputs, 10-bit-mantissa math).
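The TF32 case can be illustrated with a minimal sketch. It assumes that reducing to TF32 simply truncates the low 13 of the 23 stored F32 mantissa bits; real hardware may round differently, and the names `truncate_to_tf32` and `tf32_fma` are hypothetical helpers, not part of this crate.

```rust
/// Reduce an f32 to TF32 precision by clearing the low 13 mantissa bits
/// (23 stored bits minus the 10 that TF32 keeps).
/// ASSUMPTION: truncation toward zero; hardware rounding may differ.
fn truncate_to_tf32(x: f32) -> f32 {
    const DROPPED_BITS: u32 = 23 - 10;
    let mask: u32 = !((1u32 << DROPPED_BITS) - 1);
    f32::from_bits(x.to_bits() & mask)
}

/// Hypothetical TF32-style multiply-add: both operands are stored as f32,
/// reduced to TF32 for the multiply, and accumulated in full f32.
fn tf32_fma(a: f32, b: f32, acc: f32) -> f32 {
    truncate_to_tf32(a) * truncate_to_tf32(b) + acc
}
```

Under this assumption a value such as 1 + 2^-11 loses its low bit and becomes 1.0, while 1 + 2^-10 survives unchanged, which is why the math precision has to be tracked separately from the F32 element type.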
Variants
F16
IEEE 754 binary16 multiply-add.
Bf16
Brain-float 16 multiply-add.
Tf32
TensorFloat-32 (10-bit mantissa) multiply-add. Inputs are stored as F32 but reduced through TF32 tensor cores.
F32
IEEE 754 binary32 multiply-add (CUDA cores, no tensor cores).
F64
IEEE 754 binary64 multiply-add via Ampere FP64 tensor cores (DGEMM).
Int8
8-bit integer multiply-add (mma.sync m16n8k32 integer variant)
with int32 accumulation. Used by both signed (s8) and unsigned
(u8) integer GEMM SKUs; the multiply operands are 8-bit, the
accumulator is 32-bit, and the multiply-add uses the
OpMultiplyAddSaturate operator (clamps the accumulator on
overflow rather than wrapping).
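A scalar sketch of the saturating behaviour described above, assuming the clamp is applied at every accumulation step (the hardware may apply it at a different granularity). `dot_i8_saturating` is a hypothetical name used only for illustration.

```rust
/// Dot product of 8-bit operands with a 32-bit accumulator that clamps
/// to i32::MIN / i32::MAX on overflow instead of wrapping.
/// ASSUMPTION: saturation is applied after each multiply-add step.
fn dot_i8_saturating(a: &[i8], b: &[i8], acc: i32) -> i32 {
    a.iter().zip(b.iter()).fold(acc, |sum, (&x, &y)| {
        // An i8 * i8 product always fits in i32, so only the add can overflow.
        sum.saturating_add(i32::from(x) * i32::from(y))
    })
}
```

With a wrapping add, an accumulator already at `i32::MAX` would flip to a large negative value on the next positive product; the saturating add keeps it pinned at `i32::MAX`.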
Fp8E4M3
FP8 E4M3 multiply-add (mma.sync m16n8k32 FP8 variant) with F32
accumulation. Inputs are E4M3 (8-bit), the accumulator is F32,
and the epilogue cast saturates to the E4M3 max-finite (±448).
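The saturating step of the epilogue cast can be sketched as a range clamp on the F32 accumulator. This shows only the clamp to ±448, not the rounding onto the E4M3 grid, and `saturate_to_e4m3_range` is a hypothetical helper name.

```rust
/// Largest finite magnitude representable in E4M3, as stated above.
const E4M3_MAX_FINITE: f32 = 448.0;

/// Clamp an f32 accumulator value into the finite E4M3 range.
/// NOTE: this sketch does not quantize to E4M3; it only saturates.
fn saturate_to_e4m3_range(x: f32) -> f32 {
    x.clamp(-E4M3_MAX_FINITE, E4M3_MAX_FINITE)
}
```

Values already inside the range pass through unchanged; out-of-range values, including infinities, land on ±448.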
Fp8E5M2
FP8 E5M2 multiply-add. Same instruction family as
Fp8E4M3 but with the E5M2 encoding (wider exponent,
narrower mantissa: 5 exponent bits and 2 mantissa bits,
versus 4 and 3 for E4M3).
Int4
4-bit integer multiply-add (mma.sync m16n8k64 int4 variant)
with int32 accumulation. Used by both signed (s4) and unsigned
(u4) integer GEMM SKUs; the multiply operands are 4-bit
(packed-pair storage in memory), the accumulator is 32-bit, and
the multiply-add uses the satfinite operator (clamps the
accumulator on overflow rather than wrapping). Requires sm_89+.
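The packed-pair storage can be sketched as two signed 4-bit values per byte. The nibble order used here (first element in the low nibble) is an assumption of this sketch, and `pack_s4_pair` / `unpack_s4_pair` are hypothetical helpers.

```rust
/// Pack two signed 4-bit values (each in -8..=7) into one byte.
/// ASSUMPTION: the first element occupies the low nibble.
fn pack_s4_pair(lo: i8, hi: i8) -> u8 {
    ((lo as u8) & 0x0F) | ((hi as u8) << 4)
}

/// Unpack one byte into two sign-extended 4-bit values.
fn unpack_s4_pair(byte: u8) -> (i8, i8) {
    // Move the low nibble to the top of the byte, then use an arithmetic
    // right shift on i8 to sign-extend it.
    let lo = ((byte << 4) as i8) >> 4;
    // The high nibble is already at the top; shift it down with sign.
    let hi = (byte as i8) >> 4;
    (lo, hi)
}
```

For the unsigned (u4) case the same layout would apply, with plain masking and a logical shift in place of the sign-extending shifts.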
Binary
1-bit binary xor.popc multiply-add
(mma.sync m16n8k256 b1 variant) with int32 accumulation. The
“multiply” is per-bit XOR and the “add” is popcount. Used by
the binary GEMM SKU; operands are 1-bit (packed 8-per-byte in
memory), the accumulator is 32-bit, and the output is the
raw popcount accumulator, with no re-quantization back to b1.
Requires sm_80+.
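A scalar sketch of the XOR/popcount "multiply-add" on operands packed 8 bits per byte. `xor_popc_dot` is a hypothetical name; the sketch ignores accumulator overflow, which would need far more bits than any single instruction tile processes.

```rust
/// Binary "dot product": XOR each pair of packed bytes, count the set
/// bits, and add the counts to a 32-bit accumulator.
fn xor_popc_dot(a: &[u8], b: &[u8], acc: i32) -> i32 {
    a.iter()
        .zip(b.iter())
        .fold(acc, |sum, (&x, &y)| sum + (x ^ y).count_ones() as i32)
}
```

The result is the number of bit positions where the two operands differ, added to the incoming accumulator; it is returned as a plain integer, matching the raw popcount output described above.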
Trait Implementations
impl Clone for MathPrecision
    fn clone(&self) -> MathPrecision
    fn clone_from(&mut self, source: &Self)
        Performs copy-assignment from source.
impl Copy for MathPrecision
impl Debug for MathPrecision
impl Eq for MathPrecision
impl Hash for MathPrecision
impl PartialEq for MathPrecision
    fn eq(&self, other: &MathPrecision) -> bool
        Tests for self and other values to be equal, and is used by ==.
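Because the enum is Copy, Eq, and Hash, it can be used directly as a map key. The sketch below redeclares a local stand-in for the enum so that it compiles on its own; `count_by_precision` is a hypothetical helper, not an API of this crate.

```rust
use std::collections::HashMap;

// Local stand-in mirroring the documented enum and its trait impls.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum MathPrecision {
    F16,
    Bf16,
    Tf32,
    F32,
    F64,
    Int8,
    Fp8E4M3,
    Fp8E5M2,
    Int4,
    Binary,
}

/// Count how many entries in a list use each math precision.
pub fn count_by_precision(kernels: &[MathPrecision]) -> HashMap<MathPrecision, usize> {
    let mut counts = HashMap::new();
    for &precision in kernels {
        // Copy lets the key be taken by value; Eq + Hash make it a valid key.
        *counts.entry(precision).or_insert(0) += 1;
    }
    counts
}
```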