pub struct IntGemmArgs<'a, T, BT = f32>
where
    T: IntElement,
    BT: BiasElement,
{
    pub a: MatrixRef<'a, T>,
    pub b: MatrixRef<'a, T>,
    pub c: Option<MatrixRef<'a, T>>,
    pub d: MatrixMut<'a, T>,
    pub bias: Option<VectorRef<'a, BT>>,
    pub alpha: f32,
    pub beta: f32,
}
Per-launch arguments for an
IntGemmPlan::run call.
Parallel to GemmArgs for the integer GEMM family. The matrix
operands carry the kernel element type T: IntElement
(today: S8 or U8); the optional bias carries the
independent bias element type BT: BiasElement (today: f32 or
i32). The scalars alpha and beta are always f32 regardless of T
or BT, because CUTLASS’s LinearCombinationClamp /
LinearCombinationBiasElementwise epilogues run the entire
alpha/beta/bias/activation chain in float (after int32→float
dequantization of the accumulator) and then saturating-cast the
result back to the int output range on store.
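The float-space epilogue described above can be sketched on the host for a single S8 output element. This is an illustrative model only, not the crate's or CUTLASS's code: the function name `epilogue_s8` is hypothetical, the activation step is omitted, and the rounding mode (round to nearest via `f32::round`) is an assumption, since the text above only states that the store saturates.

```rust
/// Hypothetical host-side model of the per-element epilogue for an S8 output.
/// `acc` is the int32 accumulator; `c` and `bias` are the optional sources.
fn epilogue_s8(acc: i32, c: Option<i8>, bias: Option<f32>, alpha: f32, beta: f32) -> i8 {
    // Dequantize the accumulator to float before any scaling.
    let mut x = alpha * acc as f32;
    // The beta term only contributes when a C operand is present.
    if let Some(c) = c {
        x += beta * f32::from(c);
    }
    // The bias is added in float space as well.
    if let Some(b) = bias {
        x += b;
    }
    // Assumption: round to nearest, then clamp to the i8 range before the cast.
    x.round().clamp(-128.0, 127.0) as i8
}
```

Under these assumptions, `epilogue_s8(1000, None, None, 0.5, 0.0)` saturates to 127 rather than wrapping, which is the reason alpha is usually chosen to rescale the accumulator back into the output range.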
Fields
a: MatrixRef<'a, T>
Left input. Row-major [M, K].
b: MatrixRef<'a, T>
Right input. Column-major [K, N] (RCR).
c: Option<MatrixRef<'a, T>>
Optional accumulation source. Row-major [M, N].
d: MatrixMut<'a, T>
Output. Row-major [M, N].
bias: Option<VectorRef<'a, BT>>
Optional bias vector. Required when the descriptor’s epilogue
is any Bias* variant; must be None for
EpilogueKind::Identity. Length-N, contiguous (stride 1)
device memory; broadcast across rows of D.
alpha: f32
Multiplier on the matrix-multiply accumulator. Always f32
for int GEMM, since CUTLASS does the entire epilogue compute in
float space.
beta: f32
Multiplier on c. Forced to 0 internally when c is None.
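The operand layouts and the bias broadcast listed above are easy to get wrong when preparing host data, so here is a small host-side sketch of the indexing. It uses plain slices instead of MatrixRef / VectorRef, and it assumes tightly packed operands (leading dimension K for both A and B, N for the [M, N] result); the function names `accumulate_rcr` and `broadcast_bias` are hypothetical and not part of the crate.

```rust
/// Hypothetical reference: int32 accumulators for row-major A [M, K]
/// and column-major B [K, N], both assumed tightly packed.
fn accumulate_rcr(m: usize, n: usize, k: usize, a: &[i8], b: &[i8]) -> Vec<i32> {
    let mut acc = vec![0i32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut sum = 0i32;
            for p in 0..k {
                // A[i, p] lives at i * K + p (row-major);
                // B[p, j] lives at p + j * K (column-major).
                sum += i32::from(a[i * k + p]) * i32::from(b[p + j * k]);
            }
            // The accumulator grid is row-major [M, N], like C and D.
            acc[i * n + j] = sum;
        }
    }
    acc
}

/// Hypothetical helper: add a length-N bias to every row of a
/// row-major [M, N] accumulator grid, in float space.
fn broadcast_bias(acc: &[i32], n: usize, bias: &[f32]) -> Vec<f32> {
    acc.iter()
        .enumerate()
        // Column index j = idx % N selects the bias element for this entry.
        .map(|(idx, &v)| v as f32 + bias[idx % n])
        .collect()
}
```

For a 2×2 case with A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], A is passed as `[1, 2, 3, 4]` and B, being column-major, as `[5, 7, 6, 8]`; the accumulators come out as `[19, 22, 43, 50]`.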