#[repr(transparent)] pub struct S4(pub u8);
Signed 4-bit integer element marker — packed-pair storage.
A #[repr(transparent)] wrapper around u8. One S4 storage slot is one
byte and holds two packed s4 elements: the low nibble is the
element at even logical index, the high nibble is the element at
odd logical index (along the K axis for A/B operands, along the
N axis for D output). Each nibble is sign-extended to s32 on the
GPU side via ((s8)(nibble << 4)) >> 4.
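The nibble layout and sign extension described above can be sketched on the host side as follows. This is a minimal illustration, not part of the crate: the helper names pack_s4_pair, sign_extend_nibble and unpack_s4_pair are hypothetical, and the code only mirrors the documented packing (low nibble = even index, high nibble = odd index).

```rust
/// Hypothetical helper: packs two signed 4-bit values into one byte.
/// `lo` lands in the low nibble (even logical index), `hi` in the
/// high nibble (odd logical index). Returns None if either value is
/// outside the s4 range [-8, +7].
pub fn pack_s4_pair(lo: i8, hi: i8) -> Option<u8> {
    let in_range = |v: i8| (-8..=7).contains(&v);
    if !in_range(lo) || !in_range(hi) {
        return None;
    }
    // Keep only the low 4 bits of each two's-complement value.
    Some(((lo as u8) & 0x0F) | (((hi as u8) & 0x0F) << 4))
}

/// Hypothetical helper: sign-extends a 4-bit two's-complement nibble,
/// mirroring ((s8)(nibble << 4)) >> 4 from the description above.
pub fn sign_extend_nibble(nibble: u8) -> i32 {
    // Move the nibble into the top 4 bits, reinterpret as i8, then use
    // an arithmetic right shift to replicate the sign bit.
    ((((nibble & 0x0F) << 4) as i8) >> 4) as i32
}

/// Hypothetical helper: unpacks one storage byte into
/// (even-index element, odd-index element).
pub fn unpack_s4_pair(byte: u8) -> (i32, i32) {
    (sign_extend_nibble(byte & 0x0F), sign_extend_nibble(byte >> 4))
}
```

For example, packing (-8, 7) yields the byte 0x78, and unpacking 0x78 gives back (-8, 7).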
A DeviceBuffer<u8> of (M*K)/2 bytes can be reinterpreted as a
DeviceBuffer<S4> of (M*K)/2 storage slots via view_as without
copying — S4 is byte-storage at the buffer layer, and element
count lives at the plan-layer descriptor (M / N / K).
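The reason such a reinterpretation needs no copy can be illustrated with plain host slices. The sketch below does not use DeviceBuffer or view_as; it defines a local stand-in S4 with the same declaration and a hypothetical bytes_as_s4 helper, assuming only the #[repr(transparent)] layout guarantee.

```rust
/// Local stand-in with the same declaration as the documented type
/// (defined here only so the sketch is self-contained).
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct S4(pub u8);

/// Hypothetical helper: reinterprets a byte slice as a slice of S4
/// storage slots without copying. The slot count equals the byte
/// count; the logical element count (two per slot) is tracked
/// elsewhere, as the description above notes.
pub fn bytes_as_s4(bytes: &[u8]) -> &[S4] {
    // SAFETY: S4 is #[repr(transparent)] over u8, so it has the same
    // size, alignment and valid bit patterns as u8. The returned slice
    // borrows the same memory for the same lifetime.
    unsafe { std::slice::from_raw_parts(bytes.as_ptr() as *const S4, bytes.len()) }
}
```

The returned slice points at the same address as the input, which is the host-side analogue of the zero-copy view described above.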
Numerical range per element: [-8, +7]. The plan layer
(Int4GemmPlan in baracuda-kernels) takes M, N, K in
element counts and leading dimensions in storage-slot
(= byte) counts, so MatrixRef<S4>::ld equals K / 2 for
row-major A with no padding. K must be even (packing is
byte-aligned: each byte holds two elements). The plan routes
through Ada Lovelace int4 tensor cores
(mma.sync.aligned.m16n8k64.row.col.satfinite.s32.s4.s4.s32) with
S32 accumulation and float alpha / beta scaling. First landed in
baracuda-kernels Phase 2.
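The relationship between element counts and storage-slot leading dimensions can be sketched as below. The functions packed_ld and read_s4 are hypothetical host-side helpers operating on raw bytes, not part of Int4GemmPlan or MatrixRef; they assume row-major A with elements packed along K as described above.

```rust
/// Hypothetical helper: leading dimension, in storage slots (bytes),
/// of an unpadded row-major matrix with `k` s4 elements per row.
/// Returns None for odd `k`, since two elements share one byte.
pub fn packed_ld(k: usize) -> Option<usize> {
    if k % 2 == 0 { Some(k / 2) } else { None }
}

/// Hypothetical helper: reads logical element (row, col) from
/// row-major packed s4 storage with leading dimension `ld` in bytes.
/// Returns None if the position falls outside the row or the buffer.
pub fn read_s4(storage: &[u8], ld: usize, row: usize, col: usize) -> Option<i32> {
    let slot_in_row = col / 2;
    if slot_in_row >= ld {
        return None; // column would spill into the next row
    }
    let byte = *storage.get(row * ld + slot_in_row)?;
    // Even column: low nibble. Odd column: high nibble.
    let nibble = if col % 2 == 0 { byte & 0x0F } else { byte >> 4 };
    // Sign-extend the 4-bit value to i32.
    Some((((nibble << 4) as i8) >> 4) as i32)
}
```

With M = 2 and K = 4, packed_ld(4) gives a leading dimension of 2, so the whole matrix occupies (M*K)/2 = 4 bytes.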
Tuple Fields
0: u8
Implementations
Trait Implementations
impl Copy for S4
impl DeviceRepr for S4
impl Eq for S4
impl IntElement for S4
impl KernelDtype for S4
const KIND: ElementKind = ElementKind::S4
crate::KernelSku::element.