Module gemm

gemm op-diff harness: covers the basic fp16 matmul that backs qkv_proj, o_proj, gate_up_proj, down_proj, and the lm_head projection. Per an nsys profile on Vast 4090 / M3, the Marlin<256,...> matmul kernel accounts for ~55% of GPU time at c=16; this op-diff validates the non-quantized fallback path against a CPU reference.

Structs

GemmOp
C[m, n] = A[m, k] · B[n, k]^T (row-major, B already transposed to head-major). Matches the Backend::gemm signature used by Linear.
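
To make the layout concrete, here is a minimal sketch of a CPU reference for this contraction and of the kind of comparison an op-diff might perform. The names `gemm_ref` and `max_abs_diff` are hypothetical and are not taken from this crate. The sketch also uses f32 rather than fp16, and it assumes plain row-major slices with B stored as [n, k]; the real `Backend::gemm` signature may differ.

```rust
/// Hypothetical CPU reference (not this crate's API):
/// C[i, j] = sum over p of A[i, p] * B[j, p].
/// `a` is row-major [m, k]; `b` is row-major [n, k], i.e. already transposed,
/// so each output element is a dot product of two contiguous rows.
pub fn gemm_ref(a: &[f32], b: &[f32], m: usize, n: usize, k: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "A must hold m * k elements");
    assert_eq!(b.len(), n * k, "B must hold n * k elements");
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        let a_row = &a[i * k..(i + 1) * k];
        for j in 0..n {
            let b_row = &b[j * k..(j + 1) * k];
            c[i * n + j] = a_row.iter().zip(b_row).map(|(x, y)| x * y).sum::<f32>();
        }
    }
    c
}

/// Hypothetical helper: largest element-wise absolute difference between two
/// outputs of equal length, as one possible op-diff metric.
pub fn max_abs_diff(x: &[f32], y: &[f32]) -> f32 {
    assert_eq!(x.len(), y.len(), "outputs must have the same length");
    x.iter().zip(y).map(|(a, b)| (a - b).abs()).fold(0.0f32, f32::max)
}
```

With A = [[1, 2, 3], [4, 5, 6]] and B = [[1, 0, 1], [0, 1, 0]] (so m = 2, n = 2, k = 3), `gemm_ref` returns [4, 2, 10, 5]. Because B is stored as [n, k], both operands of each dot product are contiguous in memory, which is one plausible reason to keep the weight in that layout.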