gemm op-diff harness — covers the basic fp16 matmul that backs
qkv_proj, o_proj, gate_up_proj, down_proj, and the lm_head
projection. Per the nsys profile on Vast 4090 / M3, the Marlin<256,...>
Marlin matmul accounts for ~55% of GPU time at c=16; this op-diff
validates the non-quantized fallback path against the CPU.
Structs
- GemmOp
C[m, n] = A[m, k] · B[n, k]^T (row-major, B already transposed to head-major). Matches the Backend::gemm signature used by Linear.
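
The layout convention above can be illustrated with a minimal CPU reference. This is a sketch under stated assumptions, not the crate's implementation: `gemm_ref` and `max_abs_diff` are hypothetical names, the buffers are plain `f32` slices rather than the fp16 tensors the harness uses, and the real `Backend::gemm` signature may differ. It only shows how C[m, n] = A[m, k] · B[n, k]^T reads both operands row by row when B is stored as [n, k].

```rust
/// Hypothetical CPU reference (not the crate's API):
/// C[m, n] = A[m, k] · B[n, k]^T, with A, B and C all row-major.
/// Because B is stored as [n, k], each output element is the dot
/// product of one row of A with one row of B.
fn gemm_ref(a: &[f32], b: &[f32], m: usize, n: usize, k: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "A must hold m * k elements");
    assert_eq!(b.len(), n * k, "B must hold n * k elements");

    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        let a_row = &a[i * k..(i + 1) * k];
        for j in 0..n {
            let b_row = &b[j * k..(j + 1) * k];
            c[i * n + j] = a_row.iter().zip(b_row).map(|(x, y)| x * y).sum();
        }
    }
    c
}

/// Hypothetical helper: largest element-wise absolute difference,
/// the kind of metric an op-diff could compare against a tolerance.
fn max_abs_diff(x: &[f32], y: &[f32]) -> f32 {
    assert_eq!(x.len(), y.len(), "outputs must have the same length");
    x.iter()
        .zip(y)
        .map(|(p, q)| (p - q).abs())
        .fold(0.0f32, f32::max)
}
```

For example, with m = 2, n = 2, k = 3, A = [1, 2, 3, 4, 5, 6] and B = [1, 0, 1, 0, 1, 0], `gemm_ref` returns [4, 2, 10, 5]. A device result would then be checked with `max_abs_diff` against a tolerance suited to fp16 accumulation; that tolerance is not specified here.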