ferrum_testkit/op_diff/marlin_matmul.rs
//! `marlin_matmul` (GPTQ INT4) op-diff harness. **PARTIAL: planning stub.**
//!
//! The full op needs:
//! - A: fp16 input `[m, k]`
//! - B: packed INT4 weight in Marlin tile layout `[k / pack_factor, n]`
//! - scales: fp16 `[k / group_size, n]`
//! - zeros: int32, optional, `[k / group_size, n / pack_factor]`
//! - g_idx: int32, optional, `[k]`, used for desc_act
//!
//! Setup needs a Marlin packer that converts a reference fp32 weight
//! matrix into the specific tile layout (`pack_factor=8`, `tile_size=16`,
//! interleaved nibbles). The packer lives in `ferrum-quantization` /
//! `ferrum-kernels/quantization/gptq_marlin/` but isn't exposed as a
//! testkit-callable helper.
//!
//! Reference impl: the CPU backend's `gemm_quant` for `QuantKind::Gptq`
//! dequantizes the packed B back to fp32, then runs a regular sgemm.
//! That's what we'd compare CUDA's hand-tuned Marlin kernel against.
//!
//! Punted to follow-up: needs a `marlin_pack_fixture(fp32 weight) ->
//! QuantWeights<B>` helper that all backends agree on. Without it
//! the test would be testing the PACKER, not the matmul.

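#![allow(dead_code)]

/// Problem dimensions for one `marlin_matmul` case: `A [m, k] x B [k, n]`.
pub struct MarlinMatmulOp {
    /// Rows of the fp16 input `A`.
    pub m: usize,
    /// Output columns (columns of the dequantized weight).
    pub n: usize,
    /// Reduction dimension shared by `A` and the weight.
    pub k: usize,
    /// Number of consecutive `k` rows that share one scale (and zero point).
    pub group_size: usize,
}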
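
// The operand shapes listed in the module docs can be sketched as a small
// helper. This is an illustration only: `MarlinShapes`, `marlin_shapes`, and
// `SKETCH_PACK_FACTOR` are hypothetical names, not existing ferrum APIs. The
// sketch assumes `k` is divisible by both `group_size` and the pack factor,
// and `n` by the pack factor, and returns `None` otherwise.

/// INT4 values per 32-bit word (assumed from `pack_factor=8` in the docs).
const SKETCH_PACK_FACTOR: usize = 8;

/// Hypothetical bundle of operand shapes, each stored as `[rows, cols]`.
#[derive(Debug, PartialEq, Eq)]
pub struct MarlinShapes {
    pub a: [usize; 2],
    pub b_packed: [usize; 2],
    pub scales: [usize; 2],
    pub zeros: [usize; 2],
    pub g_idx_len: usize,
}

/// Derive the operand shapes from the problem dimensions.
pub fn marlin_shapes(m: usize, n: usize, k: usize, group_size: usize) -> Option<MarlinShapes> {
    let divisible = group_size != 0
        && k % group_size == 0
        && k % SKETCH_PACK_FACTOR == 0
        && n % SKETCH_PACK_FACTOR == 0;
    if !divisible {
        return None;
    }
    Some(MarlinShapes {
        a: [m, k],
        b_packed: [k / SKETCH_PACK_FACTOR, n],
        scales: [k / group_size, n],
        zeros: [k / group_size, n / SKETCH_PACK_FACTOR],
        g_idx_len: k,
    })
}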

// impl OpUnderTest for MarlinMatmulOp is pending the marlin_pack_fixture helper.
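
// A minimal sketch of the "dequantize, then plain matmul" reference path
// described in the module docs. Everything below is an assumption made for
// illustration and is NOT the ferrum CPU backend or the Marlin tile layout:
// - weights use a naive packing, 8 nibbles per u32 along `k`, word shape
//   `[k / 8, n]`, with row `r` stored at bits `4 * (r % 8)`;
// - quantization is symmetric with an implicit zero point of 8;
// - scales are f32 here because stable Rust has no built-in fp16 type.
// All function names are hypothetical.

const NAIVE_PACK_FACTOR: usize = 8;
const NAIVE_ZERO_POINT: i32 = 8;

/// Pack row-major 4-bit values `[k, n]` into u32 words `[k / 8, n]`.
pub fn pack_int4_naive(q: &[u8], k: usize, n: usize) -> Vec<u32> {
    assert_eq!(q.len(), k * n);
    assert_eq!(k % NAIVE_PACK_FACTOR, 0);
    let mut packed = vec![0u32; (k / NAIVE_PACK_FACTOR) * n];
    for row in 0..k {
        for col in 0..n {
            let nibble = (q[row * n + col] & 0xF) as u32;
            let shift = 4 * (row % NAIVE_PACK_FACTOR);
            packed[(row / NAIVE_PACK_FACTOR) * n + col] |= nibble << shift;
        }
    }
    packed
}

/// Dequantize to fp32 `[k, n]`: `w = (q - 8) * scales[row / group_size][col]`.
pub fn dequant_int4_naive(
    packed: &[u32],
    scales: &[f32],
    k: usize,
    n: usize,
    group_size: usize,
) -> Vec<f32> {
    assert!(group_size > 0 && k % group_size == 0);
    assert_eq!(packed.len(), (k / NAIVE_PACK_FACTOR) * n);
    assert_eq!(scales.len(), (k / group_size) * n);
    let mut w = vec![0.0f32; k * n];
    for row in 0..k {
        for col in 0..n {
            let word = packed[(row / NAIVE_PACK_FACTOR) * n + col];
            let q = ((word >> (4 * (row % NAIVE_PACK_FACTOR))) & 0xF) as i32;
            let scale = scales[(row / group_size) * n + col];
            w[row * n + col] = (q - NAIVE_ZERO_POINT) as f32 * scale;
        }
    }
    w
}

/// Naive fp32 matmul: `out [m, n] = a [m, k] x w [k, n]`, all row-major.
pub fn matmul_f32_naive(a: &[f32], w: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(w.len(), k * n);
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let a_ip = a[i * k + p];
            for j in 0..n {
                out[i * n + j] += a_ip * w[p * n + j];
            }
        }
    }
    out
}

// An op-diff test would compare a backend kernel's output against
// `matmul_f32_naive(a, &dequant_int4_naive(..), m, k, n)` within a tolerance,
// once both sides consume the same packed fixture.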