//! GEMM family — unified plan-based API.
//!
//! Today this module hosts the integer GEMM dispatcher [`IntGemmPlan`],
//! which routes:
//!
//! - [`LayoutSku::Rcr`] → `baracuda-cutlass`'s CUTLASS-based int8
//!   kernels (`IntGemmPlan<T, BT>` over RCR with the five-epilogue
//!   bias family).
//! - [`LayoutSku::Rrr`] → bespoke `mma.sync.m16n8k32` kernels in
//!   `baracuda-kernels-sys`. RRR coverage starts with `S8 × Identity`
//!   (this commit); the rest of the 18-SKU matrix follows in
//!   subsequent commits.
//!
//! Callers see a single `IntGemmPlan` type with one `select` / `run`
//! contract; the per-layout backend is observable via [`IntGemmPlan::sku`]
//! for telemetry but doesn't leak into the call signature.
// Phase 74 — plain dense FP GEMM (cuBLAS-backed; RRR / RCR / CRR +
// strided-batch). This is the family that every other plan in this
// module assumes exists elsewhere; see the module docs for the split
// between it and `baracuda_cutlass::GemmPlan`.
// Phase 54 — 2:4 Structured Sparsity GEMM (xFormers algorithmic-
// reference hand-port). Plan file always compiles; FFI calls inside
// `run()` are `#[cfg(feature = "xformers_sparse24")]`-gated so the
// public API surface exists even without the feature.
// Phase 48 — Marlin + AWQ 4-bit GEMM (vendored). Both plan files
// always compile so the public API surface is stable; FFI calls in
// `run()` are `#[cfg(feature = "marlin")]` / `#[cfg(feature = "awq")]`
// gated. `gptq_to_marlin` is a pure-Rust host-side repack utility
// (no GPU dependency) and compiles unconditionally under the
// `marlin` feature.
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
// Phase 48 re-exports.
pub use ;
pub use ;
pub use ;