//! GEMM (General Matrix Multiply) Kernels
//!
//! Implements `C = alpha * (A @ B) + beta * C`, where `@` denotes matrix
//! multiplication and `alpha` and `beta` are scalars, in several variants:
//!
//! - **Basic**: Standard 2D GEMM (naive, tiled, tensor core)
//! - **Batched**: 3D batched GEMM for independent matrix multiplications
//! - **Batched4D**: 4D batched GEMM for multi-head attention
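//!
//! # Reference semantics
//!
//! The formula above can be sketched on the CPU as follows. This is a minimal
//! illustration, not this module's implementation: `gemm_reference` is a
//! hypothetical helper, and row-major `f32` storage with `A` of shape `m x k`,
//! `B` of shape `k x n`, and `C` of shape `m x n` is an assumption.
//!
//! ```rust
//! // Hypothetical reference helper (not part of this module's API).
//! // Assumes row-major f32 slices: a is m x k, b is k x n, c is m x n.
//! fn gemm_reference(
//!     m: usize,
//!     n: usize,
//!     k: usize,
//!     alpha: f32,
//!     a: &[f32],
//!     b: &[f32],
//!     beta: f32,
//!     c: &mut [f32],
//! ) {
//!     assert_eq!(a.len(), m * k);
//!     assert_eq!(b.len(), k * n);
//!     assert_eq!(c.len(), m * n);
//!     for i in 0..m {
//!         for j in 0..n {
//!             // Dot product of row i of A with column j of B.
//!             let mut acc = 0.0f32;
//!             for p in 0..k {
//!                 acc += a[i * k + p] * b[p * n + j];
//!             }
//!             // Scale the product and blend in the previous value of C.
//!             c[i * n + j] = alpha * acc + beta * c[i * n + j];
//!         }
//!     }
//! }
//!
//! let a: [f32; 4] = [1.0, 2.0, 3.0, 4.0]; // 2 x 2
//! let b: [f32; 4] = [5.0, 6.0, 7.0, 8.0]; // 2 x 2
//! let mut c: [f32; 4] = [1.0, 1.0, 1.0, 1.0];
//! gemm_reference(2, 2, 2, 2.0, &a, &b, 0.5, &mut c);
//! // A @ B = [[19, 22], [43, 50]], so C = 2 * (A @ B) + 0.5 * 1.
//! assert_eq!(c, [38.5, 44.5, 86.5, 100.5]);
//! ```
//!
//! Under the same assumptions, the batched variants apply this computation
//! independently to each matrix in the leading batch dimension(s).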
pub use ;
pub use ;
pub use ;