//! Fused Quantized GEMV Kernels
//!
//! These kernels fuse multiple operations to reduce global memory traffic:
//! - FusedRmsNormQ4KGemvKernel: RMSNorm + Q4K GEMV in a single pass
//! - FusedGateUpQ4KGemvKernel: Gate + Up projections sharing input load
//! - FusedRmsNormGateUpSwigluQ4KKernel: RMSNorm + Gate+Up + SwiGLU (3-way fusion)
//! - FusedRmsNormNf4GemvKernel: RMSNorm + NF4 GEMV for training (PMAT-475)
//! - FusedNf4GateUpGemmKernel: Gate + Up NF4 GEMM sharing input (PMAT-475)
pub use FusedGateUpQ4KGemvKernel;
pub use FusedNf4GateUpGemmKernel;
pub use FusedRmsNormNf4GemvKernel;
pub use FusedRmsNormGateUpSwigluQ4KKernel;
pub use FusedRmsNormQ4KGemvKernel;
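
// Illustrative CPU reference, not part of the original module: a minimal
// scalar sketch of what "RMSNorm + GEMV in a single pass" computes, to make
// the fusion listed in the module docs concrete.
//
// Assumptions (all hypothetical, chosen for illustration only):
// - the function name, the `gamma` and `eps` parameters, and the layout;
// - plain row-major f32 weights stand in for the Q4K/NF4 blocks that the
//   real kernels dequantize on the fly;
// - RMSNorm is taken as x[i] * gamma[i] / sqrt(mean(x^2) + eps).
//
// The point of the sketch: the normalized vector is never written out.
// One reduction yields the inverse RMS, and each output row applies it as a
// single scale factor after the dot product.
#[allow(dead_code)]
fn fused_rmsnorm_gemv_reference(
    x: &[f32],       // input activations, length k
    gamma: &[f32],   // RMSNorm scale, length k
    weights: &[f32], // row-major matrix, rows * k elements
    eps: f32,        // numerical stability term inside the square root
) -> Vec<f32> {
    let k = x.len();
    assert!(k > 0, "input must not be empty");
    assert_eq!(gamma.len(), k, "gamma must match the input length");
    assert_eq!(weights.len() % k, 0, "weights must hold whole rows");

    // Reduction over the input: mean of squares, then inverse RMS.
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / k as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();

    // GEMV over the un-normalized input; inv_rms is constant per call, so it
    // factors out of every dot product and is applied once per output row.
    weights
        .chunks_exact(k)
        .map(|row| {
            let dot = row
                .iter()
                .zip(x.iter().zip(gamma))
                .map(|(w, (xi, gi))| w * xi * gi)
                .sum::<f32>();
            dot * inv_rms
        })
        .collect()
}

// Usage sketch: with an identity weight matrix and gamma = 1, the result is
// just the RMS-normalized input, which gives a simple way to check a fused
// GPU kernel against an unfused reference.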