//! Mixture of Experts (MoE) module.
//!
//! Provides GPU-accelerated MoE primitives for transformer models such as
//! Mixtral, Switch Transformer, and GShard. The module implements:
//!
//! - **Routing** ([`routing`]) — top-k expert selection with fused softmax.
//! - **Permutation** ([`permute`]) — token scatter/gather by expert assignment.
//! - **Grouped GEMM** ([`grouped_gemm`]) — MoE-specific batched GEMM wrapper.
//! - **Fused MoE** ([`mod@fused_moe`]) — end-to-end fused kernel combining
//!   permute + GEMM + activation + GEMM + unpermute.
//! - **Auxiliary Loss** ([`aux_loss`]) — Switch Transformer style load-balancing
//!   loss and z-loss for training stability.
//! - **Capacity** ([`capacity`]) — expert capacity factor tuning with overflow
//!   masking and dynamic capacity adjustment.
//! - **Monitoring** ([`monitoring`]) — runtime expert utilization tracking and
//!   imbalance detection.
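//!
//! As a rough illustration of the auxiliary-loss and capacity items above, the
//! CPU-side sketch below computes a Switch Transformer style load-balancing
//! loss, `num_experts * sum_i(f_i * p_i)`, and a per-expert token capacity.
//! The function names, signatures, and the top-1 dispatch layout are
//! assumptions made for this example; they are not the API of [`aux_loss`] or
//! [`capacity`].
//!
//! ```rust
//! /// Hypothetical helper: `router_probs[t][e]` is the router probability of
//! /// expert `e` for token `t`; `assignments[t]` is the expert chosen for `t`.
//! fn load_balancing_loss(
//!     router_probs: &[Vec<f32>],
//!     assignments: &[usize],
//!     num_experts: usize,
//! ) -> f32 {
//!     let num_tokens = router_probs.len() as f32;
//!     let mut frac = vec![0.0f32; num_experts]; // f_i: share of tokens sent to expert i
//!     let mut mean_prob = vec![0.0f32; num_experts]; // p_i: mean router probability of expert i
//!     for (probs, &e) in router_probs.iter().zip(assignments) {
//!         frac[e] += 1.0 / num_tokens;
//!         for (m, &p) in mean_prob.iter_mut().zip(probs) {
//!             *m += p / num_tokens;
//!         }
//!     }
//!     let dot: f32 = frac.iter().zip(&mean_prob).map(|(f, p)| f * p).sum();
//!     num_experts as f32 * dot
//! }
//!
//! /// Hypothetical helper: tokens each expert may accept before overflow,
//! /// assuming capacity = ceil(tokens * top_k / experts * capacity_factor).
//! fn expert_capacity(
//!     num_tokens: usize,
//!     num_experts: usize,
//!     top_k: usize,
//!     capacity_factor: f32,
//! ) -> usize {
//!     let even_share = (num_tokens * top_k) as f32 / num_experts as f32;
//!     (even_share * capacity_factor).ceil() as usize
//! }
//! ```
//!
//! Under these assumptions the loss is 1.0 for a perfectly balanced, uniform
//! router and grows toward `num_experts` as routing collapses onto one expert.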
//!
//! # Architecture
//!
//! The MoE layer routes each input token to its top-k experts, executes
//! per-expert FFN layers (two linear projections with an activation in
//! between), and combines the expert outputs weighted by the routing scores.
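//!
//! The data flow described above can be sketched on the CPU as follows. This
//! is a minimal reference under stated assumptions, not the fused GPU kernel:
//! `top_k_route`, `moe_forward`, and the function-pointer expert type are
//! hypothetical names, and renormalizing the routing weights over the
//! selected experts is one common convention rather than a documented
//! behavior of this module.
//!
//! ```rust
//! /// Computes softmax numerators for the router logits, keeps the `k`
//! /// largest, and normalizes the kept weights so they sum to 1.
//! fn top_k_route(logits: &[f32], k: usize) -> Vec<(usize, f32)> {
//!     // Subtract the maximum logit before exponentiating for numerical stability.
//!     let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
//!     let mut ranked: Vec<(usize, f32)> = logits
//!         .iter()
//!         .map(|&l| (l - max).exp())
//!         .enumerate()
//!         .collect();
//!     // Sort by weight, largest first, and keep the top k experts.
//!     ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
//!     ranked.truncate(k);
//!     let norm: f32 = ranked.iter().map(|&(_, w)| w).sum();
//!     ranked.into_iter().map(|(i, w)| (i, w / norm)).collect()
//! }
//!
//! /// Runs each selected expert on the token and sums the expert outputs
//! /// weighted by the routing scores. Each expert stands in for an FFN and is
//! /// assumed to return a vector of the same length as `token`.
//! fn moe_forward(
//!     token: &[f32],
//!     logits: &[f32],
//!     k: usize,
//!     experts: &[fn(&[f32]) -> Vec<f32>],
//! ) -> Vec<f32> {
//!     let mut out = vec![0.0f32; token.len()];
//!     for (idx, weight) in top_k_route(logits, k) {
//!         let y = experts[idx](token);
//!         for (o, v) in out.iter_mut().zip(y) {
//!             *o += weight * v;
//!         }
//!     }
//!     out
//! }
//! ```
//!
//! A real implementation would batch tokens per expert (the permutation and
//! grouped GEMM steps listed above) instead of looping token by token.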
pub use ;
pub use ;
pub use ;
pub use moe_grouped_gemm;
pub use ;
pub use ;
pub use ;