GPU-accelerated MoE gating: parallel top-K expert selection with softmax routing.
One threadgroup per token (grid = seq_len × 1 × 1), 128 threads per group. Supports bf16 hidden state input, f32 router weights, and per-expert scale.
Designed for Gemma 4: 128 experts, top-8 routing, hidden_dim=2816.
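The routing described above can be sketched on the CPU as a reference. This is an illustrative sketch only, not the kernel's actual code path: it assumes the common "select top-K logits, then softmax over the selected K" variant (as in Mixtral-style routers); the kernel may instead softmax over all experts first. The function name `top_k_softmax` is hypothetical.

```rust
/// CPU reference for top-K softmax gating (illustrative; the kernel's
/// exact semantics — e.g. softmax-then-select vs. select-then-softmax —
/// may differ). Returns (expert_index, weight) pairs for one token.
fn top_k_softmax(logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    // Rank experts by router logit, descending.
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    let top = &idx[..k];

    // Numerically stable softmax over the K selected logits.
    let max = top
        .iter()
        .map(|&i| logits[i])
        .fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = top.iter().map(|&i| (logits[i] - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    top.iter().zip(exps).map(|(&i, e)| (i, e / sum)).collect()
}

fn main() {
    // Toy router output for one token over 4 experts, top-2 routing.
    let logits = [2.0_f32, 0.5, 1.0, 3.0];
    let routed = top_k_softmax(&logits, 2);
    // Selects experts 3 and 0; the two weights sum to 1.
    println!("{:?}", routed);
}
```

The GPU kernel parallelizes this per-token loop across the 128 threads of a threadgroup, with one threadgroup launched per token.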
Structs
- MoeGateParams - Parameters for MoE gate routing.
Functions
- moe_gate - Encode a parallel MoE gate operation.