Module moe_gate

GPU-accelerated MoE gating: parallel top-K expert selection with softmax routing.

One threadgroup is launched per token (grid = seq_len × 1 × 1), with 128 threads per group. Supports bf16 hidden-state input, f32 router weights, and per-expert scales.

Designed for Gemma 4: 128 experts, top-8 routing, hidden_dim=2816.
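The gating math the kernel parallelizes can be sketched on the CPU as follows. This is a hedged reference, not the kernel itself: it assumes router logits are already computed (hidden · router weights), and it applies softmax over the selected top-K logits only — the kernel's actual normalization order (softmax-then-top-K vs. top-K-then-softmax) and its handling of the per-expert scale may differ.

```rust
/// CPU reference for top-K expert selection with softmax routing.
/// A sketch of the gating math only; the GPU kernel performs this
/// per token across 128 threads in a threadgroup.
fn moe_gate_ref(logits: &[f32], top_k: usize) -> Vec<(usize, f32)> {
    // Rank experts by router logit, descending.
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    let top = &idx[..top_k];

    // Numerically stable softmax over the selected logits only.
    let max = top
        .iter()
        .map(|&i| logits[i])
        .fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = top.iter().map(|&i| (logits[i] - max).exp()).collect();
    let sum: f32 = exps.iter().sum();

    // (expert index, routing weight) pairs; weights sum to 1.
    top.iter().zip(exps).map(|(&i, e)| (i, e / sum)).collect()
}

fn main() {
    // 8 experts with top-2 routing for illustration
    // (the real configuration is 128 experts, top-8).
    let logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3];
    for (expert, weight) in moe_gate_ref(&logits, 2) {
        println!("expert {expert}: weight {weight:.3}");
    }
}
```

Per token, the output is `top_k` (expert index, weight) pairs whose weights sum to 1; the downstream MoE layer uses them to mix the selected experts' outputs.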

Structs

MoeGateParams
Parameters for MoE gate routing.

Functions

moe_gate
Encode a parallel MoE gate operation.