MoE expert residency pool (TIDE-style predictive offload).
Mirrors the policy in ims-kdks/TIDE LLaDA2MoeSparseMoeBlock: rank experts by token hits, refresh placement every τ steps, and pair each promotion with a demotion to limit PCIe churn.
Router logits and expert indices are unchanged; only expert placement changes.
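The policy described above can be illustrated with a minimal sketch. It is not this module's implementation or TIDE's: `SketchPool`, `SketchRefresh`, and every method name are hypothetical, and the choices to reset hit counters after each refresh and to swap only when the incoming expert has strictly more hits are assumptions made for the example.

```rust
use std::collections::HashSet;

/// Hypothetical result of one refresh: experts moved onto and off the GPU.
#[derive(Debug, Default, PartialEq)]
pub struct SketchRefresh {
    pub promoted: Vec<usize>,
    pub demoted: Vec<usize>,
}

/// Hypothetical pool: per-expert hit counters plus the GPU-resident set.
pub struct SketchPool {
    hits: Vec<u64>,
    resident: HashSet<usize>,
    tau: u64,
    steps: u64,
}

impl SketchPool {
    pub fn new(num_experts: usize, initial_resident: &[usize], tau: u64) -> Self {
        Self {
            hits: vec![0; num_experts],
            // Ignore out-of-range ids so later indexing cannot panic.
            resident: initial_resident
                .iter()
                .copied()
                .filter(|&e| e < num_experts)
                .collect(),
            tau: tau.max(1),
            steps: 0,
        }
    }

    /// Count the experts the router picked. Routing itself is not touched.
    pub fn record(&mut self, expert_indices: &[usize]) {
        for &e in expert_indices {
            if let Some(h) = self.hits.get_mut(e) {
                *h += 1;
            }
        }
    }

    /// Finish one step; every `tau` steps, refresh placement.
    pub fn end_step(&mut self) -> Option<SketchRefresh> {
        self.steps += 1;
        if self.steps % self.tau != 0 {
            return None;
        }
        Some(self.refresh())
    }

    fn refresh(&mut self) -> SketchRefresh {
        // Non-resident experts, hottest first (ties broken by id).
        let mut incoming: Vec<usize> = (0..self.hits.len())
            .filter(|e| !self.resident.contains(e))
            .collect();
        incoming.sort_by(|a, b| self.hits[*b].cmp(&self.hits[*a]).then(a.cmp(b)));
        // Resident experts, coldest first (ties broken by id).
        let mut outgoing: Vec<usize> = self.resident.iter().copied().collect();
        outgoing.sort_by(|a, b| self.hits[*a].cmp(&self.hits[*b]).then(a.cmp(b)));

        let mut result = SketchRefresh::default();
        // Paired swaps: each promotion evicts exactly one resident expert,
        // so the resident count (and the GPU budget) never changes.
        for (&up, &down) in incoming.iter().zip(outgoing.iter()) {
            if self.hits[up] <= self.hits[down] {
                break; // later pairs can only be less favourable
            }
            self.resident.remove(&down);
            self.resident.insert(up);
            result.promoted.push(up);
            result.demoted.push(down);
        }
        // Assumption: counters restart after every refresh window.
        self.hits.iter_mut().for_each(|h| *h = 0);
        result
    }

    pub fn is_resident(&self, expert: usize) -> bool {
        self.resident.contains(&expert)
    }
}
```

Because swaps are paired and stop at the first unfavourable pair, a refresh transfers at most as many experts as are strictly hotter than a resident one, which is one way to keep PCIe traffic bounded.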
Structs§
- ExpertPool: Tracks which logical experts are GPU-resident and applies TIDE placement updates.
- ExpertPoolConfig: Configuration for ExpertPool.
- ExpertPoolStats: Cumulative counters (TIDE offload_stats).
- ExpertRefreshResult: Result of one placement refresh.
Enums§
- ExpertRefreshPolicy: When to re-run hit counting and expert placement.
- MoEExecMode: Per-forward hint from the runner (maps to TIDE refresh_experts).
Functions§
- gpu_expert_budget_from_vram
- merged_resident_mask: Union of GPU-resident experts across per-layer pools (legacy single graph mask).
- per_layer_resident_masks: Per-layer resident bitmasks (TIDE placement; one row per MoE FFN in forward order).
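The relationship between the per-layer masks and the merged mask can be sketched as follows. The real signatures of merged_resident_mask and per_layer_resident_masks are not shown on this page, so the functions below are hypothetical stand-ins that use plain `Vec<bool>` rows rather than whatever mask type the crate actually uses.

```rust
/// Hypothetical: build one boolean row per MoE layer (forward order),
/// where `true` marks an expert that is resident on the GPU.
pub fn sketch_per_layer_masks(
    resident_per_layer: &[Vec<usize>],
    num_experts: usize,
) -> Vec<Vec<bool>> {
    resident_per_layer
        .iter()
        .map(|ids| {
            let mut row = vec![false; num_experts];
            for &e in ids {
                if e < num_experts {
                    row[e] = true;
                }
            }
            row
        })
        .collect()
}

/// Hypothetical: union of all rows, i.e. an expert is marked if it is
/// resident in at least one layer.
pub fn sketch_merged_mask(masks: &[Vec<bool>], num_experts: usize) -> Vec<bool> {
    let mut merged = vec![false; num_experts];
    for row in masks {
        for (m, &r) in merged.iter_mut().zip(row.iter()) {
            *m |= r;
        }
    }
    merged
}
```

The merged mask loses per-layer detail: an expert resident in only one layer still appears as resident overall, which is presumably why the per-layer form is preferred for placement.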