Module expert_pool

MoE expert residency pool (TIDE-style predictive offload).

Mirrors the policy in ims-kdks/TIDE LLaDA2MoeSparseMoeBlock: experts are ranked by token hits, placement is refreshed every τ steps, and each promotion is paired with a demotion to limit PCIe churn.

Router logits and expert indices are unchanged; only placement is affected.
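
The policy described above can be sketched roughly as follows. This is an illustrative stand-in, not this module's implementation: `SketchPool`, its fields, and the methods `observe` and `refresh` are hypothetical names, and the sketch assumes that hit counters reset on every refresh and that a swap happens only when an offloaded expert has strictly more hits than the resident expert it replaces.

```rust
/// Hypothetical stand-in for a residency pool; NOT the crate's `ExpertPool`.
struct SketchPool {
    hits: Vec<u64>,      // token hits per expert since the last refresh
    resident: Vec<bool>, // true if the expert is currently GPU-resident
    tau: u64,            // refresh interval in steps (assumed semantics of τ)
    step: u64,           // steps observed so far
}

impl SketchPool {
    /// Start with the first `gpu_budget` experts resident (an arbitrary choice).
    fn new(num_experts: usize, gpu_budget: usize, tau: u64) -> Self {
        let mut resident = vec![false; num_experts];
        for r in resident.iter_mut().take(gpu_budget) {
            *r = true;
        }
        SketchPool { hits: vec![0; num_experts], resident, tau: tau.max(1), step: 0 }
    }

    /// Count the routed expert indices for one step. Every `tau` steps this
    /// runs a refresh and returns the (promoted, demoted) pairs.
    fn observe(&mut self, routed: &[usize]) -> Option<Vec<(usize, usize)>> {
        for &e in routed {
            self.hits[e] += 1;
        }
        self.step += 1;
        if self.step % self.tau != 0 {
            return None;
        }
        Some(self.refresh())
    }

    /// Pair the hottest offloaded experts with the coldest resident ones.
    /// Each promotion is matched by one demotion, so the resident count
    /// (and therefore GPU memory use) stays constant.
    fn refresh(&mut self) -> Vec<(usize, usize)> {
        let n = self.hits.len();
        // Offloaded experts, most hits first (ties broken by index).
        let mut offloaded: Vec<usize> = (0..n).filter(|&e| !self.resident[e]).collect();
        offloaded.sort_by(|&a, &b| self.hits[b].cmp(&self.hits[a]).then(a.cmp(&b)));
        // Resident experts, fewest hits first (ties broken by index).
        let mut on_gpu: Vec<usize> = (0..n).filter(|&e| self.resident[e]).collect();
        on_gpu.sort_by(|&a, &b| self.hits[a].cmp(&self.hits[b]).then(a.cmp(&b)));

        let mut swaps = Vec::new();
        for (&up, &down) in offloaded.iter().zip(on_gpu.iter()) {
            // Stop once a swap would no longer increase resident hit coverage.
            if self.hits[up] <= self.hits[down] {
                break;
            }
            self.resident[up] = true;
            self.resident[down] = false;
            swaps.push((up, down));
        }
        // Assumption: counters restart after each refresh.
        self.hits.iter_mut().for_each(|h| *h = 0);
        swaps
    }
}
```

Note that the sketch never touches the routed indices themselves; it only records them and flips residency flags, which is the sense in which placement is independent of routing.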

Structs§

ExpertPool: Tracks which logical experts are GPU-resident and applies TIDE placement updates.
ExpertPoolConfig: Configuration for ExpertPool.
ExpertPoolStats: Cumulative counters (TIDE offload_stats).
ExpertRefreshResult: Result of one placement refresh.

Enums§

ExpertRefreshPolicy: When to re-run hit counting and expert placement.
MoEExecMode: Per-forward hint from the runner (maps to TIDE refresh_experts).

Functions§

gpu_expert_budget_from_vram
merged_resident_mask: Union of GPU-resident experts across per-layer pools (legacy single graph mask).
per_layer_resident_masks: Per-layer resident bitmasks (TIDE placement; one row per MoE FFN in forward order).
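
As a rough illustration of how a merged mask relates to per-layer masks, the sketch below represents each layer's residency as a `u64` bitmask and takes the bitwise OR across layers. The representation is an assumption (it limits a layer to 64 experts), and `layer_mask` and `merge_masks` are hypothetical helpers, not the signatures of `per_layer_resident_masks` or `merged_resident_mask`.

```rust
/// Hypothetical helper: build one layer's bitmask, with bit `e` set when
/// expert `e` is GPU-resident. Assumes at most 64 experts per layer.
fn layer_mask(resident: &[bool]) -> u64 {
    assert!(resident.len() <= 64, "sketch supports at most 64 experts");
    resident
        .iter()
        .enumerate()
        .fold(0u64, |mask, (e, &on)| if on { mask | (1u64 << e) } else { mask })
}

/// Hypothetical helper: union of the per-layer masks. An expert index is set
/// in the result if it is resident in at least one layer.
fn merge_masks(per_layer: &[u64]) -> u64 {
    per_layer.iter().fold(0u64, |acc, &m| acc | m)
}
```

A merged mask is coarser than the per-layer rows: it can mark an expert index as resident even when that index is offloaded in some layers, so per-layer masks carry strictly more information.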