Dynamic Actor Scheduling — Work Stealing Protocol
Provides load balancing for persistent GPU actors via a work-stealing protocol. Without dynamic scheduling, each actor (thread block) processes only its own message queue, so if one actor's workload spikes while its neighbors sit idle, the busy actor becomes a bottleneck.
§Scheduler Warp Pattern
Within each thread block of the persistent kernel:
- Warp 0: Scheduler warp — monitors queue depth, steals work from overloaded neighbors, redistributes messages
- Warps 1-N: Compute warps — process messages from the local work queue
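The warp-role split can be sketched on the host as follows. This assumes 32-thread warps and a 256-thread block, which yields the one scheduler warp plus seven compute warps shown in the diagram; inside a CUDA kernel the equivalent test would be `threadIdx.x / warpSize == 0`. `WarpRole` and `warp_role` are hypothetical names, not part of this crate's API.

```rust
/// Threads per warp on current NVIDIA GPUs (assumed here).
const WARP_SIZE: u32 = 32;

/// Role a warp plays inside a persistent actor block (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WarpRole {
    Scheduler,
    Compute,
}

/// Maps a thread's index within its block to the role of its warp:
/// warp 0 is the scheduler, every other warp is a compute warp.
fn warp_role(thread_idx_x: u32) -> WarpRole {
    if thread_idx_x / WARP_SIZE == 0 {
        WarpRole::Scheduler
    } else {
        WarpRole::Compute
    }
}

fn main() {
    // A 256-thread block has 8 warps: 1 scheduler + 7 compute.
    let block_dim = 256;
    let compute_warps = (0..block_dim)
        .step_by(WARP_SIZE as usize) // first thread of each warp
        .filter(|&t| warp_role(t) == WarpRole::Compute)
        .count();
    println!("compute warps: {compute_warps}");
}
```

Because the role depends only on the warp index, every thread in a warp takes the same branch, so the role check itself causes no intra-warp divergence.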
```text
┌─── Block (Actor) ─────────────────────────────────┐
│ Warp 0 [SCHEDULER]                                │
│  ├─ Monitor local queue depth                     │
│  ├─ If depth < steal_threshold:                   │
│  │   └─ Steal from busiest neighbor via K2K       │
│  ├─ If depth > share_threshold:                   │
│  │   └─ Offer work to least-busy neighbor         │
│  └─ Update load metrics in shared memory          │
│                                                   │
│ Warps 1-7 [COMPUTE]                               │
│  ├─ Dequeue message from local work queue         │
│  ├─ Process message (user handler)                │
│  └─ Enqueue response to output queue              │
└───────────────────────────────────────────────────┘
```

§Work Stealing Protocol
- Each block publishes its queue depth to a shared load table (global memory or DSMEM).
- The scheduler warp compares the local depth with its neighbors' depths.
- If the local depth is below steal_threshold and a neighbor's depth exceeds share_threshold:
  a. The scheduler warp atomically reserves N messages from the neighbor's queue.
  b. The messages are copied via a K2K channel (DSMEM within a cluster, global memory across clusters).
  c. Both blocks update their queue depths.
- A grid sync (or cluster sync) ensures load table consistency.
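The steal decision and the atomic reservation in step (a) can be modeled on the CPU with standard atomics. This is a sketch under stated assumptions: `Thresholds`, `pick_victim`, and `reserve` are hypothetical names; each neighbor is represented only by its published queue depth; and "reserving N messages" is modeled as atomically decrementing that depth. A real kernel might instead claim a range of queue slots (for example with `atomicCAS` on head/tail indices) before copying payloads over K2K.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical thresholds; real values would come from the scheduler configuration.
struct Thresholds {
    steal_threshold: u32,
    share_threshold: u32,
}

/// Returns the index of the busiest neighbor whose depth exceeds share_threshold,
/// or None if the local queue is not below steal_threshold.
fn pick_victim(local_depth: u32, neighbors: &[AtomicU32], t: &Thresholds) -> Option<usize> {
    if local_depth >= t.steal_threshold {
        return None; // Local queue is busy enough; no need to steal.
    }
    neighbors
        .iter()
        .enumerate()
        .map(|(i, d)| (i, d.load(Ordering::Acquire)))
        .filter(|&(_, depth)| depth > t.share_threshold)
        .max_by_key(|&(_, depth)| depth)
        .map(|(i, _)| i)
}

/// Atomically reserves up to `n` messages from a victim's published depth.
/// Returns how many were reserved (0 if the victim was already empty).
fn reserve(victim_depth: &AtomicU32, n: u32) -> u32 {
    let result = victim_depth.fetch_update(Ordering::AcqRel, Ordering::Acquire, |depth| {
        let take = depth.min(n);
        // Returning None aborts the update when there is nothing to take.
        (take > 0).then(|| depth - take)
    });
    match result {
        Ok(prev) => prev.min(n), // prev is the depth the successful CAS observed
        Err(_) => 0,
    }
}

fn main() {
    let t = Thresholds { steal_threshold: 4, share_threshold: 16 };
    let neighbors = [AtomicU32::new(10), AtomicU32::new(40), AtomicU32::new(20)];
    let local = AtomicU32::new(1);

    if let Some(v) = pick_victim(local.load(Ordering::Acquire), &neighbors, &t) {
        let got = reserve(&neighbors[v], 8);
        // In the real protocol the reserved messages would now be copied over K2K.
        local.fetch_add(got, Ordering::AcqRel);
        println!("stole {got} from neighbor {v}");
    }
}
```

`fetch_update` retries its closure until the compare-and-swap succeeds, so two scheduler warps racing for the same victim can never drive its depth below zero.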
§Load Table Layout (in mapped/global memory)
```text
load_table[block_id] = {
    queue_depth: u32,         // Current input queue depth
    capacity: u32,            // Queue capacity
    messages_processed: u64,  // Throughput indicator
    steal_requests: u32,      // Pending steal requests
    offer_count: u32,         // Messages offered to steal
}
```

Structs§
- LoadEntry - Per-actor load entry in the shared load table.
- LoadTable - The load table containing entries for all actors.
- SchedulerConfig - Configuration for dynamic actor scheduling.
- SchedulerWarpConfig - Configuration for the scheduler warp pattern in CUDA codegen.
- StealOp - A single work-stealing operation.
- WorkItem - Work item for the scheduler.
Enums§
- SchedulingStrategy - Scheduling strategy for persistent actors.
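The per-block load-table entry layout can be mirrored on the host as a `#[repr(C)]` struct. This is a sketch: `LoadEntrySketch` is a hypothetical name (the crate's own `LoadEntry` may be defined differently), and making every field atomic is an assumption motivated by multiple blocks reading and writing the table concurrently. `#[repr(C)]` fixes the field order, so the 4 + 4 + 8 + 4 + 4 byte fields pack into 24 bytes with the 64-bit counter naturally aligned at offset 8.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

/// Hypothetical host-side mirror of one load-table entry.
/// #[repr(C)] keeps declaration order so host and device agree on offsets.
#[repr(C)]
#[derive(Debug, Default)]
struct LoadEntrySketch {
    queue_depth: AtomicU32,        // Current input queue depth
    capacity: AtomicU32,           // Queue capacity
    messages_processed: AtomicU64, // Throughput indicator
    steal_requests: AtomicU32,     // Pending steal requests
    offer_count: AtomicU32,        // Messages offered to steal
}

fn main() {
    // 4 + 4 + 8 + 4 + 4 = 24 bytes; no padding is needed.
    println!(
        "size = {}, align = {}",
        size_of::<LoadEntrySketch>(),
        align_of::<LoadEntrySketch>()
    );

    // One entry per block, indexed by block_id.
    let table: Vec<LoadEntrySketch> = (0..4).map(|_| LoadEntrySketch::default()).collect();
    table[2].queue_depth.store(12, Ordering::Release);
    table[2].messages_processed.fetch_add(1, Ordering::Relaxed);

    // A scheduler warp scanning for the busiest block.
    let busiest = table
        .iter()
        .enumerate()
        .max_by_key(|(_, e)| e.queue_depth.load(Ordering::Acquire))
        .map(|(i, _)| i);
    println!("busiest block: {busiest:?}");
}
```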