pub struct WarpState { /* private fields */ }
Per-warp shared state used to emulate warp-level operations.
In a real GPU each warp executes in lock-step and has hardware support for
cross-lane communication. On the CPU we emulate this by having all threads
in a “warp” share a WarpState and synchronise explicitly via barriers.
Implementations
impl WarpState
pub fn set_lane_active(&self, lane_id: u32)
Set a lane as active.
pub fn set_lane_inactive(&self, lane_id: u32)
Set a lane as inactive.
pub fn active_mask(&self) -> u32
Get the current active mask.
pub fn is_lane_active(&self, lane_id: u32) -> bool
Returns true if the specified lane is currently active.
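The four mask methods above can be pictured as bit operations on a single 32-bit word. The following is a minimal sketch under that assumption; `MaskSketch`, its `active` field, and the all-inactive starting state are hypothetical and are not taken from the private fields of WarpState.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical stand-in for the active-mask bookkeeping.
/// Bit i of `active` is 1 when lane i is active.
pub struct MaskSketch {
    active: AtomicU32,
}

impl MaskSketch {
    /// Assumption for this sketch: every lane starts inactive.
    pub fn new() -> Self {
        Self { active: AtomicU32::new(0) }
    }

    pub fn set_lane_active(&self, lane_id: u32) {
        // Lane ids >= 32 are ignored so the shift cannot overflow.
        if lane_id < 32 {
            self.active.fetch_or(1u32 << lane_id, Ordering::SeqCst);
        }
    }

    pub fn set_lane_inactive(&self, lane_id: u32) {
        if lane_id < 32 {
            self.active.fetch_and(!(1u32 << lane_id), Ordering::SeqCst);
        }
    }

    pub fn active_mask(&self) -> u32 {
        self.active.load(Ordering::SeqCst)
    }

    pub fn is_lane_active(&self, lane_id: u32) -> bool {
        lane_id < 32 && (self.active_mask() >> lane_id) & 1 == 1
    }
}
```

Taking `&self` rather than `&mut self` matches the signatures above, which is why the sketch reaches for an atomic instead of a plain `u32`.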
pub fn shuffle(&self, lane_id: u32, value: u32, src_lane: u32) -> u32
Emulate __shfl_sync: read the value from src_lane.
The caller (at lane_id) first writes its own value, then after a
barrier reads from src_lane. In a single-threaded emulation context,
the caller can pre-populate all lanes and then read.
§Arguments
lane_id - The calling thread’s lane within the warp (0 to 31 inclusive)
value - The value this lane contributes
src_lane - The lane to read from
Returns the value from src_lane, or this lane’s own value if
src_lane is out of range.
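The write, barrier, read sequence described above might look like the sketch below when each lane is an OS thread. `ShuffleSketch`, its `slots` and `barrier` fields, and the `run_warp` helper are all hypothetical names chosen for illustration; the second barrier is an extra safety step assumed here, not something the documentation states.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

const WARP_SIZE: usize = 32;

/// Hypothetical shared state: one value slot per lane plus a reusable barrier.
struct ShuffleSketch {
    slots: Vec<AtomicU32>,
    barrier: Barrier,
}

impl ShuffleSketch {
    fn new() -> Self {
        Self {
            slots: (0..WARP_SIZE).map(|_| AtomicU32::new(0)).collect(),
            barrier: Barrier::new(WARP_SIZE),
        }
    }

    fn shuffle(&self, lane_id: u32, value: u32, src_lane: u32) -> u32 {
        // Step 1: publish this lane's value.
        self.slots[lane_id as usize].store(value, Ordering::SeqCst);
        // Step 2: wait until every lane has published.
        self.barrier.wait();
        // Step 3: read the source lane, or keep our own value if it is out of range.
        let result = match self.slots.get(src_lane as usize) {
            Some(slot) => slot.load(Ordering::SeqCst),
            None => value,
        };
        // Step 4 (assumed): wait again so a later shuffle cannot overwrite
        // a slot that another lane is still reading.
        self.barrier.wait();
        result
    }
}

/// Run one shuffle on 32 threads. Lane i contributes i * 10 and reads from `src_of(i)`.
fn run_warp(src_of: fn(u32) -> u32) -> Vec<u32> {
    let state = Arc::new(ShuffleSketch::new());
    let handles: Vec<_> = (0..WARP_SIZE as u32)
        .map(|lane| {
            let state = Arc::clone(&state);
            thread::spawn(move || state.shuffle(lane, lane * 10, src_of(lane)))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Calling `run_warp(|lane| (lane + 1) % 32)` rotates the values by one lane. Every one of the 32 threads must call `shuffle`, otherwise the barrier never releases.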
pub fn shuffle_xor(&self, lane_id: u32, value: u32, lane_mask: u32) -> u32
Emulate __shfl_xor_sync: read from lane_id ^ lane_mask.
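To make the XOR addressing concrete, here is a single-threaded model in which an array holds the value each lane contributes. The helper names `shuffle_xor_all` and `all_reduce_sum` are hypothetical, and the own-value fallback for an out-of-range source is an assumption carried over from the rule documented for shuffle.

```rust
const WARP_SIZE: usize = 32;

/// `lanes[i]` is the value lane i contributes.
/// Returns what each lane would read from lane `lane_id ^ lane_mask`.
fn shuffle_xor_all(lanes: &[u32; WARP_SIZE], lane_mask: u32) -> [u32; WARP_SIZE] {
    let mut out = [0u32; WARP_SIZE];
    for lane_id in 0..WARP_SIZE {
        let src = lane_id ^ lane_mask as usize;
        // Assumed fallback: an out-of-range source yields the caller's own value.
        out[lane_id] = if src < WARP_SIZE { lanes[src] } else { lanes[lane_id] };
    }
    out
}

/// Butterfly all-reduce built on the XOR exchange: after log2(32) = 5 steps
/// every lane holds the sum of all 32 inputs.
fn all_reduce_sum(mut lanes: [u32; WARP_SIZE]) -> [u32; WARP_SIZE] {
    let mut mask = (WARP_SIZE / 2) as u32;
    while mask > 0 {
        let partner = shuffle_xor_all(&lanes, mask);
        for i in 0..WARP_SIZE {
            lanes[i] += partner[i];
        }
        mask /= 2;
    }
    lanes
}
```

With `lane_mask = 1` neighbouring lanes swap values (0 with 1, 2 with 3, and so on), which is the building block of the butterfly exchange.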
pub fn shuffle_up(&self, lane_id: u32, value: u32, delta: u32) -> u32
Emulate __shfl_up_sync: read from lane_id - delta.
If the source lane would be negative, return the caller’s own value.
pub fn shuffle_down(&self, lane_id: u32, value: u32, delta: u32) -> u32
Emulate __shfl_down_sync: read from lane_id + delta.
If the source lane would be >= WARP_SIZE, return the caller’s own value.
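The boundary rules for the up and down variants can be sketched with checked arithmetic, again on a single-threaded array model. The helpers `up_source`, `down_source`, `shuffle_up_all`, and `shuffle_down_all` are hypothetical names for this illustration only.

```rust
const WARP_SIZE: u32 = 32;

/// Source lane for a shuffle-up, or `None` when `lane_id - delta` would be negative.
fn up_source(lane_id: u32, delta: u32) -> Option<u32> {
    lane_id.checked_sub(delta)
}

/// Source lane for a shuffle-down, or `None` when it would be >= WARP_SIZE.
fn down_source(lane_id: u32, delta: u32) -> Option<u32> {
    lane_id.checked_add(delta).filter(|&src| src < WARP_SIZE)
}

fn shuffle_up_all(lanes: &[u32; 32], delta: u32) -> [u32; 32] {
    let mut out = *lanes; // default: every lane keeps its own value
    for lane_id in 0..WARP_SIZE {
        if let Some(src) = up_source(lane_id, delta) {
            out[lane_id as usize] = lanes[src as usize];
        }
    }
    out
}

fn shuffle_down_all(lanes: &[u32; 32], delta: u32) -> [u32; 32] {
    let mut out = *lanes; // default: every lane keeps its own value
    for lane_id in 0..WARP_SIZE {
        if let Some(src) = down_source(lane_id, delta) {
            out[lane_id as usize] = lanes[src as usize];
        }
    }
    out
}
```

Using `checked_sub` and `checked_add` avoids the unsigned wrap-around that a plain `lane_id - delta` on `u32` would hit in the "negative" case.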
pub fn shuffle_f32(&self, lane_id: u32, value: f32, src_lane: u32) -> f32
Shuffle an f32 value (reinterpret bits through u32).
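Reinterpreting the bits, as opposed to a numeric cast, can be sketched with the standard `f32::to_bits` and `f32::from_bits`. The `shuffle_u32` transport below is a hypothetical stand-in for whatever integer shuffle is available; it is not the crate's implementation.

```rust
/// Hypothetical u32 transport: read `src_lane`, or fall back to `own` if out of range.
fn shuffle_u32(lanes: &[u32], src_lane: usize, own: u32) -> u32 {
    lanes.get(src_lane).copied().unwrap_or(own)
}

/// Move an f32 through the u32 transport without changing a single bit.
fn shuffle_f32(lanes: &[f32], lane_id: usize, src_lane: usize) -> f32 {
    // to_bits keeps the exact IEEE 754 pattern; `as u32` would truncate instead.
    let bits: Vec<u32> = lanes.iter().map(|v| v.to_bits()).collect();
    f32::from_bits(shuffle_u32(&bits, src_lane, bits[lane_id]))
}
```

Because the bit pattern is preserved, values such as -0.0, infinities, and NaN payloads pass through the integer path unchanged.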
pub fn shuffle_xor_f32(&self, lane_id: u32, value: f32, lane_mask: u32) -> f32
f32 variant of shuffle_xor.
pub fn shuffle_down_f32(&self, lane_id: u32, value: f32, delta: u32) -> f32
f32 variant of shuffle_down.
pub fn vote_all(&self, lane_id: u32, predicate: bool) -> bool
Emulate __all_sync: returns true if all active lanes have predicate == true.
pub fn vote_any(&self, lane_id: u32, predicate: bool) -> bool
Emulate __any_sync: returns true if any active lane has predicate == true.
pub fn ballot(&self, lane_id: u32, predicate: bool) -> u32
Emulate __ballot_sync: returns a bitmask where bit i is set if
lane i is active and its predicate is true.
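The three vote operations are closely related: both vote_all and vote_any can be expressed in terms of the ballot mask. The sketch below uses a single-threaded model with an explicit active mask and predicate array; the free functions and their parameters are hypothetical and differ from the per-lane method signatures above.

```rust
/// `active` is the warp's active mask; `preds[i]` is lane i's predicate.
/// Bit i of the result is set when lane i is active and its predicate is true.
fn ballot(active: u32, preds: &[bool; 32]) -> u32 {
    let mut mask = 0u32;
    for (i, &p) in preds.iter().enumerate() {
        if p && (active >> i) & 1 == 1 {
            mask |= 1u32 << i;
        }
    }
    mask
}

/// True when every active lane voted true.
/// Note: with no active lanes this sketch returns true (vacuous truth).
fn vote_all(active: u32, preds: &[bool; 32]) -> bool {
    ballot(active, preds) == active
}

/// True when at least one active lane voted true.
fn vote_any(active: u32, preds: &[bool; 32]) -> bool {
    ballot(active, preds) != 0
}
```

Inactive lanes never contribute to the result, so a true predicate on an inactive lane is ignored by all three functions.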
pub fn reduce_sum_f32(&self, lane_id: u32, value: f32) -> f32
Warp-level sum reduction built on shuffle_down (a log-step tree reduction, often loosely called a butterfly pattern).
Assumes all 32 lanes call this with their value. Lane 0 receives the full sum; the other lanes are left with partial results.
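Why only lane 0 ends up with the full sum can be seen in a single-threaded model of the reduction. The sketch assumes strides of 16, 8, 4, 2, 1 and the own-value fallback documented for shuffle_down; the actual stride order inside reduce_sum_f32 is not stated, and `reduce_sum_all` is a hypothetical helper.

```rust
const WARP_SIZE: usize = 32;

/// `lanes[i]` is the value lane i contributes. Returns every lane's final value.
fn reduce_sum_all(mut lanes: [f32; WARP_SIZE]) -> [f32; WARP_SIZE] {
    let mut delta = WARP_SIZE / 2; // assumed stride order: 16, 8, 4, 2, 1
    while delta > 0 {
        let snapshot = lanes; // values as they stood before this step
        for lane_id in 0..WARP_SIZE {
            // shuffle_down rule: an out-of-range source yields the caller's own value.
            let src = lane_id + delta;
            let other = if src < WARP_SIZE { snapshot[src] } else { snapshot[lane_id] };
            lanes[lane_id] = snapshot[lane_id] + other;
        }
        delta /= 2;
    }
    lanes
}
```

Lane 0 only ever reads from in-range lanes, so its result covers all 32 inputs. Higher lanes hit the fallback and add their own value back in, so their results should not be used as the warp total.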
pub fn reduce_max_f32(&self, lane_id: u32, value: f32) -> f32
Warp-level max reduction.
pub fn reduce_min_f32(&self, lane_id: u32, value: f32) -> f32
Warp-level min reduction.
pub fn popc_ballot(&self, lane_id: u32, predicate: bool) -> u32
Count the number of active lanes with a true predicate (popcount of ballot).
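As a rough illustration, the count is the number of set bits in the ballot mask, which Rust exposes as `u32::count_ones`. The free function below is a hypothetical single-threaded model, not the method's implementation.

```rust
/// `active` is the warp's active mask; `preds[i]` is lane i's predicate.
fn popc_ballot(active: u32, preds: &[bool; 32]) -> u32 {
    // Build the ballot mask: bit i set when lane i is active and voted true.
    let ballot = preds
        .iter()
        .enumerate()
        .filter(|&(i, &p)| p && (active >> i) & 1 == 1)
        .fold(0u32, |mask, (i, _)| mask | (1u32 << i));
    // Population count of the mask.
    ballot.count_ones()
}
```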