pub struct TailMaskPolicy {
    pub original_count: u32,
    pub rounded_count: u32,
    pub tail_lanes: u32,
}
Result of coercing a logical element count up to the next power of two.
Backends that opt into the N6 substrate dispatch over rounded_count
lanes (so every workgroup is uniform-shape, no boundary divergence on
the last workgroup) and have the kernel guard each store with the
tail-mask predicate lane_id < original_count. Threads beyond the
original count no-op their stores.
The benefit shows up in tail handling for attention, softmax, and reduce shapes whose workload is not a multiple of the workgroup size. Without coercion, the last workgroup runs with masked-out lanes that still incur scheduling cost. With coercion, every workgroup is identical and the out-of-range lanes skip their stores via the predicate.
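The guarded store described above can be sketched on the CPU as a plain loop over lanes. This is only an illustration: `simulate_guarded_dispatch` and the value each lane writes are hypothetical, not part of this crate, and a real backend would express the same predicate inside its kernel language.

```rust
/// CPU stand-in for a dispatch over `rounded_count` lanes.
/// Hypothetical helper for illustration; not part of the crate's API.
/// Returns how many lanes actually performed a store.
fn simulate_guarded_dispatch(original_count: u32, rounded_count: u32, out: &mut [f32]) -> u32 {
    // The output buffer only needs room for the logical elements.
    assert!(out.len() >= original_count as usize);
    let mut stores = 0;
    for lane_id in 0..rounded_count {
        // Tail-mask predicate: lanes at or beyond original_count skip the store.
        if lane_id < original_count {
            out[lane_id as usize] = lane_id as f32 * 2.0; // placeholder payload
            stores += 1;
        }
    }
    stores
}
```

With `original_count = 5` and `rounded_count = 8`, the loop visits eight lanes but only the first five write to `out`, so the buffer never needs to be padded to the rounded size.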
Fields
original_count: u32
Logical element count requested by the caller.
rounded_count: u32
Element count after rounding up to the next power of two. Equal to original_count when it is already a power of two.
tail_lanes: u32
Convenience: rounded_count - original_count. Lanes in this suffix range must be predicated off by the kernel.
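A minimal sketch of how the three fields relate, assuming the rounding is done with the standard library's `u32::checked_next_power_of_two`. The function `coerce_to_pow2` is a hypothetical name; this page does not show the crate's actual constructor, and the real one may treat zero or overflow differently.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TailMaskPolicy {
    pub original_count: u32,
    pub rounded_count: u32,
    pub tail_lanes: u32,
}

/// Hypothetical constructor for illustration only.
/// Returns None when the next power of two does not fit in a u32.
pub fn coerce_to_pow2(original_count: u32) -> Option<TailMaskPolicy> {
    // Note: the std helper maps 0 to 1, so a zero count yields one tail lane here.
    let rounded_count = original_count.checked_next_power_of_two()?;
    Some(TailMaskPolicy {
        original_count,
        rounded_count,
        // Cannot underflow: rounded_count >= original_count by construction.
        tail_lanes: rounded_count - original_count,
    })
}
```

Under these assumptions a count of 5 rounds to 8 with 3 tail lanes, while a count of 8 is left unchanged with 0 tail lanes.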
Implementations
impl TailMaskPolicy
pub fn is_aligned(&self) -> bool
True when no rounding was needed; the dispatch can run as-is without a tail-mask predicate.
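One way a backend might act on this check is sketched below. The body of `is_aligned` is an assumption consistent with the description (no rounding means zero tail lanes); the crate's source is not shown here. `KernelVariant` and `select_variant` are hypothetical names used only for illustration.

```rust
pub struct TailMaskPolicy {
    pub original_count: u32,
    pub rounded_count: u32,
    pub tail_lanes: u32,
}

impl TailMaskPolicy {
    /// Assumed implementation: aligned means nothing was rounded away.
    pub fn is_aligned(&self) -> bool {
        self.tail_lanes == 0
    }
}

/// Hypothetical kernel flavors a backend could choose between.
#[derive(Debug, PartialEq, Eq)]
pub enum KernelVariant {
    /// Stores are unconditional; safe only when every lane is in range.
    Unguarded,
    /// Each store is wrapped in `lane_id < original_count`.
    TailMasked,
}

/// Picks the unguarded kernel when the dispatch needs no tail mask.
pub fn select_variant(policy: &TailMaskPolicy) -> KernelVariant {
    if policy.is_aligned() {
        KernelVariant::Unguarded
    } else {
        KernelVariant::TailMasked
    }
}
```

Skipping the predicate in the aligned case is an optional optimization; always emitting the tail-masked variant would also be correct, since the predicate is true for every lane when `tail_lanes` is zero.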
Trait Implementations
impl Clone for TailMaskPolicy
fn clone(&self) -> TailMaskPolicy
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Copy for TailMaskPolicy
impl Debug for TailMaskPolicy
impl Eq for TailMaskPolicy
impl PartialEq for TailMaskPolicy
fn eq(&self, other: &TailMaskPolicy) -> bool
Tests for self and other values to be equal, and is used by ==.
impl StructuralPartialEq for TailMaskPolicy
Auto Trait Implementations
impl Freeze for TailMaskPolicy
impl RefUnwindSafe for TailMaskPolicy
impl Send for TailMaskPolicy
impl Sync for TailMaskPolicy
impl Unpin for TailMaskPolicy
impl UnsafeUnpin for TailMaskPolicy
impl UnwindSafe for TailMaskPolicy
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.