//! Tier 2.5 reduction primitives - `count`/`min`/`max`/`sum` and
//! `all`/`any` over bitsets and fixed-width u32 ValueSets.
//!
//! Each scalar reduction runs as a single workgroup that walks its input
//! with a grid-stride loop and folds into the result with global atomics,
//! so the baseline primitive is parallel rather than a serial lane-0
//! placeholder. Higher-level workgroup-tree reductions still build on
//! these primitives where a caller needs per-workgroup partials or f32
//! support.
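
// A minimal CPU-side sketch of the grid-stride-plus-atomics shape described
// in the module docs above. It is illustrative only and is not this module's
// GPU code: `LANES`, `grid_stride_reduce`, and the `*_ref` helpers are
// hypothetical names, and the results for empty input (the fold identities
// 0, u32::MAX, 0, and 1) are assumptions of this sketch.
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// Hypothetical workgroup width for the CPU model.
const LANES: usize = 8;

// Lane `l` visits elements `l`, `l + LANES`, `l + 2 * LANES`, ... and folds
// each one into a single shared atomic accumulator.
fn grid_stride_reduce(values: &[u32], init: u32, fold: fn(&AtomicU32, u32)) -> u32 {
    let acc = AtomicU32::new(init);
    thread::scope(|scope| {
        for lane in 0..LANES {
            let acc = &acc;
            scope.spawn(move || {
                for &v in values.iter().skip(lane).step_by(LANES) {
                    fold(acc, v);
                }
            });
        }
    });
    // Every lane has been joined by the time `thread::scope` returns.
    acc.into_inner()
}

// Unsigned maximum; every lane issues an atomic max on the accumulator.
fn reduce_max_ref(values: &[u32]) -> u32 {
    grid_stride_reduce(values, 0, |acc: &AtomicU32, v: u32| {
        acc.fetch_max(v, Ordering::Relaxed);
    })
}

// Unsigned minimum; every lane issues an atomic min on the accumulator.
fn reduce_min_ref(values: &[u32]) -> u32 {
    grid_stride_reduce(values, u32::MAX, |acc: &AtomicU32, v: u32| {
        acc.fetch_min(v, Ordering::Relaxed);
    })
}

// Yields 1 when any element is non-zero: a non-zero element sets the flag.
fn reduce_any_ref(values: &[u32]) -> u32 {
    grid_stride_reduce(values, 0, |acc: &AtomicU32, v: u32| {
        if v != 0 {
            acc.fetch_or(1, Ordering::Relaxed);
        }
    })
}

// Yields 1 when every element is non-zero: a zero element clears the flag.
fn reduce_all_ref(values: &[u32]) -> u32 {
    grid_stride_reduce(values, 1, |acc: &AtomicU32, v: u32| {
        if v == 0 {
            acc.fetch_and(0, Ordering::Relaxed);
        }
    })
}
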
/// `reduce_all` - emit `1` when every lane in a u32 ValueSet is non-zero.
/// `reduce_any` - emit `1` when any lane in a u32 ValueSet is non-zero.
/// Unsigned maximum over a u32 ValueSet.
/// Unsigned minimum over a u32 ValueSet.