//! Tier 2.5 reduction primitives — `count`/`min`/`max`/`sum` over
//! bitsets and fixed-width u32 ValueSets.
//!
//! Each scalar reduction runs as a single workgroup that walks the input
//! with a grid-stride loop and combines results through global atomics,
//! so the baseline primitive is parallel rather than serial lane-0
//! scaffolding. Higher-level workgroup-tree reductions still compose
//! these primitives where a caller needs per-workgroup partials or f32
//! support.
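//!
//! The grid-stride-plus-atomics pattern can be illustrated with a rough
//! CPU-side analogue. This is only a sketch under stated assumptions, not
//! this module's API: `grid_stride_reduce` and `lanes` are hypothetical
//! names, scoped threads stand in for workgroup lanes, and `AtomicU32`
//! stands in for GPU global atomics. The sketch also assumes `sum` wraps
//! on overflow.
//!
//! ```rust
//! use std::sync::atomic::{AtomicU32, Ordering};
//! use std::thread;
//!
//! /// Hypothetical helper: returns `(count, min, max, sum)` over `values`,
//! /// emulating one workgroup of `lanes` invocations.
//! fn grid_stride_reduce(values: &[u32], lanes: usize) -> (u32, u32, u32, u32) {
//!     assert!(lanes > 0, "need at least one lane");
//!     // Accumulators start at each reduction's identity element.
//!     let count = AtomicU32::new(0);
//!     let min = AtomicU32::new(u32::MAX);
//!     let max = AtomicU32::new(0);
//!     let sum = AtomicU32::new(0);
//!     thread::scope(|s| {
//!         for lane in 0..lanes {
//!             let (count, min, max, sum) = (&count, &min, &max, &sum);
//!             s.spawn(move || {
//!                 // Lane `lane` visits lane, lane + lanes, lane + 2 * lanes, ...
//!                 for &v in values.iter().skip(lane).step_by(lanes) {
//!                     count.fetch_add(1, Ordering::Relaxed);
//!                     min.fetch_min(v, Ordering::Relaxed);
//!                     max.fetch_max(v, Ordering::Relaxed);
//!                     sum.fetch_add(v, Ordering::Relaxed); // wraps on overflow
//!                 }
//!             });
//!         }
//!     });
//!     // All lanes have joined here, so the atomics can be read out directly.
//!     (count.into_inner(), min.into_inner(), max.into_inner(), sum.into_inner())
//! }
//!
//! assert_eq!(grid_stride_reduce(&[7, 3, 9, 1, 4], 4), (5, 1, 9, 24));
//! ```
//!
//! Every lane contributes through the shared atomics, so no single lane
//! serializes the work. The trade-off in this sketch is contention on four
//! shared words, which is the cost a tree reduction with per-workgroup
//! partials would avoid.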