pub struct ReductionConfig {
    pub num_slots: usize,
    pub use_cooperative: bool,
    pub use_software_barrier: bool,
    pub shared_mem_bytes: usize,
}
Configuration for reduction operations.
Fields
num_slots: usize
Number of reduction slots (for parallel accumulation).
Multiple slots reduce atomic contention by spreading updates across several memory locations. The final result is computed by combining all slots on the host.
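The slot idea can be sketched on the CPU with threads standing in for GPU blocks. This is only an analogy under stated assumptions: `slotted_sum`, the worker-to-slot mapping (`worker % num_slots`), and the use of `AtomicU64` are hypothetical choices for illustration, not the crate's implementation.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Sum `values` with `num_workers` threads that accumulate into `num_slots`
/// atomic counters, then combine the slots at the end.
/// Hypothetical helper for illustration; not part of the documented API.
fn slotted_sum(values: &[u64], num_slots: usize, num_workers: usize) -> u64 {
    assert!(num_slots > 0 && num_workers > 0);

    // One atomic accumulator per slot.
    let slots: Vec<AtomicU64> = (0..num_slots).map(|_| AtomicU64::new(0)).collect();

    // Split the input into roughly equal chunks, one per worker.
    let chunk_len = ((values.len() + num_workers - 1) / num_workers).max(1);

    thread::scope(|s| {
        for (worker, chunk) in values.chunks(chunk_len).enumerate() {
            // Assumed mapping: workers are spread round-robin over the slots,
            // so fewer workers contend on any single atomic.
            let slot = &slots[worker % num_slots];
            s.spawn(move || {
                for &v in chunk {
                    slot.fetch_add(v, Ordering::Relaxed);
                }
            });
        }
    });

    // All workers have been joined here; combine the slots
    // (the "host-side" step in the description above).
    slots.iter().map(|slot| slot.load(Ordering::Relaxed)).sum()
}

fn main() {
    let values: Vec<u64> = (1..=1000).collect();
    let total = slotted_sum(&values, 4, 8);
    println!("total = {total}");
}
```

With a single slot every worker updates the same location; raising the slot count trades a slightly longer final combine for less contention during accumulation.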
use_cooperative: bool
Use cooperative groups for grid-wide synchronization.
Requires compute capability 6.0+ (Pascal or newer). When disabled, falls back to software barriers or multi-launch.
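One way to read the fallback described above is as a small decision function. The sketch below is an assumption about the ordering (cooperative first, then software barrier, then multi-launch); `SyncStrategy`, `choose_strategy`, and the `cc_major` parameter are hypothetical names and do not appear in the documented API.

```rust
/// Hypothetical enumeration of the synchronization strategies named in the docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncStrategy {
    Cooperative,
    SoftwareBarrier,
    MultiLaunch,
}

/// Pick a strategy from the two config flags and the device's major
/// compute capability. The priority order is an assumption.
fn choose_strategy(use_cooperative: bool, use_software_barrier: bool, cc_major: u32) -> SyncStrategy {
    if use_cooperative && cc_major >= 6 {
        // Cooperative groups need compute capability 6.0 or newer.
        SyncStrategy::Cooperative
    } else if use_software_barrier {
        SyncStrategy::SoftwareBarrier
    } else {
        // Last resort: split the reduction across several kernel launches.
        SyncStrategy::MultiLaunch
    }
}

fn main() {
    println!("{:?}", choose_strategy(true, true, 7));
    println!("{:?}", choose_strategy(true, true, 5));
    println!("{:?}", choose_strategy(false, false, 7));
}
```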
use_software_barrier: bool
Use a software barrier when cooperative groups are unavailable.
Software barriers use atomic counters in global memory. This works on all devices but has higher latency.
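The atomic-counter barrier can be illustrated with a CPU analogue, where threads stand in for GPU blocks and a shared `AtomicUsize` stands in for the counter in global memory. `SpinBarrier` and `two_phase_sum` are hypothetical names, and the barrier here is one-shot (not reusable); this is a sketch of the general technique, not the crate's kernel code.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::thread;

/// One-shot spin barrier built on a single atomic counter.
struct SpinBarrier {
    arrived: AtomicUsize,
    participants: usize,
}

impl SpinBarrier {
    fn new(participants: usize) -> Self {
        Self { arrived: AtomicUsize::new(0), participants }
    }

    /// Signal arrival, then spin until every participant has arrived.
    fn wait(&self) {
        self.arrived.fetch_add(1, Ordering::AcqRel);
        while self.arrived.load(Ordering::Acquire) < self.participants {
            std::hint::spin_loop();
        }
    }
}

/// Phase 1: each worker writes a partial sum. Barrier. Phase 2: worker 0 combines.
fn two_phase_sum(values: &[u64], workers: usize) -> u64 {
    assert!(workers > 0);
    let partials: Vec<AtomicU64> = (0..workers).map(|_| AtomicU64::new(0)).collect();
    let barrier = SpinBarrier::new(workers);
    let total = AtomicU64::new(0);
    let chunk_len = (values.len() + workers - 1) / workers;

    thread::scope(|s| {
        for w in 0..workers {
            let (partials, barrier, total) = (&partials, &barrier, &total);
            s.spawn(move || {
                // Phase 1: reduce this worker's slice of the input.
                let start = (w * chunk_len).min(values.len());
                let end = (start + chunk_len).min(values.len());
                let local: u64 = values[start..end].iter().sum();
                partials[w].store(local, Ordering::Relaxed);

                // Every worker must reach this point before phase 2 starts.
                barrier.wait();

                // Phase 2: a single worker combines all partial results.
                if w == 0 {
                    let sum: u64 = partials.iter().map(|p| p.load(Ordering::Relaxed)).sum();
                    total.store(sum, Ordering::Relaxed);
                }
            });
        }
    });

    total.load(Ordering::Relaxed)
}

fn main() {
    let values: Vec<u64> = (1..=100).collect();
    println!("sum = {}", two_phase_sum(&values, 4));
}
```

The spin loop is where the extra latency comes from: every participant repeatedly polls shared memory until the last one arrives. A spin barrier of this kind only makes progress if all participants are actually running at the same time.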
shared_mem_bytes: usize
Shared memory size per block for reduction (bytes).
Should be at least block_size * sizeof(T) for full reduction.
Default: 0 (auto-calculate based on block size).
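As a worked example of the sizing rule, the sketch below resolves a requested size against the `block_size * sizeof(T)` minimum. `resolve_shared_mem_bytes` is a hypothetical helper, and treating `0` as "use exactly `block_size * size_of::<T>()`" is an assumption about what auto-calculation means here.

```rust
use std::mem::size_of;

/// Resolve the shared memory size for a block of `block_size` threads
/// reducing elements of type `T`.
/// Assumption: `requested == 0` means "auto", taken here to be one element per thread.
fn resolve_shared_mem_bytes<T>(requested: usize, block_size: usize) -> usize {
    let full_reduction = block_size * size_of::<T>();
    if requested == 0 {
        full_reduction
    } else {
        requested
    }
}

fn main() {
    // 256 threads of f32 need 256 * 4 bytes for a full reduction.
    println!("{}", resolve_shared_mem_bytes::<f32>(0, 256));
    // An explicit request is passed through unchanged.
    println!("{}", resolve_shared_mem_bytes::<f32>(4096, 256));
}
```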
Implementations
impl ReductionConfig
pub fn with_slots(self, num_slots: usize) -> Self
Set the number of accumulation slots.
pub fn with_cooperative(self, enabled: bool) -> Self
Enable or disable cooperative groups.
pub fn with_software_barrier(self, enabled: bool) -> Self
Enable or disable software barrier fallback.
Set explicit shared memory size.
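The builder methods take `self` by value and return `Self`, so they chain. The sketch below uses a local stand-in type that mirrors the documented fields and the three method signatures shown above, so that it compiles on its own. The method bodies, the starting values, and the `example_config` helper are assumptions for illustration; the real crate's constructor and defaults are not shown on this page.

```rust
// Local stand-in mirroring the documented struct; not the crate's own type.
#[derive(Clone, Debug)]
pub struct ReductionConfig {
    pub num_slots: usize,
    pub use_cooperative: bool,
    pub use_software_barrier: bool,
    pub shared_mem_bytes: usize,
}

impl ReductionConfig {
    // Assumed bodies: each builder sets one field and returns the config.
    pub fn with_slots(mut self, num_slots: usize) -> Self {
        self.num_slots = num_slots;
        self
    }

    pub fn with_cooperative(mut self, enabled: bool) -> Self {
        self.use_cooperative = enabled;
        self
    }

    pub fn with_software_barrier(mut self, enabled: bool) -> Self {
        self.use_software_barrier = enabled;
        self
    }
}

/// Build a config for a device without cooperative-launch support.
fn example_config() -> ReductionConfig {
    // Hypothetical starting values; all fields are public, so a struct
    // literal works without knowing the real constructor.
    let base = ReductionConfig {
        num_slots: 1,
        use_cooperative: true,
        use_software_barrier: false,
        shared_mem_bytes: 0, // 0 = auto-calculate, per the field docs
    };

    base.with_slots(8)
        .with_cooperative(false)
        .with_software_barrier(true)
}

fn main() {
    let cfg = example_config();
    println!("{cfg:?}");
}
```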
Trait Implementations
impl Clone for ReductionConfig
fn clone(&self) -> ReductionConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.