Thread block cluster configuration for Hopper+ GPUs (SM 9.0+).
Thread block clusters are a new level of the CUDA execution hierarchy introduced with the NVIDIA Hopper architecture (compute capability 9.0). A cluster groups multiple thread blocks that can cooperate more efficiently via distributed shared memory and hardware-accelerated synchronisation.
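To make the grouping concrete, the following is a minimal sketch of how a grid divides into clusters. It does not use this crate's types: `Dims`, `cluster_count`, and `MAX_PORTABLE_CLUSTER_SIZE` are hypothetical names introduced only for illustration. It assumes the general CUDA rules that each grid dimension must be a multiple of the corresponding cluster dimension and that 8 blocks per cluster is the portable upper bound (larger sizes require a non-portable opt-in that this sketch ignores).

```rust
/// Hypothetical stand-in for a 3-D extent; not this crate's `ClusterDim` or `Dim3`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Dims {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

/// Portable upper bound on thread blocks per cluster in CUDA.
pub const MAX_PORTABLE_CLUSTER_SIZE: u64 = 8;

impl Dims {
    pub const fn new(x: u32, y: u32, z: u32) -> Self {
        Self { x, y, z }
    }

    /// Total number of elements (blocks) covered by this extent.
    /// Widened to u64 so large grids cannot overflow.
    pub const fn volume(&self) -> u64 {
        self.x as u64 * self.y as u64 * self.z as u64
    }
}

/// Returns how many clusters the grid is split into, or an error
/// message when the combination would not be a valid cluster launch.
pub fn cluster_count(grid: Dims, cluster: Dims) -> Result<u64, String> {
    if cluster.x == 0 || cluster.y == 0 || cluster.z == 0 {
        return Err("cluster dimensions must be non-zero".to_string());
    }
    if cluster.volume() > MAX_PORTABLE_CLUSTER_SIZE {
        return Err(format!(
            "{} blocks per cluster exceeds the portable limit of {}",
            cluster.volume(),
            MAX_PORTABLE_CLUSTER_SIZE
        ));
    }
    // Every grid dimension must be an exact multiple of the cluster dimension.
    if grid.x % cluster.x != 0 || grid.y % cluster.y != 0 || grid.z % cluster.z != 0 {
        return Err("grid dimensions must be multiples of the cluster dimensions".to_string());
    }
    Ok(grid.volume() / cluster.volume())
}
```

Under these assumptions, a grid of 16 blocks with a cluster shape of 2 x 1 x 1 yields 8 clusters of 2 blocks each, while a grid of 15 blocks with the same cluster shape is rejected.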
§Requirements
- NVIDIA Hopper (H100) or later GPU (compute capability 9.0+).
- CUDA driver version 12.0 or later.
- The kernel must be compiled with cluster support.
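The first two requirements can be checked before attempting a launch. The sketch below is one possible way to do that and is not part of this module: `ComputeCapability` and `meets_cluster_requirements` are hypothetical names, and the caller is assumed to have already queried the device's compute capability and the driver version. It relies on the CUDA convention that the driver version is reported as `1000 * major + 10 * minor`, so 12.0 is `12000`. The third requirement (a kernel compiled with cluster support) cannot be verified this way.

```rust
/// Hypothetical holder for a device's compute capability.
/// Field order matters: the derived ordering compares `major` first, then `minor`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ComputeCapability {
    pub major: u32,
    pub minor: u32,
}

/// Minimum compute capability for thread block clusters (Hopper, SM 9.0).
pub const MIN_CLUSTER_CAPABILITY: ComputeCapability = ComputeCapability { major: 9, minor: 0 };

/// CUDA 12.0 in the driver's integer encoding (1000 * major + 10 * minor).
pub const MIN_DRIVER_VERSION: u32 = 12_000;

/// Returns true when both the device and the driver satisfy the
/// requirements listed above. `driver_version` is the raw integer
/// as reported by the CUDA driver.
pub fn meets_cluster_requirements(cc: ComputeCapability, driver_version: u32) -> bool {
    cc >= MIN_CLUSTER_CAPABILITY && driver_version >= MIN_DRIVER_VERSION
}
```

A caller might use such a check to fall back to an ordinary launch on pre-Hopper devices instead of failing at launch time.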
§Example
let cluster_params = ClusterLaunchParams {
    grid: Dim3::x(16),
    block: Dim3::x(256),
    cluster: ClusterDim::new(2, 1, 1),
    shared_mem_bytes: 0,
};
assert_eq!(cluster_params.blocks_per_cluster(), 2);
Structs§
- ClusterDim - Cluster dimensions specifying how many thread blocks form one cluster.
- ClusterLaunchParams - Launch parameters including thread block cluster configuration.
Functions§
- cluster_launch - Launches a kernel with thread block cluster configuration.