Module cluster

Thread block cluster configuration for Hopper+ GPUs (SM 9.0+).

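Thread block clusters are a level of the CUDA execution hierarchy, sitting between individual thread blocks and the grid, introduced with the NVIDIA Hopper architecture (compute capability 9.0). A cluster groups multiple thread blocks that are guaranteed to be co-scheduled on the same GPU Processing Cluster (GPC), so they can cooperate efficiently through distributed shared memory (accessing each other's shared memory) and hardware-accelerated cluster-wide synchronisation.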
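To make the hierarchy concrete, the sketch below maps a block index to the cluster that contains it and to the block's rank inside that cluster. The `Extent` type and `cluster_of` function are hypothetical and not part of this module, and the x-fastest rank linearisation is an assumption of the sketch.

```rust
/// Hypothetical 3D index/extent type used only in this sketch
/// (it stands in for the crate's `Dim3` / `ClusterDim`).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Extent {
    x: u32,
    y: u32,
    z: u32,
}

/// Returns the coordinate of the cluster that contains `block_idx`, and the
/// block's linear rank inside that cluster. The rank is linearised x-fastest,
/// which is an assumption of this sketch.
fn cluster_of(block_idx: Extent, cluster: Extent) -> (Extent, u32) {
    // Which cluster the block falls into along each axis.
    let coord = Extent {
        x: block_idx.x / cluster.x,
        y: block_idx.y / cluster.y,
        z: block_idx.z / cluster.z,
    };
    // The block's position inside its cluster along each axis.
    let local = Extent {
        x: block_idx.x % cluster.x,
        y: block_idx.y % cluster.y,
        z: block_idx.z % cluster.z,
    };
    let rank = local.x + local.y * cluster.x + local.z * cluster.x * cluster.y;
    (coord, rank)
}

fn main() {
    // Clusters of 2 blocks along x: block 5 is the second block (rank 1) of cluster 2.
    let cluster = Extent { x: 2, y: 1, z: 1 };
    let (coord, rank) = cluster_of(Extent { x: 5, y: 0, z: 0 }, cluster);
    assert_eq!(coord, Extent { x: 2, y: 0, z: 0 });
    assert_eq!(rank, 1);
}
```

Every block with the same cluster coordinate is scheduled together, which is what makes distributed shared memory between those blocks possible.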

Requirements

  • NVIDIA Hopper (H100) or later GPU (compute capability 9.0+).
  • CUDA driver version 12.0 or later.
  • The kernel must be compiled for a compute capability 9.0+ target (for example `sm_90`), since cluster launches are not supported for older targets.
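The first two requirements can be checked at runtime before attempting a cluster launch; the compile-target requirement cannot be checked this simply and is left out. The sketch below is hypothetical (`supports_clusters` and `cuda_version` are not part of this module) and assumes the driver version is the integer reported by the CUDA driver, which encodes version 12.0 as `1000 * 12 + 10 * 0 = 12000`.

```rust
/// Encodes a CUDA version the way `cuDriverGetVersion` reports it:
/// 1000 * major + 10 * minor (so 12.0 becomes 12000).
const fn cuda_version(major: i32, minor: i32) -> i32 {
    1000 * major + 10 * minor
}

/// Hypothetical pre-flight check for the first two requirements above.
/// `cc` is the device compute capability as (major, minor) and
/// `driver_version` is the integer reported by the driver.
fn supports_clusters(cc: (i32, i32), driver_version: i32) -> bool {
    // Tuples compare lexicographically, so (9, 0) <= (9, 0) < (10, 0).
    cc >= (9, 0) && driver_version >= cuda_version(12, 0)
}

fn main() {
    assert!(supports_clusters((9, 0), cuda_version(12, 2))); // H100, CUDA 12.2 driver
    assert!(!supports_clusters((8, 0), cuda_version(12, 2))); // A100: no clusters
    assert!(!supports_clusters((9, 0), cuda_version(11, 8))); // driver too old
}
```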

Example

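```rust
// Assumes ClusterLaunchParams, ClusterDim and Dim3 are imported from this crate.
// 16 blocks of 256 threads, grouped into clusters of 2 blocks along x
// (the grid size, 16, is a whole multiple of the cluster size, 2).
let cluster_params = ClusterLaunchParams {
    grid: Dim3::x(16),
    block: Dim3::x(256),
    cluster: ClusterDim::new(2, 1, 1),
    shared_mem_bytes: 0,
};
// 2 x 1 x 1 = 2 thread blocks per cluster.
assert_eq!(cluster_params.blocks_per_cluster(), 2);
```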
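CUDA requires each grid dimension to be a whole multiple of the corresponding cluster dimension, and treats clusters of up to 8 blocks as portable; larger, non-portable sizes need an explicit opt-in that this sketch ignores. The following hypothetical `validate` function (with its own stand-in `Dim` and `ClusterError` types, not this module's API) shows those checks and returns the number of clusters a launch would contain.

```rust
/// Hypothetical stand-in for the crate's dimension types.
#[derive(Clone, Copy, Debug)]
struct Dim {
    x: u32,
    y: u32,
    z: u32,
}

/// Largest cluster size (in thread blocks) that CUDA treats as portable.
const MAX_PORTABLE_CLUSTER_BLOCKS: u32 = 8;

#[derive(Debug, PartialEq)]
enum ClusterError {
    ZeroDimension,
    GridNotDivisible,
    ClusterTooLarge(u32),
}

/// Checks a grid/cluster pair and returns the number of clusters launched.
fn validate(grid: Dim, cluster: Dim) -> Result<u32, ClusterError> {
    let axes = [(grid.x, cluster.x), (grid.y, cluster.y), (grid.z, cluster.z)];
    if axes.iter().any(|&(g, c)| g == 0 || c == 0) {
        return Err(ClusterError::ZeroDimension);
    }
    // Each grid dimension must be a whole multiple of the cluster dimension.
    if axes.iter().any(|&(g, c)| g % c != 0) {
        return Err(ClusterError::GridNotDivisible);
    }
    let blocks_per_cluster = cluster.x * cluster.y * cluster.z;
    if blocks_per_cluster > MAX_PORTABLE_CLUSTER_BLOCKS {
        return Err(ClusterError::ClusterTooLarge(blocks_per_cluster));
    }
    // Clusters per axis, multiplied together.
    Ok(axes.iter().map(|&(g, c)| g / c).product())
}

fn main() {
    let x = |n| Dim { x: n, y: 1, z: 1 };
    // Same shape as the example above: 16 blocks in clusters of 2 -> 8 clusters.
    assert_eq!(validate(x(16), x(2)), Ok(8));
    assert_eq!(validate(x(15), x(2)), Err(ClusterError::GridNotDivisible));
    assert_eq!(
        validate(Dim { x: 64, y: 4, z: 1 }, Dim { x: 4, y: 4, z: 1 }),
        Err(ClusterError::ClusterTooLarge(16))
    );
}
```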

Structs

ClusterDim
Cluster dimensions specifying how many thread blocks form one cluster.
ClusterLaunchParams
Launch parameters including thread block cluster configuration.

Functions

cluster_launch
Launches a kernel with thread block cluster configuration.
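In the CUDA driver API, cluster launches go through `cuLaunchKernelEx` with a `CU_LAUNCH_ATTRIBUTE_CLUSTER_DIMENSION` launch attribute; this page does not say whether `cluster_launch` uses exactly that path. The sketch below is a hypothetical wrapper (`launch_with_clusters`, `Params` and `Dim` are illustrative names, not this module's API) that rejects shapes the driver would refuse and then forwards the raw values to a caller-supplied closure standing in for the real driver call. Nothing in it talks to a GPU.

```rust
/// Hypothetical stand-ins for the crate's types, used only in this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dim {
    x: u32,
    y: u32,
    z: u32,
}

#[derive(Clone, Copy, Debug)]
struct Params {
    grid: Dim,
    block: Dim,
    cluster: Dim,
    shared_mem_bytes: u32,
}

/// CUDA's limit on threads per block.
const MAX_THREADS_PER_BLOCK: u32 = 1024;

/// Validates the launch shape, then forwards grid, block, cluster and
/// shared-memory size to `launch`, which stands in for the driver call.
fn launch_with_clusters<F>(p: &Params, launch: F) -> Result<(), String>
where
    F: FnOnce(Dim, Dim, Dim, u32) -> Result<(), String>,
{
    let threads = p.block.x * p.block.y * p.block.z;
    if threads == 0 || threads > MAX_THREADS_PER_BLOCK {
        return Err(format!("invalid block size: {threads} threads"));
    }
    let axes = [
        (p.grid.x, p.cluster.x),
        (p.grid.y, p.cluster.y),
        (p.grid.z, p.cluster.z),
    ];
    if axes.iter().any(|&(g, c)| c == 0 || g % c != 0) {
        return Err("grid is not a whole number of clusters".to_string());
    }
    launch(p.grid, p.block, p.cluster, p.shared_mem_bytes)
}

fn main() {
    let params = Params {
        grid: Dim { x: 16, y: 1, z: 1 },
        block: Dim { x: 256, y: 1, z: 1 },
        cluster: Dim { x: 2, y: 1, z: 1 },
        shared_mem_bytes: 0,
    };
    // Record what would have been handed to the driver.
    let mut seen = None;
    let result = launch_with_clusters(&params, |grid, _block, cluster, smem| {
        seen = Some((grid, cluster, smem));
        Ok(())
    });
    assert!(result.is_ok());
    assert_eq!(seen, Some((params.grid, params.cluster, 0)));
}
```

Taking the launch step as a closure keeps the validation logic testable on machines without a Hopper GPU.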