pub enum SmVersion {
    Sm75,
    Sm80,
    Sm86,
    Sm89,
    Sm90,
    Sm100,
    Sm120,
}
GPU architecture version for occupancy estimation.
This is a local copy that avoids a dependency on oxicuda-ptx.
Each variant encodes the SM architecture parameters needed for
occupancy calculations (max warps per SM, register file size, etc.).
Variants
Sm75
Turing (compute capability 7.5).
Sm80
Ampere (compute capability 8.0).
Sm86
Ampere GA10x (compute capability 8.6).
Sm89
Ada Lovelace (compute capability 8.9).
Sm90
Hopper (compute capability 9.0).
Sm100
Blackwell (compute capability 10.0).
Sm120
Blackwell consumer GPUs such as the GeForce RTX 50 series (compute capability 12.0). The B200 is a compute capability 10.0 part.
Implementations
impl SmVersion
pub const fn max_warps_per_sm(self) -> u32
Maximum number of warps that can reside on a single SM.
pub const fn max_blocks_per_sm(self) -> u32
Maximum number of thread blocks that can reside on a single SM.
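The warp and block limits together bound how many blocks of a given size can be resident at once. The following is a minimal sketch of that bound, not the crate's implementation: `blocks_limited_by_warps` is a hypothetical helper, its two limit parameters stand in for whatever `max_warps_per_sm` and `max_blocks_per_sm` return, and a warp size of 32 threads is assumed.

```rust
/// Hypothetical helper: upper bound on resident blocks per SM, considering
/// only the warp limit and the block limit (registers and shared memory
/// are ignored here).
fn blocks_limited_by_warps(
    threads_per_block: u32,
    max_warps_per_sm: u32,  // assumed to come from SmVersion::max_warps_per_sm
    max_blocks_per_sm: u32, // assumed to come from SmVersion::max_blocks_per_sm
) -> u32 {
    const WARP_SIZE: u32 = 32; // assumption: 32 threads per warp

    if threads_per_block == 0 {
        return 0;
    }
    // A partially filled warp still occupies a full warp slot, so round up.
    let warps_per_block = (threads_per_block + WARP_SIZE - 1) / WARP_SIZE;
    // Blocks that fit under the warp limit, capped by the block limit.
    (max_warps_per_sm / warps_per_block).min(max_blocks_per_sm)
}
```

With illustrative limits of 64 warps and 32 blocks per SM, a 256-thread block uses 8 warps, so at most 8 such blocks fit; a 32-thread block is instead capped by the block limit at 32.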
pub const fn registers_per_sm(self) -> u32
Total number of 32-bit registers available per SM.
pub const fn max_registers_per_thread(self) -> u32
Maximum number of registers a single thread can use.
Maximum shared memory per SM in bytes.
pub const fn register_alloc_granularity(self) -> u32
Register allocation granularity (in warps).
Registers are allocated to warps in chunks: the per-thread register count is rounded up to the nearest multiple of this value.
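The rounding described above can be worked through as a small sketch. It assumes the granularity applies to the per-thread register count and that a warp has 32 threads; both helper functions are hypothetical, and the values 8 and 65536 in the usage note are illustrative rather than taken from this crate.

```rust
/// Hypothetical helper: round `value` up to the nearest multiple of `granularity`.
fn round_up_to_multiple(value: u32, granularity: u32) -> u32 {
    if granularity == 0 {
        return value; // no granularity: nothing to round
    }
    ((value + granularity - 1) / granularity) * granularity
}

/// Hypothetical helper: number of warps that fit in the register file once
/// the per-thread register count has been rounded up to the granularity.
fn warps_limited_by_registers(
    regs_per_thread: u32,
    granularity: u32,      // assumed to come from register_alloc_granularity
    registers_per_sm: u32, // assumed to come from registers_per_sm
) -> u32 {
    const WARP_SIZE: u32 = 32; // assumption: 32 threads per warp

    let allocated_per_thread = round_up_to_multiple(regs_per_thread, granularity);
    if allocated_per_thread == 0 {
        return u32::MAX; // a kernel using no registers is not register-limited
    }
    registers_per_sm / (allocated_per_thread * WARP_SIZE)
}
```

For example, a kernel using 37 registers per thread with a granularity of 8 is charged 40 registers per thread, or 1280 per warp, so a 65536-register file holds at most 51 warps.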
Shared memory allocation granularity in bytes.