Module dynamic_parallelism

Dynamic parallelism support for device-side kernel launches.

CUDA dynamic parallelism allows kernels running on the GPU to launch child kernels without returning to the host. This module provides configuration, planning, and PTX code generation for nested kernel launches.

§Architecture requirements

Dynamic parallelism requires compute capability 3.5+ (sm_35). All SmVersion variants in this crate are sm_75+, so they all support dynamic parallelism.
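The capability requirement above can be sketched as a plain version comparison. This is a minimal illustration and not this crate's API: `ComputeCapability`, `MIN_DYNAMIC_PARALLELISM`, and `supports_dynamic_parallelism` are hypothetical names introduced here. The only facts it relies on are the 3.5 minimum stated above and that the derived ordering compares `major` before `minor`.

```rust
/// Hypothetical (major, minor) compute capability pair.
/// This type is an assumption for illustration; it is not `SmVersion`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ComputeCapability {
    pub major: u32,
    pub minor: u32,
}

/// Minimum compute capability for dynamic parallelism (sm_35).
pub const MIN_DYNAMIC_PARALLELISM: ComputeCapability =
    ComputeCapability { major: 3, minor: 5 };

/// Returns true when `cc` is at least 3.5.
/// The derived `Ord` compares fields in declaration order (major, then minor),
/// so 7.5 >= 3.5 and 3.0 < 3.5.
pub fn supports_dynamic_parallelism(cc: ComputeCapability) -> bool {
    cc >= MIN_DYNAMIC_PARALLELISM
}
```

Under this sketch every sm_75+ target passes the check, which is why the crate's variants need no per-architecture gating.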

§CUDA nesting limits

  • Maximum nesting depth: 24
  • Default pending launch limit: 2048
  • Each pending launch consumes device memory for bookkeeping
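The limits listed above can be expressed as a small standalone check. This is a sketch under stated assumptions, not the logic of `validate_dynamic_config`: the names `LimitError`, `check_nesting_depth`, and `exceeds_default_pending_limit` are hypothetical, and treating a depth of zero as invalid is an assumption made here.

```rust
/// Maximum nesting depth, as listed above.
pub const CUDA_MAX_NESTING_DEPTH: u32 = 24;
/// Default pending launch limit, as listed above.
pub const DEFAULT_PENDING_LAUNCH_LIMIT: u32 = 2048;

/// Hypothetical error type for this sketch; not part of the crate.
#[derive(Debug, PartialEq, Eq)]
pub enum LimitError {
    /// Assumption: a nesting depth of zero is rejected.
    ZeroDepth,
    /// The requested depth exceeds the maximum of 24.
    DepthTooLarge { requested: u32, max: u32 },
}

/// Checks a requested nesting depth against the 1..=24 range.
pub fn check_nesting_depth(requested: u32) -> Result<(), LimitError> {
    if requested == 0 {
        return Err(LimitError::ZeroDepth);
    }
    if requested > CUDA_MAX_NESTING_DEPTH {
        return Err(LimitError::DepthTooLarge {
            requested,
            max: CUDA_MAX_NESTING_DEPTH,
        });
    }
    Ok(())
}

/// Reports whether a requested pending-launch count is above the default.
/// Exceeding the default is not an error in itself: the limit is
/// configurable, but a larger value reserves more device memory.
pub fn exceeds_default_pending_limit(requested: u32) -> bool {
    requested > DEFAULT_PENDING_LAUNCH_LIMIT
}
```

Returning a structured error rather than a boolean keeps the rejected value and the limit together, which makes diagnostics easier to format at the call site.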

§Example

use oxicuda_launch::dynamic_parallelism::{
    DynamicParallelismConfig, ChildKernelSpec, GridSpec,
    validate_dynamic_config, plan_dynamic_launch,
    generate_child_launch_ptx, generate_device_sync_ptx,
    estimate_launch_overhead, max_nesting_for_sm,
};
use oxicuda_launch::Dim3;
use oxicuda_ptx::arch::SmVersion;
use oxicuda_ptx::PtxType;

let config = DynamicParallelismConfig {
    max_nesting_depth: 4,
    max_pending_launches: 2048,
    sync_depth: 2,
    child_grid: Dim3::x(128),
    child_block: Dim3::x(256),
    child_shared_mem: 0,
    sm_version: SmVersion::Sm80,
};

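// Validate the configuration, then build a launch plan from it.
// `.ok()` discards any error for brevity; real code should handle the `Result`.
validate_dynamic_config(&config).ok();
let plan = plan_dynamic_launch(&config).ok();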

let child = ChildKernelSpec {
    name: "child_kernel".to_string(),
    param_types: vec![PtxType::U64, PtxType::U32],
    grid_dim: GridSpec::Fixed(Dim3::x(128)),
    block_dim: Dim3::x(256),
    shared_mem_bytes: 0,
};

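// Generate PTX for the device-side child launch and for device-side synchronization.
let ptx = generate_child_launch_ptx("parent_kernel", &child, SmVersion::Sm80);
let sync_ptx = generate_device_sync_ptx(SmVersion::Sm80);
// Estimate the device memory overhead in bytes.
let overhead = estimate_launch_overhead(4, 2048);
// Query the maximum supported nesting depth for this SM version.
let max_depth = max_nesting_for_sm(SmVersion::Sm80);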

Structs§

ChildKernelSpec
Specification for a child kernel to be launched from device code.
DynamicLaunchPlan
A validated plan for a dynamic (device-side) kernel launch.
DynamicParallelismConfig
Configuration for dynamic parallelism (device-side kernel launches).

Enums§

GridSpec
Specifies how child kernel grid dimensions are determined.

Functions§

estimate_launch_overhead
Estimates the device memory overhead for dynamic parallelism in bytes.
generate_child_launch_ptx
Generates PTX code for a device-side child kernel launch.
generate_device_sync_ptx
Generates PTX code for device-side synchronization.
max_nesting_for_sm
Returns the maximum supported nesting depth for a given SM version.
plan_dynamic_launch
Creates a validated launch plan from a dynamic parallelism configuration.
validate_dynamic_config
Validates a dynamic parallelism configuration.