gpu_kernel

Attribute Macro gpu_kernel 

Source
#[gpu_kernel]
Expand description

Attribute macro for defining multi-backend GPU kernels.

This macro generates code for multiple GPU backends with compile-time capability validation. It integrates with the ringkernel-ir crate to lower Rust DSL to backend-specific shader code.

§Attributes

  • backends = [cuda, metal, wgpu] - Target backends (default: all)
  • fallback = [cuda, metal, wgpu, cpu] - Fallback order for runtime selection
  • requires = [f64, atomic64] - Required capabilities (validated at compile time)
  • id = "kernel_name" - Explicit kernel identifier
  • block_size = 256 - Thread block size

§Example

use ringkernel_derive::gpu_kernel;

#[gpu_kernel(backends = [cuda, metal], requires = [subgroups])]
fn warp_reduce(data: &mut [f32], n: i32) {
    let idx = global_thread_id_x();
    if idx < n {
        // Use warp shuffle for reduction
        let val = data[idx as usize];
        let reduced = warp_reduce_sum(val);
        if lane_id() == 0 {
            data[idx as usize] = reduced;
        }
    }
}

§Capability Checking

The macro validates at compile time that all required capabilities are supported by at least one target backend:

CapabilityCUDAMetalWebGPUCPU
f64YesNoNoYes
i64YesYesNoYes
atomic64YesYesNo*Yes
cooperative_groupsYesNoNoYes
subgroupsYesYesOptYes
shared_memoryYesYesYesYes
f16YesYesYesYes

*WebGPU emulates 64-bit atomics with 32-bit pairs.

§Generated Code

For each compatible backend, the macro generates:

  • Backend-specific source code constant (e.g., KERNEL_NAME_CUDA_SOURCE)
  • Registration entry for runtime discovery
  • CPU fallback function (if cpu_fallback = true)