Attribute Macro gpu_kernel

Source

#[gpu_kernel]

Expand description

Attribute macro for defining multi-backend GPU kernels.

This macro generates code for multiple GPU backends with compile-time capability validation. It integrates with the ringkernel-ir crate to lower Rust DSL to backend-specific shader code.

§Attributes

backends = [cuda, metal, wgpu] - Target backends (default: all)
fallback = [cuda, metal, wgpu, cpu] - Fallback order for runtime selection
requires = [f64, atomic64] - Required capabilities (validated at compile time)
id = "kernel_name" - Explicit kernel identifier
block_size = 256 - Thread block size

§Example

use ringkernel_derive::gpu_kernel;

#[gpu_kernel(backends = [cuda, metal], requires = [subgroups])]
fn warp_reduce(data: &mut [f32], n: i32) {
    let idx = global_thread_id_x();
    if idx < n {
        // Use warp shuffle for reduction
        let val = data[idx as usize];
        let reduced = warp_reduce_sum(val);
        if lane_id() == 0 {
            data[idx as usize] = reduced;
        }
    }
}

§Capability Checking

The macro validates at compile time that all required capabilities are supported by at least one target backend:

Capability	CUDA	Metal	WebGPU	CPU
f64	Yes	No	No	Yes
i64	Yes	Yes	No	Yes
atomic64	Yes	Yes	No*	Yes
cooperative_groups	Yes	No	No	Yes
subgroups	Yes	Yes	Opt	Yes
shared_memory	Yes	Yes	Yes	Yes
f16	Yes	Yes	Yes	Yes

*WebGPU emulates 64-bit atomics with 32-bit pairs.

§Generated Code

For each compatible backend, the macro generates:

Backend-specific source code constant (e.g., KERNEL_NAME_CUDA_SOURCE)
Registration entry for runtime discovery
CPU fallback function (if cpu_fallback = true)

gpu_kernel

Attribute Macro gpu_kernel Copy item path

§Attributes

§Example

§Capability Checking

§Generated Code

Attribute Macro gpu_kernel