oxicuda-launch 0.1.1

OxiCUDA Launch - Type-safe GPU kernel launch infrastructure
Documentation
oxicuda-launch-0.1.1 has been yanked.

oxicuda-launch

Type-safe GPU kernel launch infrastructure for the OxiCUDA ecosystem.

Part of the OxiCUDA project.

Overview

oxicuda-launch provides ergonomic, type-safe abstractions for launching CUDA GPU kernels. It builds on top of oxicuda-driver and oxicuda-memory to turn raw cuLaunchKernel calls into safe, builder-pattern Rust code.

The Kernel struct wraps a Function handle and holds an Arc<Module> to ensure the PTX module remains loaded for the kernel's lifetime. Arguments are passed through the KernelArgs trait, which converts typed Rust values into the *mut c_void array that the driver expects. Tuple implementations are provided for up to 24 Copy elements, covering virtually all practical kernel signatures.

The launch! macro offers a concise syntax for kernel dispatch, while LaunchParams and its builder provide explicit control over grid size, block size, shared memory, and stream selection. The grid_size_for(n, block_size) helper computes the minimum number of blocks needed to cover a given number of work items.

Modules

Module Description
grid Dim3 struct, grid_size_for() helper
params LaunchParams and LaunchParamsBuilder (builder pattern)
kernel Kernel struct, KernelArgs trait (tuple impls up to 24)
macros launch! macro for ergonomic kernel dispatch

Quick Start

use std::sync::Arc;
use oxicuda_driver::{init, Device, Context, Module, Stream};
use oxicuda_launch::{Kernel, LaunchParams, Dim3, grid_size_for, launch};

init()?;
let dev = Device::get(0)?;
let ctx = Arc::new(Context::new(&dev)?);

// Load PTX and create a kernel.
let module = Arc::new(Module::from_ptx(ptx_source)?);
let kernel = Kernel::from_module(module, "vector_add")?;

// Configure launch dimensions.
let n: u32 = 1024;
let block_size = 256u32;
let grid = grid_size_for(n, block_size);
let stream = Stream::new(&ctx)?;

// Launch with the macro.
let (a_ptr, b_ptr, c_ptr) = (0u64, 0u64, 0u64);
launch!(kernel, grid(grid), block(block_size), &stream, &(a_ptr, b_ptr, c_ptr, n))?;
stream.synchronize()?;
# Ok::<(), oxicuda_driver::CudaError>(())

Dim3 Conversions

Dim3 supports convenient From conversions for common dimension patterns:

Input type Result
u32 Dim3 { x, y: 1, z: 1 }
(u32, u32) Dim3 { x, y, z: 1 }
(u32, u32, u32) Dim3 { x, y, z }

Features

Feature Description
gpu-tests Enable tests that require a physical GPU

Platform Support

Platform Status
Linux Full support (NVIDIA driver 525+)
Windows Full support (NVIDIA driver 525+)
macOS Compile only (UnsupportedPlatform at runtime)

License

Apache-2.0 -- (C) 2026 COOLJAPAN OU (Team KitaSan)