§OxiCUDA Launch
Type-safe GPU kernel launch infrastructure for the OxiCUDA ecosystem.
This crate provides ergonomic, type-safe abstractions for launching CUDA
GPU kernels. It builds on top of oxicuda_driver to offer:
- `Dim3`: 3-dimensional grid and block size specification with convenient conversions from `u32`, `(u32, u32)`, and `(u32, u32, u32)`.
- `LaunchParams`: kernel launch configuration (grid, block, shared memory) with a builder pattern via `LaunchParamsBuilder`.
- `Kernel`: a launchable kernel wrapper that manages module lifetime via `Arc<Module>` and provides occupancy query delegation.
- `KernelArgs`: a trait for type-safe kernel argument passing, implemented for tuples of `Copy` types up to 24 elements.
- `launch!`: a convenience macro for concise kernel launches.
- `grid_size_for`: a helper to compute the minimum grid size needed to cover a given number of work items.
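The two dimension helpers in the list above can be illustrated without a GPU. The following is a minimal standalone sketch, not the crate's implementation: `Dim3Sketch` and `grid_size_sketch` are hypothetical names, and the sketch assumes that unspecified dimensions default to 1 and that the grid size is computed by ceiling division.

```rust
/// Hypothetical stand-in for a 3-D launch dimension (not the crate's `Dim3`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Dim3Sketch {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

// Assumption: dimensions that are not given default to 1.
impl From<u32> for Dim3Sketch {
    fn from(x: u32) -> Self {
        Self { x, y: 1, z: 1 }
    }
}

impl From<(u32, u32)> for Dim3Sketch {
    fn from((x, y): (u32, u32)) -> Self {
        Self { x, y, z: 1 }
    }
}

impl From<(u32, u32, u32)> for Dim3Sketch {
    fn from((x, y, z): (u32, u32, u32)) -> Self {
        Self { x, y, z }
    }
}

/// Ceiling division: the smallest number of blocks such that
/// `blocks * block_size >= n`. Panics if `block_size` is 0.
pub fn grid_size_sketch(n: u32, block_size: u32) -> u32 {
    // Written without `n + block_size - 1` so it cannot overflow.
    n / block_size + u32::from(n % block_size != 0)
}
```

With these definitions, 1000 work items at 256 threads per block need 4 blocks, so the last block is only partly used and the kernel itself still has to bounds-check its index.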
§Quick start
use std::sync::Arc;
use oxicuda_driver::{init, Device, Context, Module, Stream};
use oxicuda_launch::{Kernel, LaunchParams, Dim3, grid_size_for, launch};
init()?;
let dev = Device::get(0)?;
let ctx = Arc::new(Context::new(&dev)?);
// Load PTX and create a kernel.
let ptx = ""; // In practice, use include_str! or load from file.
let module = Arc::new(Module::from_ptx(ptx)?);
let kernel = Kernel::from_module(module, "vector_add")?;
// Configure launch dimensions.
let n: u32 = 1024;
let block_size = 256u32;
let grid = grid_size_for(n, block_size);
let stream = Stream::new(&ctx)?;
// Launch with the macro.
let (a_ptr, b_ptr, c_ptr) = (0u64, 0u64, 0u64);
launch!(kernel, grid(grid), block(block_size), &stream, &(a_ptr, b_ptr, c_ptr, n))?;
stream.synchronize()?;
§Crate features
| Feature | Description |
|---|---|
| gpu-tests | Enable tests that require a physical GPU |
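A feature like `gpu-tests` is typically wired up with Cargo's standard `cfg(feature = "...")` mechanism. The sketch below shows that general pattern only; the function and module names are hypothetical and are not taken from this crate's source.

```rust
/// Reports whether this build was compiled with the `gpu-tests` feature.
/// (Hypothetical helper, for illustration only.)
pub fn gpu_tests_enabled() -> bool {
    cfg!(feature = "gpu-tests")
}

// This module is compiled only for test builds with the feature enabled,
// so machines without a GPU never build or run it.
#[cfg(all(test, feature = "gpu-tests"))]
mod gpu_tests {
    #[test]
    fn runs_only_with_the_feature() {
        // A real test would initialise the driver and launch a kernel here.
        assert!(super::gpu_tests_enabled());
    }
}
```

With this pattern, a plain `cargo test` skips the GPU tests, and `cargo test --features gpu-tests` includes them.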
Re-exports§
pub use arg_serialize::ArgType;
pub use arg_serialize::LaunchLog;
pub use arg_serialize::LaunchLogger;
pub use arg_serialize::LaunchSummary;
pub use arg_serialize::SerializableKernelArgs;
pub use arg_serialize::SerializedArg;
pub use async_launch::AsyncKernel;
pub use async_launch::AsyncLaunchConfig;
pub use async_launch::CompletionStatus;
pub use async_launch::ErasedKernelArgs;
pub use async_launch::LaunchCompletion;
pub use async_launch::LaunchTiming;
pub use async_launch::PollStrategy;
pub use async_launch::TimedLaunchCompletion;
pub use async_launch::multi_launch_async;
pub use cluster::ClusterDim;
pub use cluster::ClusterLaunchParams;
pub use cluster::cluster_launch;
pub use cooperative::CooperativeLaunch;
pub use dynamic_parallelism::ChildKernelSpec;
pub use dynamic_parallelism::DynamicLaunchPlan;
pub use dynamic_parallelism::DynamicParallelismConfig;
pub use dynamic_parallelism::GridSpec;
pub use error::LaunchError;
pub use graph_launch::GraphLaunchCapture;
pub use graph_launch::LaunchRecord;
pub use grid::Dim3;
pub use grid::auto_grid_2d;
pub use grid::auto_grid_for;
pub use grid::grid_size_for;
pub use kernel::Kernel;
pub use kernel::KernelArgs;
pub use multi_stream::multi_stream_launch;
pub use multi_stream::multi_stream_launch_uniform;
pub use named_args::ArgBuilder;
pub use named_args::NamedKernelArgs;
pub use params::LaunchParams;
pub use params::LaunchParamsBuilder;
pub use telemetry::KernelStats;
pub use telemetry::LaunchTelemetry;
pub use telemetry::TelemetryCollector;
pub use telemetry::TelemetryExporter;
pub use telemetry::TelemetrySummary;
pub use telemetry::estimate_occupancy;
pub use trace::KernelSpanGuard;
Modules§
- arg_serialize: Kernel argument serialization, Debug/Display formatting, and launch logging.
- async_launch: Async kernel launch with completion futures.
- cluster: Thread block cluster configuration for Hopper+ GPUs (SM 9.0+).
- cooperative: Cooperative kernel launch support.
- dynamic_parallelism: Dynamic parallelism support for device-side kernel launches.
- error: Launch validation error types.
- graph_launch: Graph-based kernel launch capture and replay.
- grid: Grid and block dimension types for kernel launch configuration.
- kernel: Type-safe GPU kernel management and argument passing.
- macros: Convenience macros for kernel launching.
- multi_stream: Multi-stream kernel launch support.
- named_args: Named kernel arguments for enhanced debuggability and type safety.
- params: Kernel launch parameter configuration.
- prelude: Convenient glob import for common OxiCUDA Launch types.
- telemetry: Launch telemetry: timing, occupancy, and register usage reporting.
- trace: Launch telemetry / tracing integration.
Macros§
- kernel_launch_span: No-op version used when the tracing feature is disabled.
- launch: Launch a GPU kernel with a concise syntax.
- launch_named: Launch a GPU kernel with named argument syntax.
- named_args: Build a kernel argument tuple from named fields.
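To make the idea of building an argument tuple from named fields concrete, here is a minimal `macro_rules!` sketch. It is an illustration only: `named_args_sketch!` is a hypothetical name, and the crate's actual `named_args!` syntax and expansion may differ. In this sketch the names serve purely as call-site documentation and are discarded, leaving a plain positional tuple.

```rust
/// Hypothetical sketch: accepts `name: value` pairs and expands to a tuple
/// of the values in the order written. The names are not kept.
macro_rules! named_args_sketch {
    ($($name:ident : $value:expr),* $(,)?) => {
        ( $($value,)* )
    };
}
```

A call such as `named_args_sketch!(a: 1u64, b: 2u64, n: 1024u32)` expands to `(1u64, 2u64, 1024u32)`, so a reader sees which argument is which while the resulting value is an ordinary tuple.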