Crate oxicuda_launch

§OxiCUDA Launch

Type-safe GPU kernel launch infrastructure for the OxiCUDA ecosystem.

This crate provides ergonomic, type-safe abstractions for launching CUDA GPU kernels. It builds on top of oxicuda_driver to offer:

Dim3 — 3-dimensional grid and block size specification with convenient conversions from u32, (u32, u32), and (u32, u32, u32).
LaunchParams — kernel launch configuration (grid, block, shared memory) with a builder pattern via LaunchParamsBuilder.
Kernel — a launchable kernel wrapper that keeps its module alive via Arc<Module> and delegates occupancy queries.
KernelArgs — a trait for type-safe kernel argument passing, implemented for tuples of Copy types up to 24 elements.
launch! — a convenience macro for concise kernel launches.
grid_size_for — a helper to compute the minimum grid size needed to cover a given number of work items.
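
The Dim3 conversions listed above can be pictured with a small stand-in type. The sketch below is not the crate's implementation: `Dim3Sketch` and its `total` method are hypothetical names, and filling unspecified dimensions with 1 is an assumption borrowed from CUDA's `dim3` convention.

```rust
/// Hypothetical stand-in for a 3-dimensional launch extent.
/// This is an illustration only, not the crate's own `Dim3`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Dim3Sketch {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

impl From<u32> for Dim3Sketch {
    /// A single value sets `x`; `y` and `z` are assumed to default to 1.
    fn from(x: u32) -> Self {
        Self { x, y: 1, z: 1 }
    }
}

impl From<(u32, u32)> for Dim3Sketch {
    /// A pair sets `x` and `y`; `z` is assumed to default to 1.
    fn from((x, y): (u32, u32)) -> Self {
        Self { x, y, z: 1 }
    }
}

impl From<(u32, u32, u32)> for Dim3Sketch {
    /// A triple sets all three dimensions.
    fn from((x, y, z): (u32, u32, u32)) -> Self {
        Self { x, y, z }
    }
}

impl Dim3Sketch {
    /// Total element count, widened to `u64` so the product cannot overflow `u32`.
    pub fn total(self) -> u64 {
        u64::from(self.x) * u64::from(self.y) * u64::from(self.z)
    }
}
```

Under these assumptions, `Dim3Sketch::from(256u32)` describes a one-dimensional extent of 256, and `Dim3Sketch::from((16u32, 16u32))` a 16 by 16 extent with the same total.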

§Quick start

use std::sync::Arc;
use oxicuda_driver::{init, Device, Context, Module, Stream};
use oxicuda_launch::{Kernel, LaunchParams, Dim3, grid_size_for, launch};

init()?;
let dev = Device::get(0)?;
let ctx = Arc::new(Context::new(&dev)?);

// Load PTX and create a kernel.
let ptx = ""; // In practice, use include_str! or load from file.
let module = Arc::new(Module::from_ptx(ptx)?);
let kernel = Kernel::from_module(module, "vector_add")?;

// Configure launch dimensions.
let n: u32 = 1024;
let block_size = 256u32;
let grid = grid_size_for(n, block_size);

let stream = Stream::new(&ctx)?;

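// Launch with the macro. The zero-valued pointers are placeholders; in practice,
// pass the addresses of the device buffers that hold a, b, and c.
let (a_ptr, b_ptr, c_ptr) = (0u64, 0u64, 0u64);
launch!(kernel, grid(grid), block(block_size), &stream, &(a_ptr, b_ptr, c_ptr, n))?;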

stream.synchronize()?;
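
The grid size in the example above is a ceiling division: enough blocks of `block_size` threads to cover `n` work items. A minimal sketch of that arithmetic follows, assuming `grid_size_for` behaves this way. `grid_size_sketch` is a hypothetical name, and the crate's actual signature and its handling of a zero block size may differ.

```rust
/// Hypothetical helper: the smallest block count `g` such that
/// `g * block_size >= n`. Illustration only, not the crate's `grid_size_for`.
pub fn grid_size_sketch(n: u32, block_size: u32) -> u32 {
    // A zero block size has no meaningful answer; this sketch panics on it.
    assert!(block_size > 0, "block_size must be non-zero");
    // Whole blocks, plus one extra block if there is a remainder.
    // Written this way to avoid the overflow in `(n + block_size - 1) / block_size`.
    n / block_size + u32::from(n % block_size != 0)
}
```

With `n = 1024` and `block_size = 256` this gives 4 blocks. With `n = 1000` it still gives 4, so the last block has 24 surplus threads and the kernel needs a bounds check on its global index.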

§Crate features

Feature	Description
`gpu-tests`	Enable tests that require a physical GPU

Re-exports§

pub use arg_serialize::ArgType;
pub use arg_serialize::LaunchLog;
pub use arg_serialize::LaunchLogger;
pub use arg_serialize::LaunchSummary;
pub use arg_serialize::SerializableKernelArgs;
pub use arg_serialize::SerializedArg;
pub use async_launch::AsyncKernel;
pub use async_launch::AsyncLaunchConfig;
pub use async_launch::CompletionStatus;
pub use async_launch::ErasedKernelArgs;
pub use async_launch::LaunchCompletion;
pub use async_launch::LaunchTiming;
pub use async_launch::PollStrategy;
pub use async_launch::TimedLaunchCompletion;
pub use async_launch::multi_launch_async;
pub use cluster::ClusterDim;
pub use cluster::ClusterLaunchParams;
pub use cluster::cluster_launch;
pub use cooperative::CooperativeLaunch;
pub use dynamic_parallelism::ChildKernelSpec;
pub use dynamic_parallelism::DynamicLaunchPlan;
pub use dynamic_parallelism::DynamicParallelismConfig;
pub use dynamic_parallelism::GridSpec;
pub use error::LaunchError;
pub use graph_launch::GraphLaunchCapture;
pub use graph_launch::LaunchRecord;
pub use grid::Dim3;
pub use grid::auto_grid_2d;
pub use grid::auto_grid_for;
pub use grid::grid_size_for;
pub use kernel::Kernel;
pub use kernel::KernelArgs;
pub use multi_stream::multi_stream_launch;
pub use multi_stream::multi_stream_launch_uniform;
pub use named_args::ArgBuilder;
pub use named_args::NamedKernelArgs;
pub use params::LaunchParams;
pub use params::LaunchParamsBuilder;
pub use telemetry::KernelStats;
pub use telemetry::LaunchTelemetry;
pub use telemetry::TelemetryCollector;
pub use telemetry::TelemetryExporter;
pub use telemetry::TelemetrySummary;
pub use telemetry::estimate_occupancy;
pub use trace::KernelSpanGuard;

Modules§

arg_serialize: Kernel argument serialization, Debug/Display formatting, and launch logging.
async_launch: Async kernel launch with completion futures.
cluster: Thread block cluster configuration for Hopper+ GPUs (SM 9.0+).
cooperative: Cooperative kernel launch support.
dynamic_parallelism: Dynamic parallelism support for device-side kernel launches.
error: Launch validation error types.
graph_launch: Graph-based kernel launch capture and replay.
grid: Grid and block dimension types for kernel launch configuration.
kernel: Type-safe GPU kernel management and argument passing.
macros: Convenience macros for kernel launching.
multi_stream: Multi-stream kernel launch support.
named_args: Named kernel arguments for enhanced debuggability and type safety.
params: Kernel launch parameter configuration.
prelude: Convenient glob import for common OxiCUDA Launch types.
telemetry: Launch telemetry: timing, occupancy, and register usage reporting.
trace: Launch telemetry / tracing integration.

Macros§

kernel_launch_span: No-op version used when the tracing feature is disabled.
launch: Launch a GPU kernel with a concise syntax.
launch_named: Launch a GPU kernel with named argument syntax.
named_args: Build a kernel argument tuple from named fields.
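
The note on kernel_launch_span describes a common Rust pattern: a macro defined twice under opposite `cfg` conditions, so call sites compile unchanged whether or not the feature is enabled. The sketch below illustrates the pattern only. `span_sketch!` and `describe_launch` are hypothetical names, and the macro bodies are placeholders, not this crate's definitions.

```rust
// Hypothetical feature-gated macro pair. Exactly one definition is compiled,
// depending on whether the `tracing` feature is enabled.

/// Feature enabled: produce a label for the launch (placeholder for real span creation).
#[cfg(feature = "tracing")]
macro_rules! span_sketch {
    ($name:expr) => {
        format!("span:{}", $name)
    };
}

/// Feature disabled: still evaluate and type-check the argument, but record nothing.
#[cfg(not(feature = "tracing"))]
macro_rules! span_sketch {
    ($name:expr) => {{
        let _ = &$name;
        String::new()
    }};
}

/// The call site is identical in both configurations.
pub fn describe_launch(name: &str) -> String {
    span_sketch!(name)
}
```

Built without the feature, `describe_launch("vector_add")` returns an empty string; the caller never needs its own `cfg` attribute.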
