Crate oxicuda_launch


§OxiCUDA Launch

Type-safe GPU kernel launch infrastructure for the OxiCUDA ecosystem.

This crate provides ergonomic, type-safe abstractions for launching CUDA GPU kernels. It builds on top of oxicuda_driver to offer:

  • Dim3 — 3-dimensional grid and block size specification with convenient conversions from u32, (u32, u32), and (u32, u32, u32).

  • LaunchParams — kernel launch configuration (grid, block, shared memory) with a builder pattern via LaunchParamsBuilder.

  • Kernel — a launchable kernel wrapper that manages module lifetime via Arc<Module> and provides occupancy query delegation.

  • KernelArgs — a trait for type-safe kernel argument passing, implemented for tuples of Copy types up to 24 elements.

  • launch! — a convenience macro for concise kernel launches.

  • grid_size_for — a helper to compute the minimum grid size needed to cover a given number of work items.
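The Dim3 conversions listed above can be pictured with a small stand-in type. This is only a sketch: `Dim3Sketch` and `normalize` are hypothetical names, and filling unspecified dimensions with 1 is an assumption based on the usual CUDA convention, not something this page confirms about the crate's `Dim3`.

```rust
/// Hypothetical stand-in for a 3-D launch dimension.
/// This is NOT the crate's `Dim3`; it only illustrates the conversion idea.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Dim3Sketch {
    pub x: u32,
    pub y: u32,
    pub z: u32,
}

impl From<u32> for Dim3Sketch {
    /// 1-D case: assume the unspecified y and z dimensions default to 1.
    fn from(x: u32) -> Self {
        Self { x, y: 1, z: 1 }
    }
}

impl From<(u32, u32)> for Dim3Sketch {
    /// 2-D case: assume the unspecified z dimension defaults to 1.
    fn from((x, y): (u32, u32)) -> Self {
        Self { x, y, z: 1 }
    }
}

impl From<(u32, u32, u32)> for Dim3Sketch {
    /// 3-D case: all dimensions are given explicitly.
    fn from((x, y, z): (u32, u32, u32)) -> Self {
        Self { x, y, z }
    }
}

/// Hypothetical helper: accepts anything convertible into the sketch type,
/// which is what lets a caller pass a bare integer or a tuple.
pub fn normalize(dim: impl Into<Dim3Sketch>) -> Dim3Sketch {
    dim.into()
}
```

With conversions like these, an API that takes `impl Into<...>` lets callers write `normalize(256)` or `normalize((16, 16))` instead of spelling out all three fields.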

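To illustrate what "type-safe argument passing for tuples of Copy types" can mean in practice, here is a minimal sketch. The CUDA driver's launch call expects an array with one pointer per kernel argument, so a tuple can be turned into such an array. `KernelArgsSketch`, `as_param_ptrs`, and the helper macro are hypothetical names; the crate's real `KernelArgs` trait may look quite different, and this sketch covers only tuples of up to four elements.

```rust
use std::ffi::c_void;

/// Hypothetical trait, not the crate's `KernelArgs`.
pub trait KernelArgsSketch {
    /// Returns one type-erased pointer per argument, in tuple order.
    /// The pointers are only valid while `self` is alive and not moved.
    fn as_param_ptrs(&self) -> Vec<*mut c_void>;
}

/// Implements the sketch trait for one tuple arity.
/// Each `Type => index` pair names a generic parameter and its tuple field.
macro_rules! impl_kernel_args_sketch {
    ($($ty:ident => $idx:tt),+) => {
        impl<$($ty: Copy),+> KernelArgsSketch for ($($ty,)+) {
            fn as_param_ptrs(&self) -> Vec<*mut c_void> {
                vec![$(&self.$idx as *const $ty as *mut c_void),+]
            }
        }
    };
}

impl_kernel_args_sketch!(A => 0);
impl_kernel_args_sketch!(A => 0, B => 1);
impl_kernel_args_sketch!(A => 0, B => 1, C => 2);
impl_kernel_args_sketch!(A => 0, B => 1, C => 2, D => 3);
```

Restricting the tuple elements to `Copy` keeps the arguments plain values (pointers, integers, floats) that can be copied bit-for-bit into the kernel's parameter space, and generating one impl per arity with a macro is how such a trait can reach a fixed upper bound on tuple length.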
§Quick start

use std::sync::Arc;
use oxicuda_driver::{init, Device, Context, Module, Stream};
use oxicuda_launch::{Kernel, LaunchParams, Dim3, grid_size_for, launch};

init()?;
let dev = Device::get(0)?;
let ctx = Arc::new(Context::new(&dev)?);

// Load PTX and create a kernel.
let ptx = ""; // In practice, use include_str! or load from file.
let module = Arc::new(Module::from_ptx(ptx)?);
let kernel = Kernel::from_module(module, "vector_add")?;

// Configure launch dimensions.
let n: u32 = 1024;
let block_size = 256u32;
let grid = grid_size_for(n, block_size);

let stream = Stream::new(&ctx)?;

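// Launch with the macro. The zero-valued device pointers below are
// placeholders; in practice they come from real device allocations.
let (a_ptr, b_ptr, c_ptr) = (0u64, 0u64, 0u64);
launch!(kernel, grid(grid), block(block_size), &stream, &(a_ptr, b_ptr, c_ptr, n))?;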

stream.synchronize()?;
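The grid size in the quick start is the smallest number of blocks whose threads cover all `n` work items, which is a ceiling division. The sketch below shows that arithmetic in isolation; `grid_size_sketch` is a hypothetical name, and the zero-block-size check is an assumption, since this page does not say how the crate's `grid_size_for` handles that case.

```rust
/// Hypothetical helper showing the ceiling division behind a 1-D grid size.
/// Not the crate's `grid_size_for`; edge-case behavior is an assumption.
pub fn grid_size_sketch(n: u32, block_size: u32) -> u32 {
    assert!(block_size > 0, "block size must be non-zero");
    // `div_ceil` rounds up without the overflow risk of `(n + block_size - 1)`.
    n.div_ceil(block_size)
}
```

For the quick-start values, 1024 items with 256 threads per block need 4 blocks, while 1025 items would need 5. Because the last block may be only partly used, the kernel itself still needs a bounds check against `n`.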

§Crate features

Feature      Description
gpu-tests    Enable tests that require a physical GPU

Re-exports§

pub use arg_serialize::ArgType;
pub use arg_serialize::LaunchLog;
pub use arg_serialize::LaunchLogger;
pub use arg_serialize::LaunchSummary;
pub use arg_serialize::SerializableKernelArgs;
pub use arg_serialize::SerializedArg;
pub use async_launch::AsyncKernel;
pub use async_launch::AsyncLaunchConfig;
pub use async_launch::CompletionStatus;
pub use async_launch::ErasedKernelArgs;
pub use async_launch::LaunchCompletion;
pub use async_launch::LaunchTiming;
pub use async_launch::PollStrategy;
pub use async_launch::TimedLaunchCompletion;
pub use async_launch::multi_launch_async;
pub use cluster::ClusterDim;
pub use cluster::ClusterLaunchParams;
pub use cluster::cluster_launch;
pub use cooperative::CooperativeLaunch;
pub use dynamic_parallelism::ChildKernelSpec;
pub use dynamic_parallelism::DynamicLaunchPlan;
pub use dynamic_parallelism::DynamicParallelismConfig;
pub use dynamic_parallelism::GridSpec;
pub use error::LaunchError;
pub use graph_launch::GraphLaunchCapture;
pub use graph_launch::LaunchRecord;
pub use grid::Dim3;
pub use grid::auto_grid_2d;
pub use grid::auto_grid_for;
pub use grid::grid_size_for;
pub use kernel::Kernel;
pub use kernel::KernelArgs;
pub use multi_stream::multi_stream_launch;
pub use multi_stream::multi_stream_launch_uniform;
pub use named_args::ArgBuilder;
pub use named_args::NamedKernelArgs;
pub use params::LaunchParams;
pub use params::LaunchParamsBuilder;
pub use telemetry::KernelStats;
pub use telemetry::LaunchTelemetry;
pub use telemetry::TelemetryCollector;
pub use telemetry::TelemetryExporter;
pub use telemetry::TelemetrySummary;
pub use telemetry::estimate_occupancy;
pub use trace::KernelSpanGuard;

Modules§

arg_serialize
Kernel argument serialization, Debug/Display formatting, and launch logging.
async_launch
Async kernel launch with completion futures.
cluster
Thread block cluster configuration for Hopper+ GPUs (SM 9.0+).
cooperative
Cooperative kernel launch support.
dynamic_parallelism
Dynamic parallelism support for device-side kernel launches.
error
Launch validation error types.
graph_launch
Graph-based kernel launch capture and replay.
grid
Grid and block dimension types for kernel launch configuration.
kernel
Type-safe GPU kernel management and argument passing.
macros
Convenience macros for kernel launching.
multi_stream
Multi-stream kernel launch support.
named_args
Named kernel arguments for enhanced debuggability and type safety.
params
Kernel launch parameter configuration.
prelude
Convenient glob import for common OxiCUDA Launch types.
telemetry
Launch telemetry: timing, occupancy, and register usage reporting.
trace
Launch telemetry / tracing integration.

Macros§

kernel_launch_span
No-op version used when the tracing feature is disabled.
launch
Launch a GPU kernel with a concise syntax.
launch_named
Launch a GPU kernel with named argument syntax.
named_args
Build a kernel argument tuple from named fields.
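As a rough picture of how a macro can build an argument tuple from named fields, the sketch below accepts `name: value` pairs and keeps only the values, in order. `named_tuple_sketch!` is a hypothetical macro written for illustration; whether the crate's `named_args!` uses this syntax or does anything more with the names is not stated on this page.

```rust
/// Hypothetical macro, not the crate's `named_args!`.
/// The names serve as call-site documentation only; the expansion is a
/// plain tuple of the values in the order they were written.
macro_rules! named_tuple_sketch {
    ($($name:ident : $value:expr),+ $(,)?) => {
        ( $($value,)+ )
    };
}
```

A call such as `named_tuple_sketch!(a: 1u64, b: 2u64, n: 1024u32)` expands to `(1u64, 2u64, 1024u32)`, so the result can be passed anywhere a positional tuple is accepted while the call site stays self-describing.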