# IRO CUDA FFI (iro-cuda-ffi) v1

A minimal, rigid ABI boundary that lets Rust orchestrate nvcc-compiled CUDA C++ kernels with no performance penalty compared with pure C++.
## Design Philosophy
- nvcc produces device code. iro-cuda-ffi never competes with nvcc.
- Rust owns host orchestration. Ownership, lifetimes, ordering, and errors are Rust responsibilities.
- FFI is constrained. The ABI boundary is small, stable, and verifiable.
- Patterns are mechanical. Humans and AI can generate wrappers safely via deterministic rules.
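To illustrate what "mechanical" wrapper rules can look like, here is a minimal, self-contained sketch. Everything in it is hypothetical: `FakeLaunch`, `KernelError`, `icffi_fake_scale`, `check_status`, `blocks_for`, and `scale` are not iro-cuda-ffi APIs, and the "kernel" is a Rust function that emulates a 1D grid on the CPU so the example runs without a GPU. The rules it follows are (1) turn every status code into a `Result`, (2) size the grid with a ceiling division, and (3) expose only a safe wrapper that derives the raw pointer and length from a borrowed slice.

```rust
/// Hypothetical 1D launch configuration passed by value across the ABI.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct FakeLaunch {
    pub grid_x: u32,
    pub block_x: u32,
}

/// Hypothetical error carrying the raw status code returned by the kernel.
#[derive(Debug, PartialEq)]
pub struct KernelError(pub i32);

/// Stand-in for an nvcc-compiled entry point. A real binding would be
/// declared in an `extern "C" { ... }` block and linked from CUDA objects;
/// it is defined in Rust here (emulating the grid on the CPU) so the sketch runs.
///
/// # Safety
/// `data` must point to `n` valid, writable `f32` values.
pub unsafe extern "C" fn icffi_fake_scale(p: FakeLaunch, data: *mut f32, n: usize, factor: f32) -> i32 {
    if data.is_null() {
        return 1; // hypothetical "invalid argument" status
    }
    let slice = unsafe { std::slice::from_raw_parts_mut(data, n) };
    for block in 0..p.grid_x as usize {
        for thread in 0..p.block_x as usize {
            let idx = block * p.block_x as usize + thread;
            // Bounds guard, as a CUDA kernel needs when the grid overshoots n.
            if idx < n {
                slice[idx] *= factor;
            }
        }
    }
    0 // 0 means success, mirroring cudaSuccess
}

/// Rule 1: every call's status code becomes a Result.
pub fn check_status(code: i32) -> Result<(), KernelError> {
    if code == 0 { Ok(()) } else { Err(KernelError(code)) }
}

/// Rule 2: the grid is a ceiling division, so every element gets a thread.
pub fn blocks_for(n: usize, threads_per_block: u32) -> u32 {
    let blocks = n.div_ceil(threads_per_block as usize);
    u32::try_from(blocks).expect("grid dimension exceeds u32")
}

/// Rule 3: the safe wrapper takes `&mut [f32]`, so the pointer and
/// length handed across the boundary always describe a live buffer.
pub fn scale(data: &mut [f32], factor: f32) -> Result<(), KernelError> {
    const THREADS: u32 = 256;
    let params = FakeLaunch { grid_x: blocks_for(data.len(), THREADS), block_x: THREADS };
    // SAFETY: pointer and length come from a live, exclusively borrowed slice.
    check_status(unsafe { icffi_fake_scale(params, data.as_mut_ptr(), data.len(), factor) })
}
```

Because each step is a fixed rule rather than a judgment call, the same shape can be repeated for every kernel signature; in the Quick Start, `LaunchParams::new_1d` and `check` appear to fill the roles played here by `FakeLaunch` and `check_status`.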
## Core Guarantees
- No hidden device synchronization: Kernel launches never implicitly synchronize streams.
- No implicit stream dependencies: You control all ordering via streams and events.
- Typed transfer boundary: Host↔device copies are gated by `IcffiPod` for safety.
- ABI verification: Layout asserts on both the Rust and C++ sides catch mismatches at compile time.
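The last two guarantees can be sketched in plain Rust. The example below is a hypothetical illustration of the technique, not the crate's code: `Pod` is a stand-in marker trait in the spirit of `IcffiPod` (its real definition and bounds may differ), `Particle` is an invented kernel parameter type, and `as_bytes` is an invented helper showing how a POD bound can gate reinterpreting data as raw bytes for a copy. The `const` assertions fail the build if the struct layout drifts; the C++ side would carry matching `static_assert`s.

```rust
use std::mem::{align_of, size_of, size_of_val};

/// Hypothetical marker trait in the spirit of `IcffiPod` (not its real definition).
///
/// # Safety
/// Implementors must be primitive or `#[repr(C)]`, contain no padding,
/// no pointers or references, and be valid for every bit pattern.
pub unsafe trait Pod: Copy + 'static {}

unsafe impl Pod for f32 {}
unsafe impl Pod for u32 {}

/// Hypothetical kernel parameter type shared with C++.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Particle {
    pub pos: [f32; 3],
    pub id: u32,
}

// SAFETY: repr(C), 12 + 4 bytes with 4-byte alignment, so no padding.
unsafe impl Pod for Particle {}

// Compile-time layout checks: the build fails if these stop holding.
// The C++ side would mirror them, e.g. static_assert(sizeof(Particle) == 16).
const _: () = assert!(size_of::<Particle>() == 16);
const _: () = assert!(align_of::<Particle>() == 4);

/// Only `Pod` types may be viewed as raw bytes for a transfer.
pub fn as_bytes<T: Pod>(data: &[T]) -> &[u8] {
    // SAFETY: `T: Pod` promises no padding, so every byte is initialized,
    // and u8 has alignment 1, so any pointer to T is suitably aligned.
    unsafe { std::slice::from_raw_parts(data.as_ptr().cast::<u8>(), size_of_val(data)) }
}
```

A type such as `Vec<f32>` or a struct containing a reference simply has no `Pod` impl, so passing it to `as_bytes` is a compile error rather than a silent copy of pointers.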
## CUDA Version Requirements
iro-cuda-ffi requires CUDA 12.0 or later. CUDA Graph features use runtime APIs introduced in CUDA 11.4–12.0; linking against older runtimes will fail.
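The CUDA runtime reports its version as a single integer, `1000 * major + 10 * minor` (for example, `cudaRuntimeGetVersion` reports 12.0 as `12000`). A minimal sketch of checking that integer against the 12.0 floor follows; `decode_cuda_version`, `MIN_CUDA_VERSION`, and `meets_requirement` are hypothetical helpers, not part of iro-cuda-ffi, and the version value is passed in directly rather than queried from the runtime.

```rust
/// Minimum CUDA version stated above (12.0), in CUDA's integer encoding.
pub const MIN_CUDA_VERSION: i32 = 12_000;

/// Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor).
pub fn decode_cuda_version(v: i32) -> (i32, i32) {
    (v / 1000, (v % 1000) / 10)
}

/// True if the reported version satisfies the crate's stated requirement.
pub fn meets_requirement(v: i32) -> bool {
    v >= MIN_CUDA_VERSION
}
```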
## Quick Start
```rust,ignore
// Runs inside a function whose return type is compatible with `?`.
use iro_cuda_ffi::prelude::*;

// Create a non-blocking stream.
let stream = Stream::new()?;

// Allocate and initialize device memory (safe synchronous variant).
let input = DeviceBuffer::from_slice_sync(&stream, &[1.0f32, 2.0, 3.0, 4.0])?;
// Size the output from the input instead of hard-coding the length.
let mut output = DeviceBuffer::<f32>::zeros(input.len())?;

// Launch your kernel (extern "C" fn icffi_my_kernel(...) -> i32).
// `check` converts the kernel's i32 status into a Result.
let blocks = (input.len() as u32 + 255) / 256; // ceiling division: one thread per element
let params = LaunchParams::new_1d(blocks, 256, stream.raw());
check(unsafe { icffi_my_kernel(params, input.as_in(), output.as_out()) })?;

// Read results (synchronizes automatically).
let results = output.to_vec(&stream)?;
```

## Re-exports

- `pub use prelude::*;`
## Modules

- `abi`: ABI types for the iro-cuda-ffi kernel interface.
- `device`: CUDA device management.
- `error`: Error handling for iro-cuda-ffi.
- `event`: CUDA event primitives for synchronization and timing.
- `graph`: CUDA graph capture and execution.
- `host_memory`: Pinned host memory management.
- `memory`: Device memory management.
- `pod`: Plain Old Data (POD) traits for safe host↔device transfers.
- `prelude`: Convenient re-exports for common iro-cuda-ffi usage.
- `stream`: CUDA stream primitives for work ordering.
- `transfer`: Async transfer guards for memory-safe DMA operations.