Crate iro_cuda_ffi

§IRO CUDA FFI (iro-cuda-ffi) v1

A minimal, rigid ABI boundary that lets Rust orchestrate nvcc-compiled CUDA C++ kernels with no performance penalty relative to pure C++.

§Design Philosophy

nvcc produces device code. iro-cuda-ffi never competes with nvcc.
Rust owns host orchestration. Ownership, lifetimes, ordering, and errors are Rust responsibilities.
FFI is constrained. The ABI boundary is small, stable, and verifiable.
Patterns are mechanical. Humans and AI can generate wrappers safely via deterministic rules.

§Core Guarantees

No hidden device synchronization: Kernel launches never implicitly synchronize streams.
No implicit stream dependencies: You control all ordering via streams and events.
Typed transfer boundary: Host↔device copies are gated by IcffiPod for safety.
ABI verification: Layout asserts on both Rust and C++ sides catch mismatches at compile time.
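
As an illustration of the ABI verification guarantee, the following is a minimal sketch of how compile-time layout asserts can be written on the Rust side. The `ParamsSketch` struct, its fields, and the expected sizes are hypothetical and chosen only for the example; they are not the crate's actual `LaunchParams` layout. The matching C++ side would typically carry `static_assert` checks on `sizeof` and `offsetof` with the same numbers.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical parameter block shared with C++ (illustrative only,
/// not the real iro-cuda-ffi layout).
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct ParamsSketch {
    pub grid_x: u32,
    pub block_x: u32,
    pub shared_mem_bytes: u32,
    /// Explicit padding so the offset of `stream` does not depend on
    /// the target's alignment rules for 64-bit integers.
    pub _pad: u32,
    /// Opaque stream handle carried as a 64-bit integer in this sketch.
    pub stream: u64,
}

// Evaluated at compile time: a layout mismatch fails the build
// instead of corrupting kernel arguments at run time.
const _: () = assert!(size_of::<ParamsSketch>() == 24);
const _: () = assert!(offset_of!(ParamsSketch, grid_x) == 0);
const _: () = assert!(offset_of!(ParamsSketch, block_x) == 4);
const _: () = assert!(offset_of!(ParamsSketch, shared_mem_bytes) == 8);
const _: () = assert!(offset_of!(ParamsSketch, stream) == 16);
```

Because the checks live in `const` items, they cost nothing at run time, and any change to the struct that shifts a field has to be mirrored on both sides of the boundary before the build succeeds.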

§CUDA Version Requirements

iro-cuda-ffi requires CUDA 12.0 or later. The CUDA Graph features rely on runtime APIs introduced between CUDA 11.4 and 12.0, so linking against an older runtime fails.

§Quick Start

use iro_cuda_ffi::prelude::*;

// Create a non-blocking stream
let stream = Stream::new()?;

// Allocate and initialize device memory (safe sync variant)
let input = DeviceBuffer::from_slice_sync(&stream, &[1.0f32, 2.0, 3.0, 4.0])?;
let mut output = DeviceBuffer::<f32>::zeros(4)?;

// Launch your kernel (extern "C" fn icffi_my_kernel(...) -> i32)
let blocks = (input.len() as u32 + 255) / 256;
let params = LaunchParams::new_1d(blocks, 256, stream.raw());
check(unsafe { icffi_my_kernel(params, input.as_in(), output.as_out()) })?;

// Read results (synchronizes automatically)
let results = output.to_vec(&stream)?;
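
The block count in the Quick Start is a ceiling division of the element count by the threads per block. A minimal sketch of that calculation as a checked helper follows; `blocks_for` is a hypothetical function written for this example, not part of the crate. Unlike the inline `as u32` cast, it reports a zero block size or a block count that does not fit in `u32` instead of silently truncating.

```rust
/// Hypothetical helper (not part of iro-cuda-ffi): number of blocks
/// needed to cover `len` elements with `threads_per_block` threads each.
///
/// Returns `None` if `threads_per_block` is zero or if the block count
/// does not fit in a `u32`.
pub fn blocks_for(len: usize, threads_per_block: u32) -> Option<u32> {
    if threads_per_block == 0 {
        return None;
    }
    // Ceiling division: round up so a partial final block is included.
    let blocks = len.div_ceil(threads_per_block as usize);
    u32::try_from(blocks).ok()
}
```

For the four-element input above, `blocks_for(4, 256)` yields `Some(1)`. An empty input yields `Some(0)`; since CUDA rejects a grid with zero blocks, a caller would normally skip the launch in that case.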

§Re-exports

pub use prelude::*;

§Modules

abi: ABI types for the iro-cuda-ffi kernel interface.
device: CUDA device management.
error: Error handling for iro-cuda-ffi.
event: CUDA event primitives for synchronization and timing.
graph: CUDA graph capture and execution.
host_memory: Pinned host memory management.
memory: Device memory management.
pod: Plain Old Data (POD) traits for safe host↔device transfers.
prelude: Convenient re-exports for common iro-cuda-ffi usage.
stream: CUDA stream primitives for work ordering.
transfer: Async transfer guards for memory-safe DMA operations.
