iro_cuda_ffi/lib.rs
//! # IRO CUDA FFI (iro-cuda-ffi) v1
//!
//! A minimal, rigid ABI boundary that lets Rust orchestrate nvcc-compiled CUDA C++ kernels
//! with no performance penalty vs pure C++.
//!
//! ## Design Philosophy
//!
//! 1. **nvcc produces device code.** iro-cuda-ffi never competes with nvcc.
//! 2. **Rust owns host orchestration.** Ownership, lifetimes, ordering, and errors are Rust responsibilities.
//! 3. **FFI is constrained.** The ABI boundary is small, stable, and verifiable.
//! 4. **Patterns are mechanical.** Humans and AI can generate wrappers safely via deterministic rules.
//!
//! ## Core Guarantees
//!
//! - **No hidden device synchronization**: Kernel launches never implicitly synchronize streams.
//! - **No implicit stream dependencies**: You control all ordering via streams and events.
//! - **Typed transfer boundary**: Host↔device copies are gated by `IcffiPod` for safety.
//! - **ABI verification**: Layout asserts on both Rust and C++ sides catch mismatches at compile time.
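//!
//! As a rough illustration of the ABI verification guarantee, the sketch below shows what
//! compile-time layout asserts can look like on the Rust side. `ExampleParams` is a
//! hypothetical `#[repr(C)]` struct invented for this example, not one of the crate's ABI
//! types; the C++ side would presumably mirror it with matching `static_assert`s on
//! `sizeof` and `alignof`.
//!
//! ```rust
//! use core::mem::{align_of, size_of};
//!
//! /// Hypothetical launch-parameter struct (illustration only, not a crate type).
//! #[repr(C)]
//! pub struct ExampleParams {
//!     pub grid_x: u32,
//!     pub block_x: u32,
//!     pub shared_mem_bytes: u64,
//! }
//!
//! // Evaluated at compile time on a 64-bit target: a layout mismatch is a build error.
//! const _: () = {
//!     assert!(size_of::<ExampleParams>() == 16);
//!     assert!(align_of::<ExampleParams>() == 8);
//! };
//! ```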
//!
//! ## CUDA Version Requirements
//!
//! iro-cuda-ffi requires **CUDA 12.0 or later**. CUDA Graph features use runtime APIs
//! introduced in CUDA 11.4–12.0; linking against older runtimes will fail.
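//!
//! CUDA reports versions as a single integer encoded as `1000 * major + 10 * minor`
//! (so 12.0 is `12000`). A minimal sketch of checking the requirement above, assuming the
//! integer was obtained elsewhere (for example from `cudaRuntimeGetVersion`).
//! `MIN_CUDA_VERSION`, `decode_cuda_version`, and `meets_minimum` are hypothetical names
//! used for illustration, not part of this crate:
//!
//! ```rust
//! /// Minimum supported version (12.0), encoded the way CUDA encodes it.
//! const MIN_CUDA_VERSION: i32 = 12_000;
//!
//! /// Splits an encoded CUDA version into `(major, minor)`.
//! fn decode_cuda_version(encoded: i32) -> (i32, i32) {
//!     (encoded / 1000, (encoded % 1000) / 10)
//! }
//!
//! /// Returns `true` if the encoded version satisfies the 12.0 requirement.
//! fn meets_minimum(encoded: i32) -> bool {
//!     encoded >= MIN_CUDA_VERSION
//! }
//! ```
//!
//! Under this encoding, a runtime reporting `11040` (CUDA 11.4) would be rejected, while
//! `12040` (CUDA 12.4) would be accepted.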
//!
//! ## Quick Start
//!
//! ```ignore
//! use iro_cuda_ffi::prelude::*;
//!
//! // Create a non-blocking stream
//! let stream = Stream::new()?;
//!
//! // Allocate and initialize device memory (safe sync variant)
//! let input = DeviceBuffer::from_slice_sync(&stream, &[1.0f32, 2.0, 3.0, 4.0])?;
//! let mut output = DeviceBuffer::<f32>::zeros(4)?;
//!
//! // Launch your kernel (extern "C" fn icffi_my_kernel(...) -> i32)
//! let blocks = (input.len() as u32 + 255) / 256;
//! let params = LaunchParams::new_1d(blocks, 256, stream.raw());
//! check(unsafe { icffi_my_kernel(params, input.as_in(), output.as_out()) })?;
//!
//! // Read results (synchronizes automatically)
//! let results = output.to_vec(&stream)?;
//! ```

#![warn(missing_docs)]
#![warn(clippy::all, clippy::pedantic, clippy::nursery)]
#![allow(clippy::module_name_repetitions)]

#[cfg(not(target_pointer_width = "64"))]
compile_error!("iro-cuda-ffi requires a 64-bit target.");

extern crate alloc;

pub mod abi;
pub mod device;
pub mod error;
pub mod event;
pub mod graph;
pub mod host_memory;
pub mod memory;
pub mod pod;
pub mod prelude;
pub mod stream;
pub mod transfer;

mod sys;

// Re-export prelude at crate root for convenience
pub use prelude::*;

#[cfg(test)]
mod lib_test;