Safe Rust wrappers for the CUDA Driver API.
This crate takes the raw FFI in baracuda_cuda_sys and dresses it up
with RAII handles, typed memory, lifetime-checked slices, and a kernel
launch builder. It deliberately does not hide the Driver-API model:
contexts are explicit, modules are explicit, streams are explicit.
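The RAII idea mentioned above can be sketched without a GPU. The block below is a minimal illustration, not this crate's implementation: `FakeDriver` is a hypothetical stand-in for the raw FFI layer that only counts live handles, and the `Context` defined here is a toy type that merely shares its name with the real one. It shows the general pattern of acquiring a resource in a constructor and releasing it in `Drop`.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the raw FFI layer; it only counts live contexts.
#[derive(Default)]
struct FakeDriver {
    live_contexts: Cell<u32>,
}

/// Toy RAII handle: construction "creates" a context, dropping it "destroys" it.
/// (Assumption: the real crate's `Context` is shaped differently.)
struct Context<'d> {
    driver: &'d FakeDriver,
}

impl<'d> Context<'d> {
    fn new(driver: &'d FakeDriver) -> Self {
        driver.live_contexts.set(driver.live_contexts.get() + 1);
        Context { driver }
    }
}

impl Drop for Context<'_> {
    fn drop(&mut self) {
        // Runs automatically when the handle goes out of scope.
        self.driver
            .live_contexts
            .set(self.driver.live_contexts.get() - 1);
    }
}

/// Returns the number of live contexts inside a scope and after leaving it.
fn live_inside_and_after_scope() -> (u32, u32) {
    let driver = FakeDriver::default();
    let inside = {
        let _ctx = Context::new(&driver);
        driver.live_contexts.get()
    };
    (inside, driver.live_contexts.get())
}

fn main() {
    let (inside, after) = live_inside_and_after_scope();
    println!("live inside scope: {inside}, after scope: {after}");
}
```

Because the handle borrows the driver, the compiler also rejects code in which a handle would outlive the thing it was created from.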
§Quickstart
use baracuda_driver::{Context, Device, DeviceBuffer, Module, Stream};
// Pick the first GPU, then create an explicit context and a stream on it.
let device = Device::get(0)?;
let ctx = Context::new(&device)?;
let stream = Stream::new(&ctx)?;
// Upload 1024 floats to the device.
let host_data: Vec<f32> = (0..1024).map(|i| i as f32).collect();
let device_data = DeviceBuffer::from_slice(&ctx, &host_data)?;
// Copy them back into a host buffer, then synchronize the stream.
let mut back = vec![0.0f32; host_data.len()];
device_data.copy_to_host(&mut back)?;
stream.synchronize()?;
assert_eq!(host_data, back);
§Modules
- device: Device enumeration and attributes.
- context: Context (explicit CUDA contexts + primary-context reuse).
- stream: Stream, ordered async work queues.
- event: Event, synchronization and timing.
- memory: DeviceBuffer<T>, DeviceSlice<'_, T>, DeviceSliceMut<'_, T>.
- module: Module, Function (PTX/CUBIN loading).
- launch: launch::LaunchBuilder for cuLaunchKernel.
- init(): init() helper and driver version queries.
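To make the "lifetime-checked slices" idea concrete, here is a host-only sketch. It assumes nothing about the crate's real signatures: the `DeviceBuffer` and `DeviceSlice` below are toy types that reuse the names for readability, the "device" memory is just a `Vec`, and the hypothetical `from_slice` here takes no context argument (unlike the one in the quickstart).

```rust
use std::ops::Range;

/// Toy stand-in for an owning device allocation; the storage is a host Vec.
struct DeviceBuffer<T> {
    data: Vec<T>,
}

/// Toy borrowed view; the lifetime ties it to the buffer it came from.
struct DeviceSlice<'a, T> {
    data: &'a [T],
}

impl<T: Copy> DeviceBuffer<T> {
    fn from_slice(host: &[T]) -> Self {
        DeviceBuffer { data: host.to_vec() }
    }

    /// Returns a view of `range`, or `None` if the range is out of bounds.
    fn slice(&self, range: Range<usize>) -> Option<DeviceSlice<'_, T>> {
        self.data.get(range).map(|data| DeviceSlice { data })
    }
}

impl<T: Copy> DeviceSlice<'_, T> {
    /// Copies the view into `out`; the lengths must match exactly.
    fn copy_to_host(&self, out: &mut [T]) -> Result<(), String> {
        if out.len() != self.data.len() {
            return Err(format!(
                "length mismatch: view has {}, destination has {}",
                self.data.len(),
                out.len()
            ));
        }
        out.copy_from_slice(self.data);
        Ok(())
    }
}

fn main() {
    let host: Vec<f32> = (0..8).map(|i| i as f32).collect();
    let buf = DeviceBuffer::from_slice(&host);
    let view = buf.slice(2..5).expect("range is in bounds");
    let mut out = vec![0.0f32; 3];
    // Moving or dropping `buf` while `view` is still in use would be
    // rejected by the borrow checker.
    view.copy_to_host(&mut out).expect("lengths match");
    println!("{out:?}");
}
```

The point of the design is that a dangling view becomes a compile error rather than a use-after-free on the device.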
Re-exports§
pub use array::Array;
pub use array::ArrayFormat;
pub use array::SurfaceObject;
pub use array::TextureAddressMode;
pub use array::TextureDesc;
pub use array::TextureFilterMode;
pub use array::TextureObject;
pub use context::Context;
pub use context::PrimaryContext;
pub use device::Device;
pub use error::error_name;
pub use error::error_string;
pub use error::Error;
pub use error::Result;
pub use event::Event;
pub use graph::instantiate_flags;
pub use graph::CaptureMode;
pub use graph::Graph;
pub use graph::GraphExec;
pub use graph::GraphNode;
pub use init::init;
pub use init::version;
pub use launch::Dim3;
pub use launch::LaunchBuilder;
pub use memory::mem_get_info;
pub use memory::DeviceBuffer;
pub use memory::DevicePtr;
pub use memory::DevicePtrMut;
pub use memory::DeviceSlice;
pub use memory::DeviceSliceMut;
pub use memory::ManagedAttach;
pub use memory::ManagedBuffer;
pub use memory::MemAdvise;
pub use pinned::PinnedBuffer;
pub use pinned::PinnedRegistration;
pub use module::Function;
pub use module::Module;
pub use stream::Stream;
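The `Error` and `Result` re-exports suggest the usual pattern of turning driver status codes into Rust results so that `?` works, as in the quickstart. The sketch below illustrates that pattern only. Everything in it is hypothetical: the `Error`, `Result`, `error_name`, `check`, and `fake_alloc` defined here are local toy items, not the crate's API. The numeric values are the documented `CUresult` codes from `cuda.h`.

```rust
use std::fmt;

/// Toy error type wrapping a raw status code (assumption: the real
/// `baracuda_driver::Error` may be shaped differently).
#[derive(Debug, PartialEq, Eq)]
struct Error {
    code: u32,
}

type Result<T> = std::result::Result<T, Error>;

/// Names for a few well-known `CUresult` values.
fn error_name(code: u32) -> &'static str {
    match code {
        0 => "CUDA_SUCCESS",
        1 => "CUDA_ERROR_INVALID_VALUE",
        2 => "CUDA_ERROR_OUT_OF_MEMORY",
        3 => "CUDA_ERROR_NOT_INITIALIZED",
        _ => "<unlisted CUresult>",
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} ({})", error_name(self.code), self.code)
    }
}

impl std::error::Error for Error {}

/// Maps a raw status code to a `Result`, so callers can use `?`.
fn check(code: u32) -> Result<()> {
    if code == 0 {
        Ok(())
    } else {
        Err(Error { code })
    }
}

/// Hypothetical FFI stand-in: rejects zero-byte requests, as `cuMemAlloc` does.
fn fake_alloc(bytes: usize) -> u32 {
    if bytes == 0 { 1 } else { 0 }
}

/// Safe-looking wrapper: returns the allocated size or a typed error.
fn alloc_checked(bytes: usize) -> Result<usize> {
    check(fake_alloc(bytes))?;
    Ok(bytes)
}

fn main() {
    match alloc_checked(0) {
        Ok(n) => println!("allocated {n} bytes"),
        Err(e) => println!("allocation failed: {e}"),
    }
}
```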
Modules§
- array
- CUDA arrays + texture / surface objects.
- context
- CUDA contexts — both primary (shared with the Runtime API) and explicit.
- coredump
- CUDA GPU core-dump configuration (CUDA 12.1+).
- device
- Physical-GPU query and enumeration.
- error
- Error type for baracuda-driver.
- event
- CUDA events — lightweight synchronization objects you can record on a stream and later wait on, or use to measure elapsed device time.
- external
- External memory / semaphore interop — import buffers and sync primitives from Vulkan, D3D11, D3D12, NvSci, and OpaqueFd sources.
- graph
- CUDA Graphs — record a sequence of operations once, replay cheaply.
- graphics
- Graphics-API interop — register GL / D3D / VDPAU / EGL resources with CUDA for zero-copy compute on their memory.
- green
- Green contexts (CUDA 12.4+) — partition a GPU’s SMs into isolated “green” subsets that each run their own stream/kernel pipeline without contending for SMs with other green contexts on the same device.
- init
- Driver initialization helpers.
- ipc
- Inter-Process Communication for CUDA events and allocations.
- launch
- Kernel launch builder — the Rust equivalent of CUDA C’s triple-chevron syntax.
- launch_attr
- Typed builders for CUlaunchAttribute entries consumed by crate::LaunchBuilder::launch_ex (CUDA 12.0+).
- library
- Driver-API library + kernel management (CUDA 12.0+).
- memcpy2d
- Strided (2-D) memory copies via cuMemcpy2D, and pitched device allocations via cuMemAllocPitch.
- memcpy3d
- 3-D arrays + 3-D memcpy + mipmapped arrays.
- memory
- Device-memory types.
- mempool
- Stream-ordered memory pools (CUDA 11.2+).
- module
- Compiled module loading (PTX, CUBIN, fatbin) and kernel entry-point lookup.
- multicast
- Multicast objects (CUDA 12.0+, NVSwitch systems only).
- occupancy
- Occupancy calculators — how many blocks can a kernel fit per SM, and what block size maximizes utilization.
- pinned
- Pinned (page-locked) host memory.
- pointer
- Pointer attribute queries (cuPointerGetAttribute).
- profiler
- Thin wrappers over cuProfilerStart / cuProfilerStop. These tell external profilers (Nsight, CUPTI) which sections of a program are interesting; they do not produce output themselves.
- stream
- CUDA streams — ordered queues of work on a device.
- tensor_map
- Hopper Tensor Memory Accelerator (TMA) descriptors.
- user_object
- CUDA Graph user objects (CUDA 12.0+).
- vmm
- CUDA Virtual Memory Management (VMM), the fine-grained alternative to cuMemAlloc.
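For the launch module, the arithmetic behind CUDA C's `kernel<<<grid, block>>>(...)` is worth spelling out: the grid must contain enough blocks to cover every element. The sketch below is a standalone illustration under stated assumptions. The `Dim3` here is a toy struct that borrows the re-exported name, and `grid_for` is a hypothetical helper, not a function this crate is known to provide.

```rust
/// Toy launch-dimension triple (assumption: the real `Dim3` may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Dim3 {
    x: u32,
    y: u32,
    z: u32,
}

impl Dim3 {
    /// A 1-D extent: `x` along the first axis, 1 along the others.
    fn linear(x: u32) -> Self {
        Dim3 { x, y: 1, z: 1 }
    }

    /// Total number of blocks (for a grid) or threads (for a block).
    fn volume(self) -> u64 {
        u64::from(self.x) * u64::from(self.y) * u64::from(self.z)
    }
}

/// Smallest 1-D grid whose threads cover `n` elements for the given block.
/// Returns `None` when there is nothing to launch or the block is empty,
/// since a launch with a zero dimension is invalid.
fn grid_for(n: u32, block: Dim3) -> Option<Dim3> {
    if n == 0 || block.x == 0 {
        return None;
    }
    // Ceiling division written so it cannot overflow.
    let blocks = n / block.x + u32::from(n % block.x != 0);
    Some(Dim3::linear(blocks))
}

fn main() {
    let block = Dim3::linear(256);
    let grid = grid_for(1000, block).expect("non-empty launch");
    println!(
        "grid = {grid:?}, block = {block:?}, total threads = {}",
        grid.volume() * block.volume()
    );
}
```

Since the last block is usually only partly used, a kernel launched this way still needs an `if (i < n)` guard on its global index.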