Crate baracuda_driver

Safe Rust wrappers for the CUDA Driver API.

This crate takes the raw FFI in baracuda_cuda_sys and dresses it up with RAII handles, typed memory, lifetime-checked slices, and a kernel launch builder. It deliberately does not hide the Driver-API model: contexts are explicit, modules are explicit, streams are explicit.
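The RAII-handle and lifetime-checked-slice ideas can be illustrated without a GPU. The following is a minimal sketch in plain Rust, not this crate's implementation: `MockContext`, `MockBuffer`, `MockSlice`, `ReleaseLog`, and `release_order` are hypothetical names invented for the illustration, and a host `Vec` stands in for device memory.

```rust
use std::cell::RefCell;
use std::ops::Range;
use std::rc::Rc;

/// Shared log standing in for the driver: records when a "handle" is released.
type ReleaseLog = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical context handle. A real wrapper would own a raw driver handle
/// and destroy it in `Drop`; here we only record the event.
struct MockContext {
    log: ReleaseLog,
}

impl Drop for MockContext {
    fn drop(&mut self) {
        self.log.borrow_mut().push("context released");
    }
}

/// Hypothetical typed buffer. The `'ctx` borrow stops the buffer from
/// outliving the context it was allocated in.
struct MockBuffer<'ctx, T> {
    ctx: &'ctx MockContext,
    data: Vec<T>, // host Vec standing in for device memory
}

impl<'ctx, T: Clone> MockBuffer<'ctx, T> {
    fn from_slice(ctx: &'ctx MockContext, src: &[T]) -> Self {
        MockBuffer { ctx, data: src.to_vec() }
    }

    /// Borrowed view; the compiler rejects any use after the buffer is gone.
    fn slice(&self, range: Range<usize>) -> MockSlice<'_, T> {
        MockSlice { elems: &self.data[range] }
    }
}

impl<'ctx, T> Drop for MockBuffer<'ctx, T> {
    fn drop(&mut self) {
        self.ctx.log.borrow_mut().push("buffer released");
    }
}

/// Hypothetical slice view tied to the lifetime of its parent buffer.
struct MockSlice<'buf, T> {
    elems: &'buf [T],
}

impl<'buf, T> MockSlice<'buf, T> {
    fn len(&self) -> usize {
        self.elems.len()
    }
}

/// Builds a context, a buffer, and a view, then lets them fall out of scope.
/// Returns the view length and the order in which resources were released.
fn release_order() -> (usize, Vec<&'static str>) {
    let log: ReleaseLog = Rc::new(RefCell::new(Vec::new()));
    let view_len;
    {
        let ctx = MockContext { log: Rc::clone(&log) };
        let buf = MockBuffer::from_slice(&ctx, &[1.0f32, 2.0, 3.0, 4.0]);
        let view = buf.slice(1..3);
        view_len = view.len();
        // Locals drop in reverse declaration order: view, buf, then ctx.
    }
    let events = log.borrow().clone();
    (view_len, events)
}
```

Calling `release_order()` shows the buffer being released before its context, with no explicit cleanup call. Returning `view` or `buf` from the inner scope would fail to compile, which is the property the lifetime parameters are there to enforce.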

§Quickstart

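use baracuda_driver::{Context, Device, DeviceBuffer, Stream};

// The `?` operators assume an enclosing function that returns a compatible Result.
// Pick the first CUDA device, then create an explicit context and a stream.
let device = Device::get(0)?;
let ctx = Context::new(&device)?;
let stream = Stream::new(&ctx)?;
// Upload 1024 floats to the device, then copy them back into a host vector.
let host_data: Vec<f32> = (0..1024).map(|i| i as f32).collect();
let device_data = DeviceBuffer::from_slice(&ctx, &host_data)?;
let mut back = vec![0.0f32; host_data.len()];
device_data.copy_to_host(&mut back)?;
// Wait for any work queued on the stream before checking the round trip.
stream.synchronize()?;
assert_eq!(host_data, back);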

§Modules

Re-exports§

pub use array::Array;
pub use array::ArrayFormat;
pub use array::SurfaceObject;
pub use array::TextureAddressMode;
pub use array::TextureDesc;
pub use array::TextureFilterMode;
pub use array::TextureObject;
pub use context::Context;
pub use context::PrimaryContext;
pub use device::Device;
pub use error::error_name;
pub use error::error_string;
pub use error::Error;
pub use error::Result;
pub use event::Event;
pub use graph::instantiate_flags;
pub use graph::CaptureMode;
pub use graph::Graph;
pub use graph::GraphExec;
pub use graph::GraphNode;
pub use init::init;
pub use init::version;
pub use launch::Dim3;
pub use launch::LaunchBuilder;
pub use memory::mem_get_info;
pub use memory::DeviceBuffer;
pub use memory::DevicePtr;
pub use memory::DevicePtrMut;
pub use memory::DeviceSlice;
pub use memory::DeviceSliceMut;
pub use memory::ManagedAttach;
pub use memory::ManagedBuffer;
pub use memory::MemAdvise;
pub use pinned::PinnedBuffer;
pub use pinned::PinnedRegistration;
pub use module::Function;
pub use module::Module;
pub use stream::Stream;

Modules§

array
CUDA arrays + texture / surface objects.
context
CUDA contexts — both primary (shared with the Runtime API) and explicit.
coredump
CUDA GPU core-dump configuration (CUDA 12.1+).
device
Physical-GPU query and enumeration.
error
Error type for baracuda-driver.
event
CUDA events — lightweight synchronization objects you can record on a stream and later wait on, or use to measure elapsed device time.
external
External memory / semaphore interop — import buffers and sync primitives from Vulkan, D3D11, D3D12, NvSci, and OpaqueFd sources.
graph
CUDA Graphs — record a sequence of operations once, replay cheaply.
graphics
Graphics-API interop — register GL / D3D / VDPAU / EGL resources with CUDA for zero-copy compute on their memory.
green
Green contexts (CUDA 12.4+) — partition a GPU’s SMs into isolated “green” subsets that each run their own stream/kernel pipeline without contending for SMs with other green contexts on the same device.
init
Driver initialization helpers.
ipc
Inter-Process Communication for CUDA events and allocations.
launch
Kernel launch builder — the Rust equivalent of CUDA C’s triple-chevron syntax.
launch_attr
Typed builders for CUlaunchAttribute entries consumed by crate::LaunchBuilder::launch_ex (CUDA 12.0+).
library
Driver-API library + kernel management (CUDA 12.0+).
memcpy2d
Strided (2-D) memory copies via cuMemcpy2D, and pitched device allocations via cuMemAllocPitch.
memcpy3d
3-D arrays + 3-D memcpy + mipmapped arrays.
memory
Device-memory types.
mempool
Stream-ordered memory pools (CUDA 11.2+).
module
Compiled module loading (PTX, CUBIN, fatbin) and kernel entry-point lookup.
multicast
Multicast objects (CUDA 12.0+, NVSwitch systems only).
occupancy
Occupancy calculators — how many blocks can a kernel fit per SM, and what block size maximizes utilization.
pinned
Pinned (page-locked) host memory.
pointer
Pointer attribute queries (cuPointerGetAttribute).
profiler
Thin wrappers over cuProfilerStart / cuProfilerStop. These tell external profilers (Nsight, CUPTI) which sections of a program are interesting; they do not produce output themselves.
stream
CUDA streams — ordered queues of work on a device.
tensor_map
Hopper Tensor Memory Accelerator (TMA) descriptors.
user_object
CUDA Graph user objects (CUDA 12.0+).
vmm
CUDA Virtual Memory Management (VMM) — the fine-grained alternative to cuMemAlloc.
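The launch entry above compares the builder to CUDA C's `kernel<<<grid, block>>>(args)` syntax, where the caller chooses a grid size that covers the input. As a rough sketch of that sizing arithmetic only (it does not use or describe this crate's `Dim3` or `LaunchBuilder`), with `Extent3` and `grid_for` as hypothetical names:

```rust
/// Hypothetical 3-component extent, standing in for a launch dimension type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Extent3 {
    x: u32,
    y: u32,
    z: u32,
}

impl Extent3 {
    /// A 1-D extent: `x` along the first axis, 1 along the others.
    fn linear(x: u32) -> Self {
        Extent3 { x, y: 1, z: 1 }
    }

    /// Total number of elements (threads or blocks) in the extent.
    fn volume(self) -> u64 {
        self.x as u64 * self.y as u64 * self.z as u64
    }
}

/// Smallest 1-D grid such that `grid.x * block.x >= n`.
/// Uses quotient-plus-remainder rounding to avoid overflow in `n + block.x - 1`.
fn grid_for(n: u32, block: Extent3) -> Extent3 {
    assert!(block.x > 0, "block size must be non-zero");
    let blocks = n / block.x + u32::from(n % block.x != 0);
    Extent3::linear(blocks)
}
```

With a block of 256 threads, 1024 elements need 4 blocks and 1025 elements need 5, so a kernel launched this way still has to bounds-check its global index. An input of 0 yields a zero-sized grid, which CUDA rejects, so a caller would skip the launch in that case.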