Crate baracuda_driver

Safe Rust wrappers for the CUDA Driver API.

This crate takes the raw FFI in baracuda_cuda_sys and dresses it up with RAII handles, typed memory, lifetime-checked slices, and a kernel launch builder. It deliberately does not hide the Driver-API model: contexts are explicit, modules are explicit, streams are explicit.
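
As a rough illustration of what "RAII handles" and "lifetime-checked slices" mean in general, the sketch below uses hypothetical stand-in types (`FakeContext`, `FakeBuffer`, `FakeSlice`). None of them are part of baracuda_driver and nothing here touches a GPU; it only shows the ownership pattern, under the assumption that the real handles follow the usual Rust conventions.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Hypothetical stand-in for a driver context; it only counts live allocations.
struct FakeContext {
    live: Rc<Cell<usize>>,
}

// Hypothetical owning handle: releases its "allocation" when dropped (RAII).
struct FakeBuffer {
    data: Vec<f32>,
    live: Rc<Cell<usize>>,
}

// Hypothetical borrowed view: the lifetime ties it to the buffer it came from.
struct FakeSlice<'a> {
    data: &'a [f32],
}

impl FakeContext {
    fn new() -> Self {
        FakeContext { live: Rc::new(Cell::new(0)) }
    }

    // "Allocate" a buffer and record it as live.
    fn alloc(&self, src: &[f32]) -> FakeBuffer {
        self.live.set(self.live.get() + 1);
        FakeBuffer { data: src.to_vec(), live: Rc::clone(&self.live) }
    }

    fn live_allocations(&self) -> usize {
        self.live.get()
    }
}

impl FakeBuffer {
    // Borrow a sub-range; the returned view cannot outlive `self`.
    fn slice(&self, start: usize, end: usize) -> FakeSlice<'_> {
        FakeSlice { data: &self.data[start..end] }
    }
}

impl Drop for FakeBuffer {
    // Cleanup runs automatically when the handle goes out of scope.
    fn drop(&mut self) {
        self.live.set(self.live.get() - 1);
    }
}

fn main() {
    let ctx = FakeContext::new();
    {
        let buf = ctx.alloc(&[1.0, 2.0, 3.0, 4.0]);
        let view = buf.slice(1, 3);
        assert_eq!(view.data, &[2.0f32, 3.0][..]);
        assert_eq!(ctx.live_allocations(), 1);
        // Returning `view` from this block would be rejected by the borrow checker.
    }
    // `buf` was dropped at the end of the block, so nothing is live.
    assert_eq!(ctx.live_allocations(), 0);
}
```

The point of the pattern is that releasing a resource and invalidating views into it are both enforced by the compiler rather than by convention.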

§Quickstart

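use baracuda_driver::{Context, Device, DeviceBuffer, Module, Stream};

// Select device 0, then create an explicit context and a stream on it.
let device = Device::get(0)?;
let ctx = Context::new(&device)?;
let stream = Stream::new(&ctx)?;
// Upload 1024 floats to device memory.
let host_data: Vec<f32> = (0..1024).map(|i| i as f32).collect();
let device_data = DeviceBuffer::from_slice(&ctx, &host_data)?;
// Copy them back into a fresh host buffer.
let mut back = vec![0.0f32; host_data.len()];
device_data.copy_to_host(&mut back)?;
// Wait for any work queued on the stream, then check the round trip.
stream.synchronize()?;
assert_eq!(host_data, back);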

§Modules

device — Device enumeration and attributes.
context — Context (explicit CUDA contexts + primary-context reuse).
stream — Stream, ordered async work queues.
event — Event, synchronization and timing.
memory — DeviceBuffer<T>, DeviceSlice<'_, T>, DeviceSliceMut<'_, T>.
module — Module, Function (PTX/CUBIN loading).
launch — launch::LaunchBuilder for cuLaunchKernel.
init — init() helper and driver version queries.

Re-exports§

pub use array::Array;
pub use array::ArrayFormat;
pub use array::SurfaceObject;
pub use array::TextureAddressMode;
pub use array::TextureDesc;
pub use array::TextureFilterMode;
pub use array::TextureObject;
pub use context::Context;
pub use context::PrimaryContext;
pub use device::Device;
pub use error::error_name;
pub use error::error_string;
pub use error::Error;
pub use error::Result;
pub use event::Event;
pub use graph::instantiate_flags;
pub use graph::CaptureMode;
pub use graph::Graph;
pub use graph::GraphExec;
pub use graph::GraphNode;
pub use init::init;
pub use init::version;
pub use launch::Dim3;
pub use launch::LaunchBuilder;
pub use memory::mem_get_info;
pub use memory::DeviceBuffer;
pub use memory::DevicePtr;
pub use memory::DevicePtrMut;
pub use memory::DeviceSlice;
pub use memory::DeviceSliceMut;
pub use memory::ManagedAttach;
pub use memory::ManagedBuffer;
pub use memory::MemAdvise;
pub use pinned::PinnedBuffer;
pub use pinned::PinnedRegistration;
pub use module::Function;
pub use module::Module;
pub use stream::Stream;

Modules§

array: CUDA arrays + texture / surface objects.
context: CUDA contexts — both primary (shared with the Runtime API) and explicit.
coredump: CUDA GPU core-dump configuration (CUDA 12.1+).
device: Physical-GPU query and enumeration.
error: Error type for baracuda-driver.
event: CUDA events — lightweight synchronization objects you can record on a stream and later wait on, or use to measure elapsed device time.
external: External memory / semaphore interop — import buffers and sync primitives from Vulkan, D3D11, D3D12, NvSci, and OpaqueFd sources.
graph: CUDA Graphs — record a sequence of operations once, replay cheaply.
graphics: Graphics-API interop — register GL / D3D / VDPAU / EGL resources with CUDA for zero-copy compute on their memory.
green: Green contexts (CUDA 12.4+) — partition a GPU’s SMs into isolated “green” subsets that each run their own stream/kernel pipeline without contending for SMs with other green contexts on the same device.
init: Driver initialization helpers.
ipc: Inter-Process Communication for CUDA events and allocations.
launch: Kernel launch builder — the Rust equivalent of CUDA C’s triple-chevron syntax.
launch_attr: Typed builders for CUlaunchAttribute entries consumed by crate::LaunchBuilder::launch_ex (CUDA 12.0+).
library: Driver-API library + kernel management (CUDA 12.0+).
memcpy2d: Strided (2-D) memory copies via cuMemcpy2D, and pitched device allocations via cuMemAllocPitch.
memcpy3d: 3-D arrays + 3-D memcpy + mipmapped arrays.
memory: Device-memory types.
mempool: Stream-ordered memory pools (CUDA 11.2+).
module: Compiled module loading (PTX, CUBIN, fatbin) and kernel entry-point lookup.
multicast: Multicast objects (CUDA 12.0+, NVSwitch systems only).
occupancy: Occupancy calculators — how many blocks can a kernel fit per SM, and what block size maximizes utilization.
pinned: Pinned (page-locked) host memory.
pointer: Pointer attribute queries (cuPointerGetAttribute).
profiler: Thin wrappers over cuProfilerStart / cuProfilerStop. These tell external profilers (Nsight, CUPTI) which sections of a program are interesting; they do not produce output themselves.
stream: CUDA streams — ordered queues of work on a device.
tensor_map: Hopper Tensor Memory Accelerator (TMA) descriptors.
user_object: CUDA Graph user objects (CUDA 12.0+).
vmm: CUDA Virtual Memory Management (VMM) — the fine-grained alternative to cuMemAlloc.
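
The launch entry above compares the builder to CUDA C's `<<<grid, block>>>` syntax. One piece of arithmetic that any such launch needs is the grid size: enough blocks to cover every element, rounding up. The sketch below shows that calculation only. `LaunchShape` and `shape_for` are hypothetical names for illustration; they are not baracuda_driver's `Dim3` or `LaunchBuilder`, whose actual signatures are not shown on this page.

```rust
// Hypothetical 1-D launch shape for illustration; not baracuda_driver's Dim3.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LaunchShape {
    grid_x: u32,
    block_x: u32,
}

// Pick the smallest grid such that grid_x * block_x >= n (ceiling division).
fn shape_for(n: u32, block_x: u32) -> LaunchShape {
    assert!(block_x > 0, "block size must be non-zero");
    // Written without `n + block_x - 1` so it cannot overflow u32.
    let grid_x = n / block_x + u32::from(n % block_x != 0);
    LaunchShape { grid_x, block_x }
}

fn main() {
    // 1024 elements divide evenly into 4 blocks of 256 threads.
    assert_eq!(shape_for(1024, 256), LaunchShape { grid_x: 4, block_x: 256 });
    // 1025 elements need one extra, partially filled block.
    assert_eq!(shape_for(1025, 256).grid_x, 5);
    // An empty problem needs no blocks at all.
    assert_eq!(shape_for(0, 256).grid_x, 0);
}
```

Because the last block may be only partially filled, a kernel launched with such a shape normally guards its body with a bounds check against `n`.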
