oxicuda-memory

Type-safe GPU memory management with Rust ownership semantics.

Part of the OxiCUDA project.

Overview

oxicuda-memory provides safe, RAII-based wrappers around CUDA memory allocation and transfer operations. Every buffer type owns its GPU (or pinned-host) allocation and automatically frees it on Drop, preventing leaks without requiring manual cleanup.

The crate enforces compile-time type safety through generics (T: Copy) and validates sizes at runtime, returning CudaError::InvalidValue for mismatches rather than panicking. Drop implementations log errors via tracing::warn instead of panicking, ensuring safe teardown even when the CUDA context has already been destroyed.

The copy module provides freestanding transfer functions that mirror the CUDA driver cuMemcpy* family (copy_htod, copy_dtoh, copy_dtod) with both synchronous and async variants. For convenience, DeviceBuffer also exposes methods like copy_from_host() and copy_to_host() directly.

Modules

Module	Description
`device_buffer`	`DeviceBuffer<T>` (VRAM) and `DeviceSlice<T>` (sub-range)
`host_buffer`	`PinnedBuffer<T>` -- page-locked host memory for fast DMA
`unified`	`UnifiedBuffer<T>` -- CUDA managed memory (host+device)
`zero_copy`	`MappedBuffer<T>` -- zero-copy host-mapped memory
`copy`	Freestanding `copy_htod`, `copy_dtoh`, `copy_dtod` helpers
`pool`	`MemoryPool` -- stream-ordered allocation (behind `pool` feature)

Quick Start

use oxicuda_driver::prelude::*;
use oxicuda_memory::prelude::*;

init()?;
let dev = Device::get(0)?;
let _ctx = Context::new(&dev)?;

// Allocate a device buffer and upload host data.
let host_data = vec![1.0f32; 1024];
let mut gpu_buf = DeviceBuffer::<f32>::from_slice(&host_data)?;

// Download results back to the host.
let mut result = vec![0.0f32; 1024];
gpu_buf.copy_to_host(&mut result)?;
# Ok::<(), oxicuda_driver::CudaError>(())

Buffer Types

Type	Location	Description
`DeviceBuffer<T>`	Device (VRAM)	Primary GPU-side buffer
`DeviceSlice<T>`	Device (VRAM)	Borrowed sub-range of a device buffer
`PinnedBuffer<T>`	Host (pinned)	Page-locked host memory for fast DMA
`UnifiedBuffer<T>`	Unified/managed	Accessible from both host and device
`MappedBuffer<T>`	Host-mapped	Zero-copy host-mapped device-accessible memory
`MemoryPool`	Device pool	Stream-ordered allocation (CUDA 11.2+)

Features

Feature	Description
`pool`	Enable stream-ordered memory pool (CUDA 11.2+)
`gpu-tests`	Enable integration tests that require a real GPU

Platform Support

Platform	Status
Linux	Full support (NVIDIA driver 525+)
Windows	Full support (NVIDIA driver 525+)
macOS	Compile only (UnsupportedPlatform at runtime)

oxicuda-memory 0.1.1