# oxicuda-memory
Type-safe GPU memory management with Rust ownership semantics.
Part of the [OxiCUDA](https://github.com/cool-japan/oxicuda) project.
## Overview
`oxicuda-memory` provides safe, RAII-based wrappers around CUDA memory
allocation and transfer operations. Every buffer type owns its GPU (or
pinned-host) allocation and automatically frees it on `Drop`, preventing
leaks without requiring manual cleanup.
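
The ownership pattern described above can be illustrated with a minimal, host-only sketch. `MockAllocator` and `OwnedBuffer` are hypothetical names invented for this example and are not part of `oxicuda-memory`. The "allocator" only counts live allocations so that the effect of `Drop` is observable without a GPU.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a GPU allocator; it only counts live allocations.
#[derive(Default)]
struct MockAllocator {
    live: Cell<usize>,
}

/// Hypothetical owning buffer: it holds one allocation for its whole lifetime.
struct OwnedBuffer {
    alloc: Rc<MockAllocator>,
    len: usize,
}

impl OwnedBuffer {
    /// "Allocates" by incrementing the live-allocation counter.
    fn new(alloc: Rc<MockAllocator>, len: usize) -> Self {
        alloc.live.set(alloc.live.get() + 1);
        OwnedBuffer { alloc, len }
    }

    fn len(&self) -> usize {
        self.len
    }
}

impl Drop for OwnedBuffer {
    fn drop(&mut self) {
        // A real wrapper would call the driver's free function here.
        self.alloc.live.set(self.alloc.live.get() - 1);
    }
}

/// Returns the live-allocation count seen inside the scope and after it ends.
fn live_counts() -> (usize, usize) {
    let alloc = Rc::new(MockAllocator::default());
    let inside = {
        let buf = OwnedBuffer::new(Rc::clone(&alloc), 1024);
        assert_eq!(buf.len(), 1024);
        alloc.live.get()
    }; // `buf` goes out of scope here, so `Drop` releases the allocation.
    (inside, alloc.live.get())
}

fn main() {
    let (inside, after) = live_counts();
    println!("live inside scope: {inside}, after scope: {after}");
}
```

Because the release happens in `Drop`, no explicit cleanup call appears at the use site; the allocation cannot outlive its owner.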
The crate enforces compile-time type safety through generics (`T: Copy`)
and validates sizes at runtime, returning `CudaError::InvalidValue` on a
mismatch rather than panicking. `Drop` implementations likewise never
panic: they log failures via `tracing::warn`, so teardown stays safe even
when the CUDA context has already been destroyed.
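
As a rough illustration of these two rules, the following host-only sketch rejects a length mismatch with an error value and reports a failed release from `Drop` without panicking. `SketchError`, `SketchBuffer`, `release`, and the `context_alive` flag are all hypothetical names made up for this example; `eprintln!` stands in for `tracing::warn`, and the real crate's internals may differ.

```rust
/// Hypothetical error type standing in for `CudaError`.
#[derive(Debug, PartialEq)]
enum SketchError {
    InvalidValue,
}

/// Hypothetical host-side stand-in for a typed device buffer.
struct SketchBuffer<T: Copy> {
    data: Vec<T>,
    /// Simulates whether the owning context still exists at teardown.
    context_alive: bool,
}

impl<T: Copy + Default> SketchBuffer<T> {
    fn zeroed(len: usize) -> Self {
        SketchBuffer { data: vec![T::default(); len], context_alive: true }
    }

    /// Copies `src` in; a length mismatch yields an error instead of a panic.
    fn copy_from_host(&mut self, src: &[T]) -> Result<(), SketchError> {
        if src.len() != self.data.len() {
            return Err(SketchError::InvalidValue);
        }
        self.data.copy_from_slice(src);
        Ok(())
    }
}

impl<T: Copy> SketchBuffer<T> {
    /// Simulated free call that fails once the context is gone.
    fn release(&mut self) -> Result<(), SketchError> {
        if self.context_alive {
            self.data.clear();
            Ok(())
        } else {
            Err(SketchError::InvalidValue)
        }
    }
}

impl<T: Copy> Drop for SketchBuffer<T> {
    fn drop(&mut self) {
        // Log and continue: panicking inside `drop` could abort the process.
        if let Err(err) = self.release() {
            eprintln!("warning: failed to free buffer: {err:?}");
        }
    }
}

fn main() {
    let mut buf = SketchBuffer::<f32>::zeroed(4);
    println!("{:?}", buf.copy_from_host(&[1.0; 3])); // wrong length
    println!("{:?}", buf.copy_from_host(&[1.0; 4])); // matching length
    buf.context_alive = false; // simulate a context destroyed before the buffer
}
```

Returning an error for a size mismatch leaves the decision to the caller, while the logging `Drop` keeps teardown order from turning into a crash.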
The `copy` module provides freestanding transfer functions that mirror the
CUDA driver `cuMemcpy*` family (`copy_htod`, `copy_dtoh`, `copy_dtod`)
with both synchronous and async variants. For convenience, `DeviceBuffer`
also exposes methods like `copy_from_host()` and `copy_to_host()` directly.
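
The relationship between the freestanding functions and the convenience methods might look like the host-only sketch below, in which each method simply delegates to a free function. Everything here is an assumption made for illustration: `MockDeviceBuffer`, `MockError`, the `mock_copy_*` functions, and their argument order are invented and do not reflect the crate's actual signatures.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum MockError {
    InvalidValue,
}

/// Hypothetical device buffer backed by host memory.
struct MockDeviceBuffer<T: Copy> {
    data: Vec<T>,
}

/// Host-to-device copy (mock): lengths must match exactly.
fn mock_copy_htod<T: Copy>(dst: &mut MockDeviceBuffer<T>, src: &[T]) -> Result<(), MockError> {
    if dst.data.len() != src.len() {
        return Err(MockError::InvalidValue);
    }
    dst.data.copy_from_slice(src);
    Ok(())
}

/// Device-to-host copy (mock).
fn mock_copy_dtoh<T: Copy>(dst: &mut [T], src: &MockDeviceBuffer<T>) -> Result<(), MockError> {
    if dst.len() != src.data.len() {
        return Err(MockError::InvalidValue);
    }
    dst.copy_from_slice(&src.data);
    Ok(())
}

/// Device-to-device copy (mock).
fn mock_copy_dtod<T: Copy>(
    dst: &mut MockDeviceBuffer<T>,
    src: &MockDeviceBuffer<T>,
) -> Result<(), MockError> {
    if dst.data.len() != src.data.len() {
        return Err(MockError::InvalidValue);
    }
    dst.data.copy_from_slice(&src.data);
    Ok(())
}

impl<T: Copy + Default> MockDeviceBuffer<T> {
    fn zeroed(len: usize) -> Self {
        MockDeviceBuffer { data: vec![T::default(); len] }
    }

    /// Convenience method that delegates to the free function.
    fn copy_from_host(&mut self, src: &[T]) -> Result<(), MockError> {
        mock_copy_htod(self, src)
    }

    /// Convenience method that delegates to the free function.
    fn copy_to_host(&self, dst: &mut [T]) -> Result<(), MockError> {
        mock_copy_dtoh(dst, self)
    }
}

/// Uploads `input`, copies it between two mock buffers, and downloads it again.
fn round_trip(input: &[f32]) -> Result<Vec<f32>, MockError> {
    let mut a = MockDeviceBuffer::<f32>::zeroed(input.len());
    let mut b = MockDeviceBuffer::<f32>::zeroed(input.len());
    a.copy_from_host(input)?;
    mock_copy_dtod(&mut b, &a)?;
    let mut out = vec![0.0f32; input.len()];
    b.copy_to_host(&mut out)?;
    Ok(out)
}

fn main() {
    println!("{:?}", round_trip(&[1.0, 2.0, 3.0]));
}
```

Keeping the logic in free functions and exposing thin methods on the buffer gives callers both styles without duplicating the size checks.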
## Modules
| Module | Description |
|--------|-------------|
| `device_buffer` | `DeviceBuffer<T>` (VRAM) and `DeviceSlice<T>` (sub-range) |
| `host_buffer` | `PinnedBuffer<T>` -- page-locked host memory for fast DMA |
| `unified` | `UnifiedBuffer<T>` -- CUDA managed memory (host+device) |
| `zero_copy` | `MappedBuffer<T>` -- zero-copy host-mapped device-accessible memory |
| `copy` | Freestanding `copy_htod`, `copy_dtoh`, `copy_dtod` (sync + async) |
| `copy_2d3d` | `Memcpy2DParams`, `Memcpy3DParams`, 2D/3D memory copy helpers |
| `pool` | `MemoryPool` -- stream-ordered allocation (behind `pool` feature) |
| `pool_stats` | `AllocationHistogram`, `PoolReport`, fragmentation metrics |
| `aligned` | Aligned allocation options (256-byte / 512-byte alignment) |
| `buffer_view` | Type-safe buffer view and reinterpret-cast helpers |
| `host_registered` | Host-registered memory (`cuMemHostRegister`) |
| `managed_hints` | CUDA 12+ managed memory hints, migration policy, prefetch plans |
| `virtual_memory` | `VirtualAddressRange`, `PhysicalAllocation`, virtual memory management |
| `peer_copy` | Multi-GPU peer memory copy (`copy_peer`, `copy_peer_async`) |
| `memory_info` | Device memory info queries (`cuMemGetInfo`) |
| `bandwidth_profiler` | Transfer timing and throughput measurement hooks |
## Quick Start
```rust,no_run
use oxicuda_driver::prelude::*;
use oxicuda_memory::prelude::*;
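// Initialize the CUDA driver, then select device 0 and create a context on it.
init()?;
let dev = Device::get(0)?;
// Binding the context to `_ctx` keeps it in scope until the end of the block.
let _ctx = Context::new(&dev)?;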
// Allocate a device buffer and upload host data.
let host_data = vec![1.0f32; 1024];
let mut gpu_buf = DeviceBuffer::<f32>::from_slice(&host_data)?;
// Download results back to the host.
let mut result = vec![0.0f32; 1024];
gpu_buf.copy_to_host(&mut result)?;
# Ok::<(), oxicuda_driver::CudaError>(())
```
## Buffer Types
| Type | Location | Description |
|------|----------|-------------|
| `DeviceBuffer<T>` | Device (VRAM) | Primary GPU-side buffer |
| `DeviceSlice<T>` | Device (VRAM) | Borrowed sub-range of a device buffer |
| `PinnedBuffer<T>` | Host (pinned) | Page-locked host memory for fast DMA |
| `UnifiedBuffer<T>` | Unified/managed | Accessible from both host and device |
| `MappedBuffer<T>` | Host-mapped | Zero-copy host-mapped device-accessible memory |
| `MemoryPool` | Device pool | Stream-ordered allocation (CUDA 11.2+) |
## Features
| Feature | Description |
|---------|-------------|
| `pool` | Enable stream-ordered memory pool (CUDA 11.2+) |
| `gpu-tests` | Enable integration tests that require a real GPU |
## Platform Support
| Platform | Status |
|----------|--------|
| Linux | Full support (NVIDIA driver 525+) |
| Windows | Full support (NVIDIA driver 525+) |
| macOS | Compile only (returns `UnsupportedPlatform` at runtime) |
## Status
| Item | Value |
|------|-------|
| Version | 0.1.4 (2026-04-18) |
| Tests | 201 passing |
| Warnings | 0 |
| `unwrap()` | 0 |
## License
Apache-2.0 -- (C) 2026 COOLJAPAN OU (Team KitaSan)