Stream-ordered memory pool for efficient async allocation.
Requires a CUDA 11.2+ driver. Gated behind the pool feature.
Stream-ordered memory pools allow allocation and deallocation to be
ordered relative to other operations on a CUDA stream, enabling the
driver to reuse memory more aggressively and avoid synchronisation
barriers that would otherwise be needed for conventional
cuMemAlloc / cuMemFree calls.
§Implementation note
This implementation provides a practical fallback pool that reuses freed
allocations by size and uses cuMemAlloc_v2 / cuMemFree_v2 under the
hood. It keeps the same API surface as a stream-ordered pool, but does
not yet expose native CUDA mempool handles.
§API
let pool = MemoryPool::new(device)?;
let buf = PooledBuffer::<f32>::alloc_async(&pool, 1024, &stream)?;
// … use buf in kernels on `stream` …
// buf is freed asynchronously when dropped (enqueued on the pool's stream).
Structs§
- MemoryPool - A stream-ordered memory pool (CUDA 11.2+).
- NativeMemoryPool - Thin wrapper around the CUDA driver’s stream-ordered memory pool (cuMemPoolCreate/cuMemPoolDestroy).
- NativeMemoryPoolProps - Configuration for a NativeMemoryPool.
- PoolStats - Statistics for a MemoryPool.
- PooledBuffer - A device buffer allocated from a MemoryPool.