Stream-ordered memory pool for efficient async allocation.
Requires a CUDA 11.2+ driver. Gated behind the pool feature.
Stream-ordered memory pools allow allocation and deallocation to be
ordered relative to other operations on a CUDA stream, enabling the
driver to reuse memory more aggressively and avoid synchronisation
barriers that would otherwise be needed for conventional
cuMemAlloc / cuMemFree calls.
§Implementation note
This implementation provides a practical fallback pool that reuses freed
allocations by size and uses cuMemAlloc_v2 / cuMemFree_v2 under the
hood. It keeps the same API surface as a stream-ordered pool, but does
not yet expose native CUDA mempool handles.
§API
let pool = MemoryPool::new(device)?;
let buf = PooledBuffer::<f32>::alloc_async(&pool, 1024, &stream)?;
// … use buf in kernels on `stream` …
// buf is freed asynchronously when dropped (enqueued on the pool's stream).
Structs§
- MemoryPool - A stream-ordered memory pool (CUDA 11.2+).
- NativeMemoryPool - Thin wrapper around the CUDA driver’s stream-ordered memory pool (cuMemPoolCreate/cuMemPoolDestroy).
- NativeMemoryPoolProps - Configuration for a NativeMemoryPool.
- PoolStats - Statistics for a MemoryPool.
- PooledBuffer - A device buffer allocated from a MemoryPool.