pub struct CudaAllocator { /* private fields */ }
A caching GPU memory allocator with block pools, splitting, coalescing, and stream-aware reuse.
Wraps a GpuDevice and maintains two block pools (small and large).
Allocation requests are served from cached free blocks when possible;
only on cache miss does the allocator call through to the CUDA driver.
Freed blocks are returned to the pool and coalesced with neighbors to
reduce fragmentation.
§CL-323
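The hit-or-miss flow described above can be pictured with a toy model that tracks block sizes only. This is a minimal sketch under stated assumptions, not the crate's implementation: `ToyCache` and its fields are hypothetical names, there is a single pool rather than two, and the "driver" is simulated by handing back exactly the requested size.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a caching allocator: sizes only, no real GPU memory.
/// All names are hypothetical and unrelated to CudaAllocator's private fields.
#[derive(Default)]
struct ToyCache {
    /// Cached free blocks: block size -> number of free blocks of that size.
    free: BTreeMap<usize, usize>,
    hits: usize,
    misses: usize,
}

impl ToyCache {
    /// Serve a request from the smallest cached block that fits, falling
    /// back to a simulated driver allocation on a miss.
    /// Returns the size of the block handed out.
    fn alloc(&mut self, size: usize) -> usize {
        // Smallest cached block with capacity >= size, if any.
        let found = self.free.range(size..).next().map(|(&s, _)| s);
        match found {
            Some(block) => {
                self.hits += 1;
                let count = self.free.get_mut(&block).unwrap();
                *count -= 1;
                if *count == 0 {
                    self.free.remove(&block);
                }
                block
            }
            None => {
                self.misses += 1;
                size // pretend the driver returned exactly `size` bytes
            }
        }
    }

    /// Return a block to the cache instead of releasing it to the driver.
    fn free(&mut self, block: usize) {
        *self.free.entry(block).or_insert(0) += 1;
    }
}
```

Allocating 1024 bytes, freeing them, and then requesting 512 bytes produces one miss followed by one hit in this model, and the second request receives the cached 1024-byte block.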
Implementations§
impl CudaAllocator
pub fn new(device: Arc<GpuDevice>) -> Self
Create a new caching allocator for the given device.
pub fn alloc_zeros<T>(&self, count: usize) -> GpuResult<CudaBuffer<T>>
where
    T: DeviceRepr + ValidAsZeroBits,
Allocate count zero-initialized elements of type T on the device.
The returned CudaBuffer is tracked by this allocator. When you are
done with it, pass it to free so the statistics stay accurate.
(Dropping the buffer directly still frees GPU memory, but the value
reported by memory_allocated will remain too high.)
§Errors
Returns GpuError::Driver if the underlying CUDA allocation fails.
pub fn alloc_copy<T>(&self, data: &[T]) -> GpuResult<CudaBuffer<T>>
where
    T: DeviceRepr,
Copy a host slice to device memory, tracking the allocation.
This is the allocator-aware equivalent of crate::transfer::cpu_to_gpu.
§Errors
Returns GpuError::Driver if the CUDA memcpy or allocation fails.
pub fn free<T>(&self, buffer: CudaBuffer<T>)
Return a buffer to the allocator, freeing the GPU memory and updating the statistics.
This is preferred over simply dropping the buffer so that
memory_allocated stays accurate.
pub fn memory_allocated(&self) -> usize
Bytes currently allocated (live) on the device through this allocator.
pub fn max_memory_allocated(&self) -> usize
Peak number of bytes allocated at any one time since creation or the
last call to reset_peak_stats.
pub fn memory_reserved(&self) -> usize
Total bytes reserved from the CUDA driver (cached + in-use).
pub fn reset_peak_stats(&self)
Reset the peak counter to the current allocation level.
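The relationship between the live counter, the peak counter, and the reset can be sketched as follows. `ToyStats` is a hypothetical type used only for illustration; how CudaAllocator stores its counters is not visible from this page.

```rust
/// Hypothetical byte counters mirroring memory_allocated,
/// max_memory_allocated, and reset_peak_stats.
#[derive(Default)]
struct ToyStats {
    allocated: usize,
    peak: usize,
}

impl ToyStats {
    /// Account for a new allocation and raise the peak if needed.
    fn on_alloc(&mut self, bytes: usize) {
        self.allocated += bytes;
        self.peak = self.peak.max(self.allocated);
    }

    /// Account for a free; the peak is deliberately left untouched.
    fn on_free(&mut self, bytes: usize) {
        self.allocated = self.allocated.saturating_sub(bytes);
    }

    /// Restart peak tracking from the current live level.
    fn reset_peak(&mut self) {
        self.peak = self.allocated;
    }
}
```

In this model the peak only moves down on an explicit reset, which makes it possible to measure the high-water mark of one phase of a program in isolation.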
pub fn empty_cache(&self)
Release all cached (free) blocks back to the CUDA driver.
After this call, memory_reserved() drops to memory_allocated()
(only blocks currently in use remain). This is useful when another
component needs GPU memory and the cache is holding onto freed blocks.
§CL-323
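The invariant "reserved = in use + cached" and the effect of emptying the cache can be illustrated with a small accounting model. `ToyReserve` is a hypothetical type; it tracks byte totals only and makes no claim about the allocator's internal bookkeeping.

```rust
/// Hypothetical byte totals for blocks in use and blocks held in the cache.
#[derive(Default)]
struct ToyReserve {
    in_use: usize,
    cached: usize,
}

impl ToyReserve {
    fn memory_allocated(&self) -> usize {
        self.in_use
    }

    /// Everything obtained from the driver: live blocks plus cached ones.
    fn memory_reserved(&self) -> usize {
        self.in_use + self.cached
    }

    /// Freeing moves bytes from "in use" to "cached"; nothing is returned
    /// to the driver at this point.
    fn free(&mut self, bytes: usize) {
        let bytes = bytes.min(self.in_use);
        self.in_use -= bytes;
        self.cached += bytes;
    }

    /// Release every cached byte; returns how many bytes were given back.
    fn empty_cache(&mut self) -> usize {
        std::mem::take(&mut self.cached)
    }
}
```

Starting from 300 bytes in use, freeing 200 leaves the reserved total at 300; only after emptying the cache do the reserved and allocated totals both read 100.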
pub fn record_stream_on_block(&self, block_idx: usize, stream: StreamId)
Record that a block was used on stream, preventing reuse until
work on that stream completes.
This is the Rust equivalent of PyTorch’s recordStream().
§CL-323
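One way to picture stream-aware reuse is a per-block set of streams whose work may still touch the block. The sketch below is an assumption-laden model: `ToyStreamId`, `ToyBlock`, and `stream_completed` are hypothetical, and a real allocator would typically learn about completion from CUDA events rather than from an explicit call.

```rust
use std::collections::HashSet;

/// Hypothetical stream identifier; the crate's StreamId type may differ.
type ToyStreamId = u32;

/// Hypothetical block record that tracks outstanding stream usage.
#[derive(Default)]
struct ToyBlock {
    /// Streams with queued work that may still read or write this block.
    pending: HashSet<ToyStreamId>,
}

impl ToyBlock {
    /// Note that `stream` has used this block.
    fn record_stream(&mut self, stream: ToyStreamId) {
        self.pending.insert(stream);
    }

    /// Called once all work queued on `stream` is known to have finished.
    fn stream_completed(&mut self, stream: ToyStreamId) {
        self.pending.remove(&stream);
    }

    /// The block may be handed out again only when no stream is pending.
    fn reusable(&self) -> bool {
        self.pending.is_empty()
    }
}
```

A block recorded on two streams stays unavailable in this model until both streams have been marked complete.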
pub fn block_count(&self) -> usize
Number of blocks in the arena (for debugging/testing).
pub fn free_block_count(&self) -> usize
Number of free blocks in both pools (for debugging/testing).
pub fn cache_stats(&self) -> (usize, usize)
Cache statistics as a (hits, misses) tuple.
pub fn cached_bytes(&self) -> usize
Total cached (free, reusable) bytes.
pub fn cache_find(
    &self,
    size: usize,
    stream: StreamId,
) -> Option<(usize, usize)>
Try to find a cached block of at least size bytes on stream.
Returns the block index and its actual size if found. The block is marked as allocated and removed from the free pool. If the block is significantly larger than needed, it is split.
§CL-323
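The find-then-split behavior can be sketched with a best-fit search over a list of free spans. Everything here is illustrative: `Span`, `take_best_fit`, and the `MIN_SPLIT_REMAINDER` threshold are hypothetical, the stream parameter is ignored, and the page does not state what "significantly larger" means for the real allocator.

```rust
/// Hypothetical free-block record: byte offset and size inside one
/// driver allocation.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    offset: usize,
    size: usize,
}

/// Assumed threshold: split only when the leftover is at least this large.
const MIN_SPLIT_REMAINDER: usize = 512;

/// Remove the smallest free span that fits `size` and return it.
/// When the leftover is large enough, the tail is split off and pushed
/// back onto the free list; otherwise the whole span is handed out.
fn take_best_fit(free: &mut Vec<Span>, size: usize) -> Option<Span> {
    let idx = free
        .iter()
        .enumerate()
        .filter(|(_, s)| s.size >= size)
        .min_by_key(|(_, s)| s.size)
        .map(|(i, _)| i)?;
    let span = free.swap_remove(idx);
    let remainder = span.size - size;
    if remainder >= MIN_SPLIT_REMAINDER {
        free.push(Span { offset: span.offset + size, size: remainder });
        Some(Span { offset: span.offset, size })
    } else {
        Some(span)
    }
}
```

Returning the span's actual size matters because, as in the documented return value, the caller may receive more bytes than it asked for when no split takes place.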
pub fn cache_insert(
    &self,
    requested_size: usize,
    driver_alloc_size: usize,
    ptr: usize,
    stream: StreamId,
) -> (usize, usize)
Register a new block from a fresh driver allocation.
Called when cache_find returns None and the caller has obtained
memory from the CUDA driver. The full driver allocation is registered
as a block; if it’s larger than the requested size, the remainder is
split off and placed in the free pool.
Returns (block_idx, actual_block_size).
§CL-323
pub fn cache_free(&self, block_idx: usize)
Return a block to the cache (free it back to a pool).
The block is coalesced with any adjacent free blocks and inserted into the appropriate pool for future reuse.
§CL-323
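Coalescing with adjacent free blocks can be sketched with a map from start offset to size. This is a simplified model, not the allocator's data structure: `free_and_coalesce` is a hypothetical helper, offsets stand in for device pointers, and all spans are assumed to belong to the same driver allocation.

```rust
use std::collections::BTreeMap;

/// Insert the span `[offset, offset + size)` into `free` (start -> size),
/// merging it with a free neighbor on either side when they touch.
fn free_and_coalesce(free: &mut BTreeMap<usize, usize>, mut offset: usize, mut size: usize) {
    // Closest free span that starts before this one, copied out so the
    // map can be mutated afterwards.
    let prev = free.range(..offset).next_back().map(|(&o, &s)| (o, s));
    if let Some((prev_off, prev_size)) = prev {
        // Merge only if the previous span ends exactly where this one starts.
        if prev_off + prev_size == offset {
            free.remove(&prev_off);
            offset = prev_off;
            size += prev_size;
        }
    }
    // Merge with a free span that starts exactly where this one ends.
    if let Some(next_size) = free.remove(&(offset + size)) {
        size += next_size;
    }
    free.insert(offset, size);
}
```

Freeing the middle span of three contiguous 100-byte spans, after the outer two are already free, leaves a single 300-byte entry, which is the fragmentation benefit the description refers to.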
pub fn driver_alloc_size(size: usize) -> usize
Get the driver allocation size for a given request size.
Callers use this to know how many bytes to request from the driver
when cache_find misses.
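This page does not state how the driver allocation size is derived from the request size. As a purely illustrative sketch, the function below uses a rounding policy modeled on the one in PyTorch's CUDA caching allocator; the name `toy_driver_alloc_size` and all thresholds are assumptions and should not be read as the values CudaAllocator uses.

```rust
// Assumed thresholds, modeled on PyTorch's caching allocator.
const SMALL_REQUEST: usize = 1 << 20; // requests up to 1 MiB count as small
const SMALL_BUFFER: usize = 2 << 20; // small requests reserve 2 MiB
const MIN_LARGE_ALLOC: usize = 10 << 20; // below 10 MiB, reserve a fixed buffer
const LARGE_BUFFER: usize = 20 << 20; // fixed 20 MiB buffer
const ROUND_LARGE: usize = 2 << 20; // larger requests round up to 2 MiB

/// Hypothetical policy: how many bytes to request from the driver for a
/// cache miss of `size` bytes.
fn toy_driver_alloc_size(size: usize) -> usize {
    if size <= SMALL_REQUEST {
        SMALL_BUFFER
    } else if size < MIN_LARGE_ALLOC {
        LARGE_BUFFER
    } else {
        // Round up to the next multiple of ROUND_LARGE.
        (size + ROUND_LARGE - 1) / ROUND_LARGE * ROUND_LARGE
    }
}
```

Over-allocating in this way is what gives cache_insert a remainder to split off and place in the free pool, so later small requests can be served without another driver call.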
Trait Implementations§
Auto Trait Implementations§
impl !Freeze for CudaAllocator
impl RefUnwindSafe for CudaAllocator
impl Send for CudaAllocator
impl Sync for CudaAllocator
impl Unpin for CudaAllocator
impl UnsafeUnpin for CudaAllocator
impl UnwindSafe for CudaAllocator
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> DistributionExt for T
where
    T: ?Sized,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.