Struct CudaAllocator 

Source
pub struct CudaAllocator { /* private fields */ }

A caching GPU memory allocator with block pools, splitting, coalescing, and stream-aware reuse.

Wraps a GpuDevice and maintains two block pools (small and large). Allocation requests are served from cached free blocks when possible; only on cache miss does the allocator call through to the CUDA driver. Freed blocks are returned to the pool and coalesced with neighbors to reduce fragmentation.
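
The hit/miss behavior described above can be pictured with a small host-only model. This is a sketch under stated assumptions, not the crate's implementation: `ToyPools`, `SMALL_LIMIT`, and the best-fit lookup are hypothetical, and no splitting or coalescing is modeled.

```rust
use std::collections::BTreeMap;

/// Hypothetical cutoff between the small and large pools (an assumption).
const SMALL_LIMIT: usize = 1 << 20; // 1 MiB

/// Toy model: each pool maps a block size to the number of free blocks of that size.
#[derive(Default)]
struct ToyPools {
    small: BTreeMap<usize, usize>,
    large: BTreeMap<usize, usize>,
    hits: usize,
    misses: usize,
}

impl ToyPools {
    /// Pick the pool responsible for a given size.
    fn pool_mut(&mut self, size: usize) -> &mut BTreeMap<usize, usize> {
        if size <= SMALL_LIMIT { &mut self.small } else { &mut self.large }
    }

    /// Serve a request from the cache if possible and return the block size used.
    /// On a miss, a real allocator would call the driver; here the miss is
    /// modeled as a fresh block of exactly `size` bytes.
    fn alloc(&mut self, size: usize) -> usize {
        let pool = self.pool_mut(size);
        // Best fit: the smallest cached block that holds at least `size` bytes.
        let found = pool.range(size..).next().map(|(&s, _)| s);
        match found {
            Some(s) => {
                let n = pool.get_mut(&s).unwrap();
                *n -= 1;
                if *n == 0 {
                    pool.remove(&s);
                }
                self.hits += 1;
                s
            }
            None => {
                self.misses += 1;
                size
            }
        }
    }

    /// Return a block to the pool that matches its size.
    fn free(&mut self, block_size: usize) {
        *self.pool_mut(block_size).entry(block_size).or_insert(0) += 1;
    }
}
```

In this model, a first request for 4096 bytes is a miss; after the block is freed, a later request for 1024 bytes is a hit that reuses the 4096-byte block.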

§CL-323

Implementations§

Source§

impl CudaAllocator

Source

pub fn new(device: Arc<GpuDevice>) -> Self

Create a new caching allocator for the given device.

Source

pub fn alloc_zeros<T>(&self, count: usize) -> GpuResult<CudaBuffer<T>>

Allocate count zero-initialized elements of type T on the device.

The returned CudaBuffer is tracked by this allocator. When you are done with it, pass it to free so the statistics stay accurate. (Dropping the buffer directly still frees the GPU memory, but memory_allocated will then report a value that is too high.)

§Errors

Returns GpuError::Driver if the underlying CUDA allocation fails.

Source

pub fn alloc_copy<T>(&self, data: &[T]) -> GpuResult<CudaBuffer<T>>
where T: DeviceRepr,

Copy a host slice to device memory, tracking the allocation.

This is the allocator-aware equivalent of crate::transfer::cpu_to_gpu.

§Errors

Returns GpuError::Driver if the CUDA memcpy or allocation fails.

Source

pub fn free<T>(&self, buffer: CudaBuffer<T>)

Return a buffer to the allocator, freeing the GPU memory and updating the statistics.

Prefer this over simply dropping the buffer, so that memory_allocated stays accurate.

Source

pub fn memory_allocated(&self) -> usize

Bytes currently allocated (live) on the device through this allocator.

Source

pub fn max_memory_allocated(&self) -> usize

Peak bytes allocated since creation or since the last call to reset_peak_stats.

Source

pub fn memory_reserved(&self) -> usize

Total bytes reserved from the CUDA driver (cached + in-use).

Source

pub fn reset_peak_stats(&self)

Reset the peak counter to the current allocation level.
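
The relationship between the live counter, the peak counter, and the reset can be sketched with a few lines of host-only code. `ToyStats` and its method names are hypothetical stand-ins for illustration; the crate's private fields are not shown in this page.

```rust
/// Toy counters mirroring the semantics described above (not the crate's real fields).
#[derive(Default)]
struct ToyStats {
    allocated: usize,
    peak: usize,
}

impl ToyStats {
    /// Called for each tracked allocation: raise the live counter and the peak.
    fn on_alloc(&mut self, bytes: usize) {
        self.allocated += bytes;
        self.peak = self.peak.max(self.allocated);
    }

    /// Called when a buffer is handed back via an explicit free.
    /// A buffer that is merely dropped never reaches this call, which is
    /// why the live counter would then stay too high.
    fn on_free(&mut self, bytes: usize) {
        self.allocated -= bytes;
    }

    /// Reset the peak to the current live level.
    fn reset_peak(&mut self) {
        self.peak = self.allocated;
    }
}
```

After allocating 100 and 200 bytes and freeing the 200-byte buffer, the model reports 100 live bytes and a peak of 300; resetting the peak brings it down to 100.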

Source

pub fn empty_cache(&self)

Release all cached (free) blocks back to the CUDA driver.

After this call, memory_reserved() drops to memory_allocated() (only blocks currently in use remain). This is useful when another component needs GPU memory and the cache is holding onto freed blocks.
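
As a rough illustration of the invariant stated above (reserved = in use + cached), here is a minimal host-only model. `ToyReservation` is a hypothetical type; it only tracks byte counts and never touches a device.

```rust
/// Toy byte accounting: reserved memory is the sum of in-use and cached bytes.
struct ToyReservation {
    in_use: usize,
    cached: usize,
}

impl ToyReservation {
    /// Total bytes held from the driver in this model.
    fn reserved(&self) -> usize {
        self.in_use + self.cached
    }

    /// Release every cached byte and return how many bytes were released.
    fn empty_cache(&mut self) -> usize {
        std::mem::take(&mut self.cached)
    }
}
```

With 1024 bytes in use and 4096 bytes cached, the model reports 5120 reserved bytes; after emptying the cache, the reserved figure equals the in-use figure.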

§CL-323
Source

pub fn device(&self) -> &GpuDevice

The underlying device.

Source

pub fn record_stream_on_block(&self, block_idx: usize, stream: StreamId)

Record that a block was used on stream, preventing reuse until work on that stream completes.

This is the Rust equivalent of PyTorch’s recordStream().
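
One way to picture stream-aware reuse is a per-block set of streams with outstanding work. The sketch below is an assumption about the general technique, not the crate's data structure: `ToyStreamId`, `ToyBlock`, and `stream_completed` are hypothetical, and a real allocator would detect completion through CUDA events or stream synchronization.

```rust
use std::collections::HashSet;

/// Stand-in for the crate's StreamId, whose definition is not shown here.
type ToyStreamId = u32;

/// Toy block that remembers which streams may still be using it.
#[derive(Default)]
struct ToyBlock {
    free: bool,
    pending_streams: HashSet<ToyStreamId>,
}

impl ToyBlock {
    /// Record that work on `stream` touched this block.
    fn record_stream(&mut self, stream: ToyStreamId) {
        self.pending_streams.insert(stream);
    }

    /// Mark all work on `stream` as finished (hypothetical completion hook).
    fn stream_completed(&mut self, stream: ToyStreamId) {
        self.pending_streams.remove(&stream);
    }

    /// A block may be handed out again only if it is free and no recorded
    /// stream still has work in flight.
    fn reusable(&self) -> bool {
        self.free && self.pending_streams.is_empty()
    }
}
```

In this model, a freed block with a recorded stream stays unavailable until that stream is reported complete.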

§CL-323
Source

pub fn block_count(&self) -> usize

Number of blocks in the arena (for debugging/testing).

Source

pub fn free_block_count(&self) -> usize

Number of free blocks in both pools (for debugging/testing).

Source

pub fn cache_stats(&self) -> (usize, usize)

Cache statistics as a (hits, misses) tuple.

Source

pub fn cached_bytes(&self) -> usize

Total cached (free, reusable) bytes.

Source

pub fn cache_find(&self, size: usize, stream: StreamId) -> Option<(usize, usize)>

Try to find a cached block of at least size bytes on stream.

Returns the block index and its actual size if found. The block is marked as allocated and removed from the free pool. If the block is significantly larger than needed, it is split.
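
The split decision can be sketched as a pure function. What counts as "significantly larger" is not stated on this page, so `MIN_SPLIT_REMAINDER` and `split_block` below are hypothetical; the sketch only shows why the returned size can exceed the requested size.

```rust
/// Hypothetical threshold: split only when the leftover is at least this large.
const MIN_SPLIT_REMAINDER: usize = 512;

/// Decide how a cached block of `block_size` bytes serves a `requested` size.
/// Returns the size handed to the caller and, if the block was split,
/// the size of the leftover piece that goes back to the free pool.
fn split_block(block_size: usize, requested: usize) -> (usize, Option<usize>) {
    assert!(requested <= block_size);
    let remainder = block_size - requested;
    if remainder >= MIN_SPLIT_REMAINDER {
        // Large leftover: carve it off so it can serve another request.
        (requested, Some(remainder))
    } else {
        // Small leftover: hand out the whole block to avoid tiny fragments.
        (block_size, None)
    }
}
```

Under these assumptions, a 4096-byte block serving a 1024-byte request is split into 1024 and 3072 bytes, while a 1100-byte block serving the same request is handed out whole.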

§CL-323
Source

pub fn cache_insert(&self, requested_size: usize, driver_alloc_size: usize, ptr: usize, stream: StreamId) -> (usize, usize)

Register a new block from a fresh driver allocation.

Called when cache_find returns None and the caller has obtained memory from the CUDA driver. The full driver allocation is registered as a block; if it’s larger than the requested size, the remainder is split off and placed in the free pool.

Returns (block_idx, actual_block_size).
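
The registration step can be modeled on the host with a plain vector of block records. This is a sketch, assuming a simplified arena that always splits off any remainder; `ArenaBlock`, `ToyArena`, and `insert` are hypothetical names, and the real method may apply a split threshold and also takes a stream.

```rust
/// Toy block record: device address, size in bytes, and whether it is free.
#[derive(Debug, Clone, PartialEq)]
struct ArenaBlock {
    ptr: usize,
    size: usize,
    free: bool,
}

/// Toy arena holding every registered block.
#[derive(Default)]
struct ToyArena {
    blocks: Vec<ArenaBlock>,
}

impl ToyArena {
    /// Register a driver allocation of `driver_size` bytes starting at `ptr`.
    /// The first `requested` bytes become an in-use block; any remainder is
    /// recorded as a separate free block. Returns (block_idx, actual_block_size).
    fn insert(&mut self, requested: usize, driver_size: usize, ptr: usize) -> (usize, usize) {
        assert!(requested <= driver_size);
        let idx = self.blocks.len();
        self.blocks.push(ArenaBlock { ptr, size: requested, free: false });
        if driver_size > requested {
            self.blocks.push(ArenaBlock {
                ptr: ptr + requested,
                size: driver_size - requested,
                free: true,
            });
        }
        (idx, requested)
    }
}
```

Registering a 4096-byte driver allocation for a 1024-byte request yields one in-use block of 1024 bytes and one free block of 3072 bytes that starts right after it.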

§CL-323
Source

pub fn cache_free(&self, block_idx: usize)

Return a block to the cache (free it back to a pool).

The block is coalesced with any adjacent free blocks and inserted into the appropriate pool for future reuse.
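
Coalescing can be illustrated with an address-sorted list of blocks. The sketch below assumes the blocks are kept sorted by address and that only blocks whose address ranges touch may merge; `Span` and `free_and_coalesce` are hypothetical and say nothing about how the crate stores its blocks.

```rust
/// Toy block record: start address, size in bytes, and whether it is free.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    ptr: usize,
    size: usize,
    free: bool,
}

/// Mark `blocks[idx]` as free and merge it with free, address-adjacent neighbors.
/// `blocks` must be sorted by `ptr`.
fn free_and_coalesce(blocks: &mut Vec<Span>, idx: usize) {
    blocks[idx].free = true;

    // Merge the right neighbor first so that `idx` stays valid.
    if idx + 1 < blocks.len()
        && blocks[idx + 1].free
        && blocks[idx].ptr + blocks[idx].size == blocks[idx + 1].ptr
    {
        let right = blocks.remove(idx + 1);
        blocks[idx].size += right.size;
    }

    // Then fold the (possibly enlarged) block into a free left neighbor.
    if idx > 0
        && blocks[idx - 1].free
        && blocks[idx - 1].ptr + blocks[idx - 1].size == blocks[idx].ptr
    {
        let current = blocks.remove(idx);
        blocks[idx - 1].size += current.size;
    }
}
```

Freeing the middle block of a free/used/free run collapses all three into a single free block, which is how coalescing counters fragmentation.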

§CL-323
Source

pub fn driver_alloc_size(size: usize) -> usize

Get the driver allocation size for a given request size.

Callers use this to know how many bytes to request from the driver when cache_find misses.
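
The actual rounding policy is not documented on this page. As an illustration only, the sketch below uses size classes loosely modeled on PyTorch's CUDA caching allocator (2 MiB segments for small requests, 20 MiB segments for mid-sized ones, and rounding up to a multiple of 2 MiB beyond that). Every constant and the name `toy_driver_alloc_size` are assumptions, not values confirmed for this crate.

```rust
const MIB: usize = 1024 * 1024;

// All thresholds below are assumptions chosen for illustration.
const SMALL_REQUEST_LIMIT: usize = MIB; // requests up to 1 MiB count as small
const SMALL_SEGMENT: usize = 2 * MIB; // segment size for small requests
const MIN_LARGE_ALLOC: usize = 10 * MIB; // below this, use a fixed large segment
const LARGE_SEGMENT: usize = 20 * MIB; // segment size for mid-sized requests
const LARGE_ROUND: usize = 2 * MIB; // rounding granularity for big requests

/// Map a request size to the number of bytes to ask the driver for.
fn toy_driver_alloc_size(size: usize) -> usize {
    if size <= SMALL_REQUEST_LIMIT {
        SMALL_SEGMENT
    } else if size < MIN_LARGE_ALLOC {
        LARGE_SEGMENT
    } else {
        // Round up to the next multiple of LARGE_ROUND.
        (size + LARGE_ROUND - 1) / LARGE_ROUND * LARGE_ROUND
    }
}
```

Asking the driver for more than the request is what leaves a remainder to split off into the free pool, so later small requests can be served without another driver call.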

Trait Implementations§

Source§

impl Debug for CudaAllocator

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> ByRef<T> for T

Source§

fn by_ref(&self) -> &T

Source§

impl<T> DistributionExt for T
where T: ?Sized,

Source§

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,