Struct MemoryGuard 

pub struct MemoryGuard { /* private fields */ }

Central memory-safety controller for a single GPU.

Wraps a GpuDevice and provides:

  • Optional upfront VRAM reservation (sentinel allocation).
  • Budget enforcement: allocations that would exceed the budget are rejected before touching the driver.
  • Configurable OOM recovery via OomPolicy.
  • An emergency-checkpoint callback.

Construct via MemoryGuardBuilder.

Implementations

impl MemoryGuard

pub fn set_budget(&self, bytes: usize)

Set a hard budget in bytes. Allocations that would push used_bytes past this limit return GpuError::BudgetExceeded without touching the driver.

Pass 0 to remove the budget (unlimited).
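
The budget rule can be modelled in a few lines. This is a sketch, not the crate's implementation: `would_exceed_budget` and its parameters are hypothetical names, and it assumes the check is `used + requested > budget`, with 0 meaning unlimited as described above.

```rust
/// Hypothetical model of the budget check; not the crate's actual code.
/// Returns true when an allocation of `requested` bytes should be rejected.
pub fn would_exceed_budget(used_bytes: usize, requested: usize, budget: usize) -> bool {
    // A budget of 0 means "unlimited", so nothing is ever rejected.
    if budget == 0 {
        return false;
    }
    // checked_add guards against overflow: a request so large that the sum
    // overflows usize certainly exceeds any finite budget.
    match used_bytes.checked_add(requested) {
        Some(total) => total > budget,
        None => true,
    }
}
```

Under this assumption an allocation that lands exactly on the budget is accepted; only one that goes past it is rejected.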

pub fn budget(&self) -> usize

Current budget (0 = unlimited).

pub fn on_oom<F: Fn() + Send + Sync + 'static>(&self, f: F)

Register a callback that will be invoked on OOM when the policy is OomPolicy::CheckpointAndFail. Typically used to save a training checkpoint so work is not lost.
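
To illustrate what the `Fn() + Send + Sync + 'static` bound permits, the sketch below stores a callback behind a mutex and fires it on demand. `OomCallbackSlot`, `fire`, and `checkpoint_flag_demo` are hypothetical stand-ins, not part of this crate; whether the real guard replaces or chains an earlier callback is an assumption here (the sketch replaces it).

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the guard's callback storage.
pub struct OomCallbackSlot {
    callback: Mutex<Option<Box<dyn Fn() + Send + Sync + 'static>>>,
}

impl OomCallbackSlot {
    pub fn new() -> Self {
        Self { callback: Mutex::new(None) }
    }

    /// Same bounds as `MemoryGuard::on_oom`. Takes `&self` because the slot
    /// uses interior mutability. Replaces any earlier callback (assumed).
    pub fn on_oom<F: Fn() + Send + Sync + 'static>(&self, f: F) {
        *self.callback.lock().unwrap() = Some(Box::new(f));
    }

    /// Invoke the callback if one is registered; returns whether it ran.
    /// The lock is held during the call, so the callback must not call
    /// `on_oom` on the same slot.
    pub fn fire(&self) -> bool {
        let guard = self.callback.lock().unwrap();
        if let Some(cb) = guard.as_ref() {
            cb();
            true
        } else {
            false
        }
    }
}

/// Builds a slot whose callback flips a shared flag, standing in for
/// "save a training checkpoint".
pub fn checkpoint_flag_demo() -> (OomCallbackSlot, Arc<AtomicBool>) {
    let saved = Arc::new(AtomicBool::new(false));
    let slot = OomCallbackSlot::new();
    let flag = Arc::clone(&saved);
    // `move` gives the closure ownership of its Arc, satisfying 'static.
    slot.on_oom(move || flag.store(true, Ordering::SeqCst));
    (slot, saved)
}
```

Because the closure is `Fn` rather than `FnMut`, any state it updates has to go through shared, thread-safe handles such as `Arc<AtomicBool>` or `Arc<Mutex<_>>`.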

pub fn set_oom_policy(&self, policy: OomPolicy)

Change the OOM policy at runtime.

pub fn register_hook(&self, hook: MemoryHook)

Register a pre-OOM hook.

Hooks are called (in priority order, lowest first) when an allocation would exceed the budget. Each hook gets a chance to free memory before the guard falls through to the OomPolicy.

pub fn remove_hook(&self, name: &str) -> bool

Remove a previously registered hook by name.

Returns true if a hook with that name was found and removed.

pub fn pressure_level(&self) -> PressureLevel

Current pressure level based on budget usage.

If no budget is set (budget = 0 / unlimited), always returns PressureLevel::None.
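
A rough model of how a budget-relative pressure level could be derived is shown below. Only the `None` level and the "no budget means `None`" rule come from this page; the `SketchPressure` enum, its other variants, and the 70% and 90% thresholds are invented for illustration and may not match `PressureLevel`.

```rust
/// Hypothetical pressure levels; only `None` is confirmed by the documentation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchPressure {
    None,
    Moderate,
    Critical,
}

/// Hypothetical mapping from budget usage to a pressure level.
/// The 70% and 90% thresholds are assumptions, not the crate's values.
pub fn sketch_pressure(used_bytes: usize, budget: usize) -> SketchPressure {
    // No budget (0 = unlimited): pressure is always None.
    if budget == 0 {
        return SketchPressure::None;
    }
    // Widen to u128 so that `used_bytes * 100` cannot overflow.
    let percent = (used_bytes as u128 * 100) / budget as u128;
    if percent >= 90 {
        SketchPressure::Critical
    } else if percent >= 70 {
        SketchPressure::Moderate
    } else {
        SketchPressure::None
    }
}
```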

pub fn add_pressure_listener(&self, listener: Box<dyn MemoryPressureListener>)

Register a listener that is notified whenever the pressure level changes (checked after every allocation and free through the guard).

pub fn safe_alloc_with_hooks<T>(&self, count: usize) -> GpuResult<CudaBuffer<T>>

Allocate count zero-initialized elements, trying pre-OOM hooks before falling through to the OomPolicy.

The algorithm:

  1. Check whether the allocation fits within the budget; if so, allocate directly.
  2. If not, compute the shortfall.
  3. Sort hooks by (priority, estimated_free_bytes descending).
  4. Call hooks one at a time, skipping any whose execution_overhead_bytes exceeds current headroom, until enough cumulative memory has been freed.
  5. Retry the allocation.
  6. If still insufficient after all hooks, fall through to the regular OomPolicy path.
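
The six steps above can be sketched on plain byte counters. Everything in this block is a hypothetical model rather than the crate's code: `SketchHook` replaces the real `MemoryHook` (whose fields are not shown on this page), `actually_frees` stands in for running a hook's callback, and "headroom" is assumed to mean `budget - used`.

```rust
/// Hypothetical model of a pre-OOM hook. A real hook would carry a callback;
/// here `actually_frees` stands in for whatever that callback releases.
#[derive(Debug, Clone)]
pub struct SketchHook {
    pub name: &'static str,
    pub priority: u32,
    pub estimated_free_bytes: usize,
    pub execution_overhead_bytes: usize,
    pub actually_frees: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SketchOutcome {
    Allocated { hooks_run: Vec<&'static str> },
    FellThroughToPolicy { hooks_run: Vec<&'static str> },
}

pub fn alloc_with_hooks_sketch(
    used: &mut usize,
    budget: usize,
    request: usize,
    hooks: &mut [SketchHook],
) -> SketchOutcome {
    // Budget 0 means unlimited, so everything fits.
    let fits = |used_now: usize| budget == 0 || used_now.saturating_add(request) <= budget;
    let mut hooks_run = Vec::new();

    // Step 1: only consult hooks when the request does not fit.
    if !fits(*used) {
        // Step 2: how many bytes must be freed for the request to fit.
        let shortfall = (*used).saturating_add(request) - budget;

        // Step 3: lowest priority value first, larger estimates first on ties.
        hooks.sort_by(|a, b| {
            a.priority
                .cmp(&b.priority)
                .then(b.estimated_free_bytes.cmp(&a.estimated_free_bytes))
        });

        // Step 4: run hooks until the cumulative freed bytes cover the shortfall.
        let mut freed = 0usize;
        for hook in hooks.iter() {
            if freed >= shortfall {
                break;
            }
            // Skip hooks that need more scratch memory than is available.
            let headroom = budget.saturating_sub(*used);
            if hook.execution_overhead_bytes > headroom {
                continue;
            }
            let released = hook.actually_frees.min(*used);
            *used -= released;
            freed += released;
            hooks_run.push(hook.name);
        }
    }

    // Step 5: retry. Step 6: otherwise report the fall-through to the policy.
    if fits(*used) {
        *used = (*used).saturating_add(request);
        SketchOutcome::Allocated { hooks_run }
    } else {
        SketchOutcome::FellThroughToPolicy { hooks_run }
    }
}
```

The headroom test in step 4 matters because a hook that needs temporary memory to do its work (for example, to stage data before eviction) could itself fail when the device is nearly full.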

pub fn release_reservation(&self) -> usize

Release the upfront reservation, making its memory available for normal allocations. Returns the number of bytes released, or 0 if there was no active reservation.

pub fn has_reservation(&self) -> bool

Whether an upfront reservation is currently held.
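
The contract of `release_reservation` and `has_reservation` (bytes released on the first call, 0 afterwards) resembles `Option::take`. The sketch below models only that bookkeeping; `ReservationSlot` is a hypothetical type, and in the real guard the slot would presumably hold the sentinel buffer itself, so that dropping it returns the VRAM to the driver.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the guard's reservation bookkeeping.
pub struct ReservationSlot {
    // Size in bytes of the held reservation, or None once released.
    reserved_bytes: Mutex<Option<usize>>,
}

impl ReservationSlot {
    pub fn with_reservation(bytes: usize) -> Self {
        Self { reserved_bytes: Mutex::new(Some(bytes)) }
    }

    pub fn has_reservation(&self) -> bool {
        self.reserved_bytes.lock().unwrap().is_some()
    }

    /// Returns the number of bytes released, or 0 if nothing was held.
    pub fn release_reservation(&self) -> usize {
        // take() leaves None behind, so a second call returns 0.
        self.reserved_bytes.lock().unwrap().take().unwrap_or(0)
    }
}
```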

pub fn safe_alloc<T>(&self, count: usize) -> GpuResult<CudaBuffer<T>>

Allocate count zero-initialized elements on the device, enforcing the budget and OOM policy.

This is the primary allocation entry point when using the memory guard. Prefer this over raw CudaAllocator::alloc_zeros.

pub fn safe_alloc_copy<T>(&self, data: &[T]) -> GpuResult<CudaBuffer<T>>
where T: DeviceRepr,

Allocate by copying host data to the device, enforcing budget and OOM policy.

pub fn free<T>(&self, buffer: CudaBuffer<T>)

Return a buffer to the guard, freeing GPU memory and updating statistics.

pub fn stats(&self) -> MemoryStats

Snapshot the current memory statistics.

pub fn reset_peak_stats(&self)

Reset the peak-usage counter to the current usage level.
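
Resetting the peak to the current usage, rather than to zero, keeps the invariant that peak is never below current usage. The sketch below shows the idea on a hypothetical `SketchStats` struct; the real `MemoryStats` fields are not documented on this page, so the field names here are assumptions.

```rust
/// Hypothetical usage counters; field names are not taken from `MemoryStats`.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct SketchStats {
    pub used_bytes: usize,
    pub peak_bytes: usize,
}

impl SketchStats {
    /// Record an allocation and raise the peak if usage now exceeds it.
    pub fn record_alloc(&mut self, bytes: usize) {
        self.used_bytes = self.used_bytes.saturating_add(bytes);
        self.peak_bytes = self.peak_bytes.max(self.used_bytes);
    }

    /// Record a free; the peak is deliberately left untouched.
    pub fn record_free(&mut self, bytes: usize) {
        self.used_bytes = self.used_bytes.saturating_sub(bytes);
    }

    /// Start a new measurement window: the peak restarts at current usage.
    pub fn reset_peak(&mut self) {
        self.peak_bytes = self.used_bytes;
    }
}
```

This pattern is useful for measuring the peak of one phase, such as a single training step, without earlier phases skewing the number.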

pub fn device(&self) -> &GpuDevice

The underlying device.

pub fn device_arc(&self) -> &Arc<GpuDevice>

The underlying device as an Arc.

Trait Implementations

impl Debug for MemoryGuard

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Send for MemoryGuard

impl Sync for MemoryGuard

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> ByRef<T> for T

fn by_ref(&self) -> &T

impl<T> DistributionExt for T
where T: ?Sized,

fn rand<T>(&self, rng: &mut (impl Rng + ?Sized)) -> T
where Self: Distribution<T>,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T, U> Imply<T> for U
where T: ?Sized, U: ?Sized,