pub struct MemoryGuard { /* private fields */ }
Central memory-safety controller for a single GPU.
Wraps a GpuDevice and provides:
- Optional upfront VRAM reservation (sentinel allocation).
- Budget enforcement — allocations that would exceed the budget are rejected before touching the driver.
- Configurable OOM recovery via OomPolicy.
- An emergency-checkpoint callback.
Construct via MemoryGuardBuilder.
Implementations§
impl MemoryGuard
pub fn set_budget(&self, bytes: usize)
Set a hard budget in bytes. Allocations that would push used_bytes
past this limit return GpuError::BudgetExceeded without touching
the driver.
Pass 0 to remove the budget (unlimited).
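The budget rule can be modelled without a GPU. The sketch below is a standalone illustration, not the crate's implementation: `BudgetModel`, `State`, and `try_reserve` are hypothetical names, and returning the shortfall as the error value is an assumption made so the example is easy to inspect.

```rust
use std::sync::Mutex;

/// Hypothetical bookkeeping: current budget (0 = unlimited) and bytes in use.
struct State {
    budget: usize,
    used: usize,
}

/// Stand-in for the guard's budget check; not the crate's `MemoryGuard`.
pub struct BudgetModel {
    state: Mutex<State>,
}

impl BudgetModel {
    pub fn new() -> Self {
        Self { state: Mutex::new(State { budget: 0, used: 0 }) }
    }

    /// Takes `&self`, like `set_budget`, by relying on interior mutability.
    pub fn set_budget(&self, bytes: usize) {
        self.state.lock().unwrap().budget = bytes;
    }

    /// Accounts for `bytes` if they fit; otherwise returns the shortfall
    /// and leaves the state unchanged (the "reject before the driver" case).
    pub fn try_reserve(&self, bytes: usize) -> Result<(), usize> {
        let mut s = self.state.lock().unwrap();
        let wanted = s.used.saturating_add(bytes);
        if s.budget != 0 && wanted > s.budget {
            return Err(wanted - s.budget);
        }
        s.used = wanted;
        Ok(())
    }
}
```

With a budget of 1000 bytes and 600 already reserved, a second 600-byte request is rejected with a shortfall of 200; after `set_budget(0)` the same request is accepted.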
pub fn on_oom<F: Fn() + Send + Sync + 'static>(&self, f: F)
Register a callback that will be invoked on OOM when the policy is
OomPolicy::CheckpointAndFail. Typically used to save a training
checkpoint so work is not lost.
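The `Fn() + Send + Sync + 'static` bound means the callback must own everything it captures and be callable from any thread. The following is a minimal sketch of how such a callback could be stored and fired. `CallbackSlot`, `fire`, and `demo_checkpoints_written` are hypothetical names, and the choice that a later registration replaces an earlier one is an assumption, not something the documentation states.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

type OomCallback = Box<dyn Fn() + Send + Sync + 'static>;

/// Hypothetical holder for a single OOM callback (not the crate's type).
pub struct CallbackSlot {
    callback: Mutex<Option<OomCallback>>,
}

impl CallbackSlot {
    pub fn new() -> Self {
        Self { callback: Mutex::new(None) }
    }

    /// Same bounds as `on_oom`. Assumption: a new callback replaces the old one.
    pub fn on_oom<F: Fn() + Send + Sync + 'static>(&self, f: F) {
        *self.callback.lock().unwrap() = Some(Box::new(f));
    }

    /// Runs the callback if one is registered; returns whether it ran.
    /// The lock is held during the call, so the callback must not re-register.
    pub fn fire(&self) -> bool {
        let guard = self.callback.lock().unwrap();
        if let Some(cb) = guard.as_ref() {
            cb();
            true
        } else {
            false
        }
    }
}

/// Usage: the closure owns a clone of shared state, here a counter that
/// stands in for "write a checkpoint".
pub fn demo_checkpoints_written() -> usize {
    let written = Arc::new(AtomicUsize::new(0));
    let slot = CallbackSlot::new();
    let counter = Arc::clone(&written);
    slot.on_oom(move || {
        counter.fetch_add(1, Ordering::SeqCst);
    });
    slot.fire();
    written.load(Ordering::SeqCst)
}
```

Moving an `Arc` clone into the closure is one common way to satisfy the `'static` bound while still sharing state with the training loop.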
pub fn set_oom_policy(&self, policy: OomPolicy)
Change the OOM policy at runtime.
pub fn register_hook(&self, hook: MemoryHook)
Register a pre-OOM hook.
Hooks are called (in priority order, lowest first) when an allocation
would exceed the budget. Each hook gets a chance to free memory before
the guard falls through to the OomPolicy.
pub fn remove_hook(&self, name: &str) -> bool
Remove a previously registered hook by name.
Returns true if a hook with that name was found and removed.
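Registration order, priority order, and name-based removal interact, so a small model may help. This is a sketch under assumptions: `HookEntry` and `HookRegistry` are hypothetical stand-ins (the real `MemoryHook` fields are not shown here), and keeping equal-priority hooks in registration order is a choice made for the example.

```rust
use std::sync::Mutex;

/// Hypothetical hook record; the crate's `MemoryHook` fields may differ.
#[derive(Clone, Debug, PartialEq)]
pub struct HookEntry {
    pub name: String,
    pub priority: u32,
}

/// Stand-in registry kept sorted so the lowest priority value runs first.
pub struct HookRegistry {
    hooks: Mutex<Vec<HookEntry>>,
}

impl HookRegistry {
    pub fn new() -> Self {
        Self { hooks: Mutex::new(Vec::new()) }
    }

    pub fn register_hook(&self, hook: HookEntry) {
        let mut hooks = self.hooks.lock().unwrap();
        hooks.push(hook);
        // Stable sort: hooks with equal priority keep registration order.
        hooks.sort_by_key(|h| h.priority);
    }

    /// Returns true if a hook with that name was found and removed.
    pub fn remove_hook(&self, name: &str) -> bool {
        let mut hooks = self.hooks.lock().unwrap();
        let before = hooks.len();
        hooks.retain(|h| h.name != name);
        hooks.len() != before
    }

    /// Names in the order the hooks would be called.
    pub fn call_order(&self) -> Vec<String> {
        self.hooks.lock().unwrap().iter().map(|h| h.name.clone()).collect()
    }
}
```

Removing the same name twice returns `true` the first time and `false` the second, which matches the documented return value.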
pub fn pressure_level(&self) -> PressureLevel
Current pressure level based on budget usage.
If no budget is set (budget = 0 / unlimited), always returns
PressureLevel::None.
pub fn add_pressure_listener(&self, listener: Box<dyn MemoryPressureListener>)
Register a listener that is notified whenever the pressure level changes (checked after every allocation and free through the guard).
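To illustrate "notified whenever the pressure level changes", here is a standalone sketch that classifies usage and records a notification only on a transition. Everything in it is assumed for illustration: the `Level` variants other than `None`, the 70% and 90% thresholds, and the `PressureTracker` type do not come from the crate, and the sketch uses `&mut self` where the real guard uses `&self`.

```rust
/// Hypothetical levels and thresholds; the crate's `PressureLevel` variants
/// other than `None` and its cut-off points are not given in the text above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Level {
    None,
    Elevated,
    Critical,
}

/// Classifies usage against a budget; a budget of 0 means unlimited.
pub fn level_for(used: usize, budget: usize) -> Level {
    if budget == 0 {
        return Level::None;
    }
    let ratio = used as f64 / budget as f64;
    if ratio >= 0.9 {
        Level::Critical
    } else if ratio >= 0.7 {
        Level::Elevated
    } else {
        Level::None
    }
}

/// Tracks the last level and records a notification only on a change.
pub struct PressureTracker {
    budget: usize,
    used: usize,
    last: Level,
    pub notifications: Vec<Level>,
}

impl PressureTracker {
    pub fn new(budget: usize) -> Self {
        Self { budget, used: 0, last: Level::None, notifications: Vec::new() }
    }

    /// Runs after every alloc and free, mirroring the documented check points.
    fn check(&mut self) {
        let now = level_for(self.used, self.budget);
        if now != self.last {
            self.last = now;
            self.notifications.push(now); // a real listener would be called here
        }
    }

    pub fn alloc(&mut self, bytes: usize) {
        self.used += bytes;
        self.check();
    }

    pub fn free(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
        self.check();
    }
}
```

Comparing against the previously observed level is what keeps a listener from being called on every allocation while usage stays within one band.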
pub fn safe_alloc_with_hooks<T>(&self, count: usize) -> GpuResult<CudaBuffer<T>>
where
    T: DeviceRepr + ValidAsZeroBits,
Allocate count zero-initialized elements, trying pre-OOM hooks
before falling through to the OomPolicy.
The algorithm:
- Check if the allocation fits within the budget – if so, allocate directly.
- If not, compute the shortfall.
- Sort hooks by (priority, estimated_free_bytes descending).
- Call hooks one at a time, skipping any whose execution_overhead_bytes exceeds current headroom, until enough cumulative memory has been freed.
- Retry the allocation.
- If still insufficient after all hooks, fall through to the regular OomPolicy path.
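The listed steps can be simulated with plain byte counters. This is a sketch under stated assumptions rather than the crate's code: `Hook`, `Outcome`, and `alloc_with_hooks` are hypothetical, `actual_free` is an invented field standing in for what a hook really releases when run, and a non-zero budget is assumed.

```rust
use std::cmp::Reverse;

/// Hypothetical hook model. Field names follow the prose above; `actual_free`
/// is an invented stand-in for what running the hook would really release.
pub struct Hook {
    pub priority: u32,
    pub estimated_free_bytes: usize,
    pub execution_overhead_bytes: usize,
    pub actual_free: usize,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Allocated,
    /// Stand-in for "fall through to the OomPolicy path".
    FellThroughToPolicy,
}

/// Simulates one allocation of `request` bytes against a non-zero `budget`.
pub fn alloc_with_hooks(
    used: &mut usize,
    budget: usize,
    request: usize,
    hooks: &mut [Hook],
) -> Outcome {
    // Fast path: the request already fits.
    if *used + request <= budget {
        *used += request;
        return Outcome::Allocated;
    }
    // Bytes that must be freed before the request can fit.
    let shortfall = *used + request - budget;
    // Lowest priority first; within a tie, larger estimated savings first.
    hooks.sort_by_key(|h| (h.priority, Reverse(h.estimated_free_bytes)));
    // Run hooks until the cumulative freed bytes cover the shortfall.
    let mut freed = 0usize;
    for hook in hooks.iter() {
        if freed >= shortfall {
            break;
        }
        let headroom = budget.saturating_sub(*used);
        if hook.execution_overhead_bytes > headroom {
            continue; // the hook itself would not fit; skip it
        }
        let released = hook.actual_free.min(*used);
        *used -= released;
        freed += released;
    }
    // Retry, then hand over to the policy if the request still does not fit.
    if *used + request <= budget {
        *used += request;
        Outcome::Allocated
    } else {
        Outcome::FellThroughToPolicy
    }
}
```

Headroom is recomputed before each hook, so a hook skipped early in this model would have fit later once other hooks freed memory; whether the real guard revisits skipped hooks is not stated above.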
pub fn release_reservation(&self) -> usize
Release the upfront reservation, making its memory available for
normal allocations. Returns the number of bytes released, or 0 if
there was no active reservation.
pub fn has_reservation(&self) -> bool
Whether an upfront reservation is currently held.
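The "bytes released, or 0 if there was no active reservation" contract maps naturally onto `Option::take`. The sketch below models the reservation as a byte count only; `Reservation` and `with_bytes` are hypothetical names, and the real guard presumably also frees a device allocation at this point.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in: the reservation is modelled as a byte count only.
pub struct Reservation {
    held: Mutex<Option<usize>>,
}

impl Reservation {
    pub fn with_bytes(bytes: usize) -> Self {
        Self { held: Mutex::new(Some(bytes)) }
    }

    pub fn has_reservation(&self) -> bool {
        self.held.lock().unwrap().is_some()
    }

    /// Takes the reservation out of the slot; a second call finds `None`
    /// and therefore reports 0 bytes released.
    pub fn release_reservation(&self) -> usize {
        self.held.lock().unwrap().take().unwrap_or(0)
    }
}
```

Because `take` leaves `None` behind, releasing twice is harmless in this model.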
pub fn safe_alloc<T>(&self, count: usize) -> GpuResult<CudaBuffer<T>>
where
    T: DeviceRepr + ValidAsZeroBits,
Allocate count zero-initialized elements on the device, enforcing
the budget and OOM policy.
This is the primary allocation entry point when using the memory
guard. Prefer this over raw CudaAllocator::alloc_zeros.
pub fn safe_alloc_copy<T>(&self, data: &[T]) -> GpuResult<CudaBuffer<T>>
where
    T: DeviceRepr,
Allocate by copying host data to the device, enforcing budget and OOM policy.
pub fn free<T>(&self, buffer: CudaBuffer<T>)
Return a buffer to the guard, freeing GPU memory and updating statistics.
pub fn stats(&self) -> MemoryStats
Snapshot the current memory statistics.
pub fn reset_peak_stats(&self)
Reset the peak-usage counter to the current usage level.
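Resetting the peak to the current usage, rather than to zero, keeps the counter meaningful while buffers are still live. A minimal sketch, assuming a two-field stats record; `Stats`, `used_bytes`, and `peak_bytes` are hypothetical names, since the fields of `MemoryStats` are not listed here.

```rust
/// Hypothetical counters; the crate's `MemoryStats` fields are not listed above.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Stats {
    pub used_bytes: usize,
    pub peak_bytes: usize,
}

impl Stats {
    pub fn record_alloc(&mut self, bytes: usize) {
        self.used_bytes += bytes;
        self.peak_bytes = self.peak_bytes.max(self.used_bytes);
    }

    pub fn record_free(&mut self, bytes: usize) {
        self.used_bytes = self.used_bytes.saturating_sub(bytes);
    }

    /// The peak restarts from current usage, not from zero.
    pub fn reset_peak(&mut self) {
        self.peak_bytes = self.used_bytes;
    }
}
```

A typical use would be to reset at the start of each training phase so the reported peak reflects that phase alone.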
pub fn device_arc(&self) -> &Arc<GpuDevice>
Returns a reference to the Arc that holds the underlying GpuDevice.
Trait Implementations§
Auto Trait Implementations§
impl !Freeze for MemoryGuard
impl RefUnwindSafe for MemoryGuard
impl Unpin for MemoryGuard
impl UnsafeUnpin for MemoryGuard
impl UnwindSafe for MemoryGuard
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> DistributionExt for T
where
    T: ?Sized,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.