Memory estimation and tracking utilities for GPU operations.
§Why This Module Exists
GPU memory (VRAM) is a scarce, limited resource. Unlike CPU memory, VRAM has no swap-space fallback: when it runs out, operations simply fail. This module provides utilities to estimate memory requirements before allocation and to track usage during execution, which lets crates implement:
- Pre-flight checks: Verify sufficient VRAM before starting expensive operations
- Batch size optimization: Automatically adjust batch sizes to fit available memory
- Memory budgeting: Track allocations across multiple operations
- Debugging: Identify memory leaks or unexpected allocations
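As a rough illustration of the pre-flight check and batch-size adjustment described above, the sketch below picks the largest batch that fits in a given amount of free VRAM. It does not use this module's API: `max_batch_size` and all of its parameters are hypothetical names, and the caller is assumed to already know the per-item footprint and the free memory.

```rust
/// Largest batch size whose padded footprint fits in `available_bytes`.
/// Returns `None` when even a single item does not fit (the pre-flight check fails).
///
/// Hypothetical helper for illustration; not part of this module's API.
fn max_batch_size(per_item_bytes: u64, overhead_factor: f64, available_bytes: u64) -> Option<u64> {
    if per_item_bytes == 0 {
        return None; // nothing meaningful to size
    }
    // Pad the per-item cost so the estimate errs on the conservative side.
    // Factors below 1.0 are clamped so padding never shrinks the estimate.
    let padded = (per_item_bytes as f64 * overhead_factor.max(1.0)).ceil() as u64;
    let fit = available_bytes / padded;
    if fit == 0 {
        None
    } else {
        Some(fit)
    }
}
```

A caller would typically run this once before starting an expensive operation and fall back to a smaller workload, or abort early, when it returns `None`.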
§Design Decisions
- Conservative estimation: Estimates include overhead buffers because running out of memory mid-operation is worse than slightly underutilizing VRAM.
- No global state: MemoryTracker is an explicit struct, not a global singleton, because different parts of an application may need independent tracking.
- Candle-agnostic sizes: Functions work with shapes and dtypes directly, not just Candle tensors, enabling estimation before tensor creation.
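The three decisions above can be sketched in a few lines of self-contained Rust. This is a minimal illustration, not the module's implementation: the local `DType` enum is a stand-in for Candle's dtype, `OVERHEAD_FACTOR = 1.1` is an assumed value (the actual DEFAULT_OVERHEAD_FACTOR is not stated here), and `tensor_bytes`, `with_overhead`, and `Tracker` are hypothetical names.

```rust
/// Stand-in element type (hypothetical; not Candle's `DType`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum DType {
    F16,
    F32,
    F64,
}

impl DType {
    fn size_in_bytes(self) -> usize {
        match self {
            DType::F16 => 2,
            DType::F32 => 4,
            DType::F64 => 8,
        }
    }
}

/// Assumed padding factor; the module's real default may differ.
const OVERHEAD_FACTOR: f64 = 1.1;

/// Raw size from shape and dtype alone, so no tensor has to exist yet.
/// An empty shape is treated as a scalar (one element).
fn tensor_bytes(shape: &[usize], dtype: DType) -> usize {
    shape.iter().product::<usize>() * dtype.size_in_bytes()
}

/// Conservative estimate: round the padded size up.
fn with_overhead(bytes: usize) -> usize {
    (bytes as f64 * OVERHEAD_FACTOR).ceil() as usize
}

/// Explicit tracker value: each owner gets its own budget, no global state.
#[derive(Debug)]
struct Tracker {
    limit: usize,
    allocated: usize,
    peak: usize,
}

impl Tracker {
    fn with_limit(limit: usize) -> Self {
        Tracker { limit, allocated: 0, peak: 0 }
    }

    /// Record an allocation, refusing it if it would exceed the budget.
    fn try_allocate(&mut self, bytes: usize) -> Result<(), String> {
        let next = self.allocated.saturating_add(bytes);
        if next > self.limit {
            return Err(format!("need {} bytes, limit is {}", next, self.limit));
        }
        self.allocated = next;
        self.peak = self.peak.max(next);
        Ok(())
    }

    /// Record a release; never underflows.
    fn free(&mut self, bytes: usize) {
        self.allocated = self.allocated.saturating_sub(bytes);
    }
}
```

Because the tracker is an ordinary value, two subsystems can each hold their own instance with separate limits, and a rejected allocation in one leaves the other untouched.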
Structs§
- MemoryTracker: Memory usage tracker for GPU operations.
Constants§
- DEFAULT_OVERHEAD_FACTOR: Default overhead factor applied to memory estimates.
Functions§
- estimate_attention_memory: Estimate memory for attention computation.
- estimate_tensor_bytes: Estimate the memory required to store a tensor with given shape and dtype.
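The formula behind estimate_attention_memory is not given here. For orientation only, one common first-order estimate counts just the attention score matrix, which grows with the product of query and key lengths. The function name and parameters below are hypothetical, and the real function may account for more (for example Q/K/V buffers or overhead).

```rust
/// Bytes for the attention score matrix alone:
/// batch x heads x query_len x key_len elements.
/// Hypothetical helper; not this module's `estimate_attention_memory`.
fn attention_scores_bytes(
    batch: u64,
    heads: u64,
    query_len: u64,
    key_len: u64,
    bytes_per_element: u64,
) -> u64 {
    batch * heads * query_len * key_len * bytes_per_element
}
```

Under this estimate, doubling the sequence length (with query and key lengths equal) quadruples the score-matrix footprint, which is why attention is often the first thing to check in a pre-flight estimate.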