Module memory

Memory estimation and tracking utilities for GPU operations.

Why This Module Exists

GPU memory (VRAM) is a scarce, fixed-size resource. Unlike CPU memory, it has no swap space to fall back on: when VRAM runs out, operations simply fail. This module provides utilities to estimate memory requirements before allocation and to track usage during execution, which supports:

  1. Pre-flight checks: Verify sufficient VRAM before starting expensive operations
  2. Batch size optimization: Automatically adjust batch sizes to fit available memory
  3. Memory budgeting: Track allocations across multiple operations
  4. Debugging: Identify memory leaks or unexpected allocations
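As a rough illustration of the first two use cases, the sketch below pads a raw byte estimate by a safety factor and then picks the largest batch size that still fits. It does not use this module's API: `OVERHEAD_FACTOR`, `fits_in_vram`, and `max_batch_size` are hypothetical names, and the factor of 1.2 is an assumed value, not the crate's `DEFAULT_OVERHEAD_FACTOR`.

```rust
/// Assumed safety margin for this sketch only (not the crate's constant).
const OVERHEAD_FACTOR: f64 = 1.2;

/// Pre-flight check: does `required` bytes, padded by the overhead
/// factor, fit into `available` bytes of VRAM?
fn fits_in_vram(required: u64, available: u64) -> bool {
    (required as f64 * OVERHEAD_FACTOR) <= available as f64
}

/// Largest batch size in `1..=max_batch` whose padded footprint fits,
/// or `None` if even a single sample does not fit.
fn max_batch_size(bytes_per_sample: u64, available: u64, max_batch: u64) -> Option<u64> {
    (1..=max_batch)
        .rev()
        .find(|&b| fits_in_vram(b.saturating_mul(bytes_per_sample), available))
}

fn main() {
    let available: u64 = 8 * 1024 * 1024 * 1024; // 8 GiB of free VRAM
    let per_sample: u64 = 512 * 1024 * 1024; // 512 MiB per sample

    match max_batch_size(per_sample, available, 64) {
        Some(b) => println!("largest batch that fits: {b}"),
        None => println!("not even one sample fits"),
    }
}
```

In a real application the `available` figure would come from the GPU driver or runtime; here it is hard-coded to keep the sketch self-contained.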

Design Decisions

  • Conservative estimation: Estimates include overhead buffers because running out of memory mid-operation is worse than slightly underutilizing VRAM.

  • No global state: MemoryTracker is an explicit struct, not a global singleton, because different parts of an application may need independent tracking.

  • Candle-agnostic sizes: Functions work with shapes and dtypes directly, not just Candle tensors, enabling estimation before tensor creation.
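The three decisions above can be sketched together in a few lines. This is an illustrative stand-in, not the crate's implementation: the `DType` enum, `tensor_bytes`, `with_overhead`, and `Tracker` are hypothetical names, and the real `MemoryTracker` and `estimate_tensor_bytes` signatures may differ.

```rust
/// Hypothetical stand-in for a tensor element type.
#[derive(Clone, Copy)]
enum DType {
    F16,
    F32,
    F64,
}

impl DType {
    fn size_in_bytes(self) -> usize {
        match self {
            DType::F16 => 2,
            DType::F32 => 4,
            DType::F64 => 8,
        }
    }
}

/// Size from shape and dtype alone; no tensor has to exist yet.
/// An empty shape is treated as a scalar (one element).
fn tensor_bytes(shape: &[usize], dtype: DType) -> usize {
    shape.iter().product::<usize>() * dtype.size_in_bytes()
}

/// Conservative estimate: pad the raw size and round up.
fn with_overhead(bytes: usize, factor: f64) -> usize {
    (bytes as f64 * factor).ceil() as usize
}

/// Explicit tracker value: each subsystem can own an independent one.
struct Tracker {
    limit: usize,
    allocated: usize,
    peak: usize,
}

impl Tracker {
    fn new(limit: usize) -> Self {
        Tracker { limit, allocated: 0, peak: 0 }
    }

    /// Record an allocation, refusing it if it would exceed the limit.
    fn try_allocate(&mut self, bytes: usize) -> Result<(), String> {
        let next = self.allocated.saturating_add(bytes);
        if next > self.limit {
            return Err(format!("need {next} bytes, limit is {}", self.limit));
        }
        self.allocated = next;
        self.peak = self.peak.max(next);
        Ok(())
    }

    /// Record that `bytes` were released.
    fn free(&mut self, bytes: usize) {
        self.allocated = self.allocated.saturating_sub(bytes);
    }
}

fn main() {
    let raw = tensor_bytes(&[2, 3, 4], DType::F32);
    let padded = with_overhead(raw, 1.5); // 1.5 is an assumed factor
    let mut tracker = Tracker::new(1024);
    tracker.try_allocate(padded).expect("fits within the limit");
    println!("raw = {raw}, padded = {padded}, peak = {}", tracker.peak);
    tracker.free(padded);
}
```

Because the tracker is an ordinary value, two components (say, a model loader and an inference loop) can each hold their own instance with separate limits.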

Structs

MemoryTracker
Memory usage tracker for GPU operations.

Constants

DEFAULT_OVERHEAD_FACTOR
Default overhead factor applied to memory estimates.

Functions

estimate_attention_memory
Estimate memory for attention computation.
estimate_tensor_bytes
Estimate the memory required to store a tensor with given shape and dtype.
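
To give a sense of the magnitudes involved in attention estimation, the sketch below computes the size of the score matrix that standard (non-fused) attention materializes, with shape `[batch, heads, seq_len, seq_len]`. It is an assumption-laden illustration: `attention_scores_bytes` is a hypothetical helper, and `estimate_attention_memory` may account for additional buffers (queries, keys, values, outputs, or overhead) that are not modeled here.

```rust
/// Bytes needed for the attention score matrix alone, assuming one
/// `seq_len x seq_len` matrix per head and per batch element.
/// Hypothetical helper, not this module's `estimate_attention_memory`.
fn attention_scores_bytes(batch: u64, heads: u64, seq_len: u64, bytes_per_elem: u64) -> u64 {
    batch * heads * seq_len * seq_len * bytes_per_elem
}

fn main() {
    // One sequence of 4096 tokens, 32 heads, 2-byte (f16) elements.
    let bytes = attention_scores_bytes(1, 32, 4096, 2);
    println!("score matrix: {} MiB", bytes / (1024 * 1024));
}
```

The quadratic dependence on `seq_len` is the key point: doubling the sequence length quadruples this term, which is why attention is worth estimating separately from ordinary tensors.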