Module memory

Memory estimation and tracking utilities for GPU operations.

§Why This Module Exists

GPU memory (VRAM) is a scarce, fixed-size resource. Unlike CPU memory, it has no swap-space fallback: when VRAM runs out, operations simply fail. This module provides utilities to estimate memory requirements before allocation and to track usage during execution, enabling crates to:

Pre-flight checks: Verify sufficient VRAM before starting expensive operations
Batch size optimization: Automatically adjust batch sizes to fit available memory
Memory budgeting: Track allocations across multiple operations
Debugging: Identify memory leaks or unexpected allocations
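
The pre-flight check and batch-size adjustment listed above can be sketched with plain integer arithmetic. This is an illustrative sketch only: `padded_bytes`, `max_batch_size`, and `fits_in_memory` are hypothetical names that do not appear in this module, and the per-sample byte count and overhead factor are assumed to come from the caller.

```rust
/// Hypothetical helper (not part of this module's documented API):
/// inflate a per-sample estimate by an overhead factor, rounding up.
fn padded_bytes(bytes_per_sample: u64, overhead_factor: f64) -> u64 {
    // Factors below 1.0 would shrink the estimate, so clamp them to 1.0.
    let factor = overhead_factor.max(1.0);
    (bytes_per_sample as f64 * factor).ceil() as u64
}

/// Hypothetical batch-size optimizer: the largest batch whose padded
/// estimate fits in `available_bytes`. Returns `None` when the per-sample
/// estimate is zero, because no meaningful bound exists in that case.
pub fn max_batch_size(
    available_bytes: u64,
    bytes_per_sample: u64,
    overhead_factor: f64,
) -> Option<u64> {
    let padded = padded_bytes(bytes_per_sample, overhead_factor);
    if padded == 0 {
        return None;
    }
    Some(available_bytes / padded)
}

/// Hypothetical pre-flight check: does `batch` samples fit in the budget?
/// An overflowing product is treated as "does not fit".
pub fn fits_in_memory(
    available_bytes: u64,
    batch: u64,
    bytes_per_sample: u64,
    overhead_factor: f64,
) -> bool {
    let padded = padded_bytes(bytes_per_sample, overhead_factor);
    batch
        .checked_mul(padded)
        .map_or(false, |needed| needed <= available_bytes)
}
```

With a 1000-byte budget, 100 bytes per sample, and a factor of 1.5, each sample is budgeted at 150 bytes, so `max_batch_size` returns `Some(6)` and a batch of 7 fails the pre-flight check.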

§Design Decisions

Conservative estimation: Estimates include overhead buffers because running out of memory mid-operation is worse than slightly underutilizing VRAM.
No global state: MemoryTracker is an explicit struct, not a global singleton, because different parts of an application may need independent tracking.
Candle-agnostic sizes: Functions work with shapes and dtypes directly, not just Candle tensors, enabling estimation before tensor creation.
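
The three decisions above can be illustrated with a small, dependency-free sketch. It is not this module's implementation: `ElemType`, `sketch_tensor_bytes`, `with_overhead`, and `SketchTracker` are hypothetical stand-ins, and the actual signatures of `estimate_tensor_bytes` and `MemoryTracker`, as well as the value of `DEFAULT_OVERHEAD_FACTOR`, are not shown on this page.

```rust
/// Hypothetical stand-in for a dtype; only the element width matters here.
#[derive(Clone, Copy, Debug)]
pub enum ElemType {
    F16,
    BF16,
    F32,
    F64,
}

impl ElemType {
    /// Width of one element in bytes.
    pub fn size_in_bytes(self) -> usize {
        match self {
            ElemType::F16 | ElemType::BF16 => 2,
            ElemType::F32 => 4,
            ElemType::F64 => 8,
        }
    }
}

/// Raw size from shape and dtype alone, so no tensor has to exist yet.
/// An empty shape is treated as a scalar (one element).
pub fn sketch_tensor_bytes(shape: &[usize], dtype: ElemType) -> usize {
    shape.iter().product::<usize>() * dtype.size_in_bytes()
}

/// Conservative estimate: inflate by an overhead factor and round up.
pub fn with_overhead(bytes: usize, factor: f64) -> usize {
    (bytes as f64 * factor).ceil() as usize
}

/// Explicit tracker value (no global state): each instance has its own budget.
#[derive(Debug)]
pub struct SketchTracker {
    limit: usize,
    allocated: usize,
    peak: usize,
}

impl SketchTracker {
    pub fn new(limit: usize) -> Self {
        Self { limit, allocated: 0, peak: 0 }
    }

    /// Record an allocation, refusing it if the budget would be exceeded.
    pub fn try_allocate(&mut self, bytes: usize) -> Result<(), String> {
        let next = self
            .allocated
            .checked_add(bytes)
            .ok_or_else(|| "allocation size overflow".to_string())?;
        if next > self.limit {
            return Err(format!("need {next} bytes, limit is {}", self.limit));
        }
        self.allocated = next;
        self.peak = self.peak.max(next);
        Ok(())
    }

    /// Record a release; saturates at zero rather than underflowing.
    pub fn free(&mut self, bytes: usize) {
        self.allocated = self.allocated.saturating_sub(bytes);
    }

    pub fn allocated(&self) -> usize {
        self.allocated
    }

    pub fn peak(&self) -> usize {
        self.peak
    }
}
```

Because the tracker is an ordinary value, two subsystems can each own one with a different limit, and a peak that stays high after every `free` call is a quick hint of an unexpected allocation.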

Structs§

MemoryTracker: Memory usage tracker for GPU operations.

Constants§

DEFAULT_OVERHEAD_FACTOR: Default overhead factor applied to memory estimates.

Functions§

estimate_attention_memory: Estimate memory for attention computation.
estimate_tensor_bytes: Estimate the memory required to store a tensor with given shape and dtype.
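
To give a feel for why attention gets its own estimator, here is a rough arithmetic sketch. It assumes naive attention that materializes the full score matrix of shape [batch, heads, seq_len, seq_len] alongside Q, K, V, and the output, each of shape [batch, heads, seq_len, head_dim]. The function name and formula are assumptions for illustration; the formula actually used by `estimate_attention_memory` is not shown on this page.

```rust
/// Hypothetical estimate for naive (non-fused) attention, in bytes.
/// Assumes Q, K, V, and the output are all resident at once together
/// with the full seq_len x seq_len score matrix, with no extra overhead.
pub fn sketch_attention_bytes(
    batch: usize,
    heads: usize,
    seq_len: usize,
    head_dim: usize,
    elem_bytes: usize,
) -> usize {
    // Q, K, V, and output: four tensors of [batch, heads, seq_len, head_dim].
    let qkv_out = 4 * batch * heads * seq_len * head_dim;
    // Attention scores: [batch, heads, seq_len, seq_len], quadratic in seq_len.
    let scores = batch * heads * seq_len * seq_len;
    (qkv_out + scores) * elem_bytes
}
```

For batch 1, 8 heads, sequence length 1024, head dimension 64, and 2-byte elements, this gives 20,971,520 bytes (20 MiB). Doubling the sequence length to 2048 gives 75,497,472 bytes, since the score term grows with the square of the sequence length.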
