
Module memory_estimate


Pre-load memory estimation (plan #35).

Borrowed from MAX’s max/python/max/pipelines/ pattern: model peak memory is estimated before weights load. On Apple Silicon this matters disproportionately — unified memory is shared with the OS, so a model that “would fit on a 96 GB Mac” can still OOM if you spawn it during a heavy Spotlight re-index.

Three components:

  • Activation working set — peak arena bytes. Already computed by rlx_opt::memory::plan_memory(&graph); we just expose it.
  • Weight bytes — sum of registered weights from a WeightRegistry. Aliases (tied embeddings) don’t double-count.
  • Per-batch input bytes: the bytes the caller passes in via compiled.run(), determined by the graph's inputs.

MemoryEstimate::peak_bytes is the sum of the three. MemoryEstimate::fits_in takes a budget and returns the gating decision plus a structured reason.
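The accounting above can be sketched in a few lines. This is a minimal illustration, not the module's actual definitions: the `MemoryEstimateSketch` and `MemoryDeficitSketch` types, their field names, the `Result`-based return of `fits_in`, and the `unique_weight_bytes` helper with its `(name, storage_id, bytes)` tuples are all assumptions made for the example.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `MemoryEstimate`; field names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemoryEstimateSketch {
    pub activation_bytes: u64, // peak arena bytes from the memory planner
    pub weight_bytes: u64,     // registered weights, aliases counted once
    pub input_bytes: u64,      // per-batch inputs handed to run()
}

/// Hypothetical structured reason for a failed fit (assumed shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemoryDeficitSketch {
    pub required_bytes: u64,
    pub budget_bytes: u64,
    pub short_by_bytes: u64,
}

/// Sum weight sizes, counting each underlying storage once so that
/// aliases (e.g. tied embeddings) are not double-counted.
/// Each entry is (name, storage_id, bytes); the layout is an assumption.
pub fn unique_weight_bytes(weights: &[(&str, usize, u64)]) -> u64 {
    let mut seen = HashSet::new();
    weights
        .iter()
        .filter(|entry| seen.insert(entry.1)) // true only on first sighting
        .map(|entry| entry.2)
        .sum()
}

impl MemoryEstimateSketch {
    /// Peak bytes as the sum of the three components.
    pub fn peak_bytes(&self) -> u64 {
        self.activation_bytes
            .saturating_add(self.weight_bytes)
            .saturating_add(self.input_bytes)
    }

    /// Gate on a budget: Ok if the peak fits, otherwise the deficit.
    pub fn fits_in(&self, budget_bytes: u64) -> Result<(), MemoryDeficitSketch> {
        let required_bytes = self.peak_bytes();
        if required_bytes <= budget_bytes {
            Ok(())
        } else {
            Err(MemoryDeficitSketch {
                required_bytes,
                budget_bytes,
                short_by_bytes: required_bytes - budget_bytes,
            })
        }
    }
}
```

Returning the deficit as data rather than a bare `bool` lets the caller report how many bytes short the budget is, which is the kind of structured reason the description refers to.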

Structs§

MemoryDeficit
MemoryEstimate
MoeOffloadEstimate
MoE offload sizing (TIDE enable_predictive_expert_offload).

Functions§

available_unified_memory
Available unified-memory budget on the running machine. On macOS reads hw.memsize via sysctl; everywhere else returns None so callers can fall back to a user-supplied budget.
estimate
Estimate peak memory for running graph on a session bound to registry. Pure analysis: runs the memory planner internally and queries the registry for weight bytes; doesn't compile or execute.
estimate_moe_offload
Compute GPU expert budget from a memory budget (unified RAM or VRAM).
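
The budget lookup described for available_unified_memory can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: the names `unified_memory_sketch`, `parse_memsize`, and `resolve_budget` are hypothetical, and the sketch shells out to the `sysctl` command-line tool rather than making the system call directly. Note that hw.memsize reports installed physical memory, so a caller may want to leave headroom for the OS.

```rust
/// Parse the output of `sysctl -n hw.memsize` (a decimal byte count).
pub fn parse_memsize(stdout: &str) -> Option<u64> {
    stdout.trim().parse::<u64>().ok()
}

/// Hypothetical macOS path: ask the `sysctl` CLI for hw.memsize.
#[cfg(target_os = "macos")]
pub fn unified_memory_sketch() -> Option<u64> {
    use std::process::Command;

    let out = Command::new("sysctl")
        .args(["-n", "hw.memsize"])
        .output()
        .ok()?;
    if !out.status.success() {
        return None;
    }
    parse_memsize(&String::from_utf8_lossy(&out.stdout))
}

/// Everywhere else there is nothing to report.
#[cfg(not(target_os = "macos"))]
pub fn unified_memory_sketch() -> Option<u64> {
    None
}

/// Use the detected budget when there is one, otherwise fall back to
/// the user-supplied value (hypothetical helper, not part of the module).
pub fn resolve_budget(user_budget: Option<u64>) -> Option<u64> {
    unified_memory_sketch().or(user_budget)
}
```

A caller would pass the resolved budget to the fit check and treat a `None` result as "no budget known", leaving the gating decision to the user.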