Module memory_estimate

Pre-load memory estimation (plan #35).

Borrowed from MAX’s max/python/max/pipelines/ pattern: model peak memory is estimated before weights load. On Apple Silicon this matters disproportionately — unified memory is shared with the OS, so a model that “would fit on a 96 GB Mac” can still OOM if you spawn it during a heavy Spotlight re-index.

Three components:

Activation working set — peak arena bytes. Already computed by rlx_opt::memory::plan_memory(&graph); we just expose it.
Weight bytes — sum of registered weights from a WeightRegistry. Aliases (tied embeddings) don’t double-count.
Per-batch input bytes — bytes the user is going to hand in via compiled.run(). Driven by graph inputs.
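The alias rule for weight bytes can be illustrated with a minimal sketch. The `WeightEntry` enum and the `HashMap` registry below are hypothetical stand-ins invented for illustration; they are not the real `WeightRegistry` API, whose shape this page does not show. The only point is that an alias contributes zero bytes because its storage is already counted under the weight it points at.

```rust
use std::collections::HashMap;

/// Hypothetical registry entry (assumption; not the real WeightRegistry type).
#[allow(dead_code)]
enum WeightEntry {
    /// A weight that owns `bytes` bytes of storage.
    Owned { bytes: u64 },
    /// A name that shares storage with another weight, e.g. tied embeddings.
    Alias { target: String },
}

/// Sum the bytes of owned weights only, so aliases are never counted twice.
fn weight_bytes(registry: &HashMap<String, WeightEntry>) -> u64 {
    registry
        .values()
        .map(|entry| match entry {
            WeightEntry::Owned { bytes } => *bytes,
            WeightEntry::Alias { .. } => 0,
        })
        .sum()
}

/// Build a tiny registry where `lm_head` is tied to `tok_embeddings`.
fn tied_embedding_example() -> u64 {
    let mut registry = HashMap::new();
    registry.insert(
        "tok_embeddings".to_string(),
        WeightEntry::Owned { bytes: 4096 },
    );
    registry.insert(
        "lm_head".to_string(),
        WeightEntry::Alias { target: "tok_embeddings".to_string() },
    );
    registry.insert(
        "layer0.mlp".to_string(),
        WeightEntry::Owned { bytes: 1024 },
    );
    weight_bytes(&registry)
}
```

In this toy registry the total is the two owned weights only; the tied `lm_head` adds nothing.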

MemoryEstimate::peak_bytes is the sum of the three components. MemoryEstimate::fits_in takes a budget and returns the gating decision plus a structured reason.
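As a rough illustration of the sum and the gating decision, here is a minimal sketch. The type names carry a `Sketch` suffix because their field layout and the `Result`-based return type are assumptions made for this example; the real `MemoryEstimate` and `MemoryDeficit` definitions are not shown on this page and may differ.

```rust
/// Hypothetical layout: one field per component described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MemoryEstimateSketch {
    activation_bytes: u64,
    weight_bytes: u64,
    input_bytes: u64,
}

/// Hypothetical structured reason returned when the budget is too small.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MemoryDeficitSketch {
    required_bytes: u64,
    budget_bytes: u64,
    shortfall_bytes: u64,
}

impl MemoryEstimateSketch {
    /// Peak bytes as the sum of the three components (saturating on overflow).
    fn peak_bytes(&self) -> u64 {
        self.activation_bytes
            .saturating_add(self.weight_bytes)
            .saturating_add(self.input_bytes)
    }

    /// Gate against a budget: Ok if the peak fits, otherwise the deficit.
    fn fits_in(&self, budget_bytes: u64) -> Result<(), MemoryDeficitSketch> {
        let required_bytes = self.peak_bytes();
        if required_bytes <= budget_bytes {
            Ok(())
        } else {
            Err(MemoryDeficitSketch {
                required_bytes,
                budget_bytes,
                shortfall_bytes: required_bytes - budget_bytes,
            })
        }
    }
}
```

Returning a `Result` is one way to carry both the decision and the reason in a single value; a struct with a boolean plus an optional deficit would serve equally well.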

Structs§

MemoryDeficit
MemoryEstimate
MoeOffloadEstimate: MoE offload sizing (TIDE enable_predictive_expert_offload).

Functions§

available_unified_memory: Available unified-memory budget on the running machine. On macOS reads hw.memsize via sysctl; everywhere else returns None so callers can fall back to a user-supplied budget.
estimate: Estimate peak memory for running graph on a session bound to registry. Pure analysis: runs the memory planner internally and queries the registry for weight bytes; doesn’t compile or execute.
estimate_moe_offload: Compute GPU expert budget from a memory budget (unified RAM or VRAM).
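The fallback behaviour described for available_unified_memory can be sketched as follows. This is not the crate's implementation: the function names are hypothetical, and shelling out to the `sysctl` command-line tool is an assumption chosen to keep the example dependency-free (the real function may call the sysctl API directly). Note that `hw.memsize` reports the machine's total physical memory, so a caller may want to leave headroom for the OS.

```rust
use std::process::Command;

/// Parse the decimal byte count printed by `sysctl -n hw.memsize`.
fn parse_memsize(output: &str) -> Option<u64> {
    output.trim().parse::<u64>().ok()
}

/// Sketch only: query `hw.memsize` on macOS, return None everywhere else.
fn available_unified_memory_sketch() -> Option<u64> {
    if !cfg!(target_os = "macos") {
        return None;
    }
    let output = Command::new("sysctl")
        .arg("-n")
        .arg("hw.memsize")
        .output()
        .ok()?;
    if !output.status.success() {
        return None;
    }
    parse_memsize(std::str::from_utf8(&output.stdout).ok()?)
}

/// Use the detected value when there is one, else the caller's own budget.
fn resolve_budget(user_budget_bytes: u64) -> u64 {
    available_unified_memory_sketch().unwrap_or(user_budget_bytes)
}
```

On a non-macOS host `resolve_budget` simply returns the user-supplied figure, which is the fallback path the description above refers to.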