CPU/disk offload with a pinned hot-layer set.
This module provides the infrastructure for offloading model weights to disk and loading them back on demand, with an LRU policy deciding which weights to evict. A pinned hot-set keeps the embeddings, the output head, and the last N attention layers always resident in RAM.
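As a rough illustration of the hot-set rule described above, the sketch below selects the tensors that would stay resident. `TensorKind` and `pinned_hot_set` are hypothetical names invented for this example and are not part of this module's API; the real selection logic may differ.

```rust
use std::collections::HashSet;

/// Hypothetical tensor classification, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum TensorKind {
    Embedding,
    OutputHead,
    /// Attention weights of the layer at this index.
    Attention(usize),
    /// Any other weights of the layer at this index.
    Other(usize),
}

/// Returns the tensors to keep resident: embeddings, the output head,
/// and the attention weights of the last `last_n` of `num_layers` layers.
pub fn pinned_hot_set(
    tensors: &[TensorKind],
    num_layers: usize,
    last_n: usize,
) -> HashSet<TensorKind> {
    // Index of the first layer that counts as "hot"; 0 if last_n >= num_layers.
    let first_hot = num_layers.saturating_sub(last_n);
    tensors
        .iter()
        .copied()
        .filter(|t| match *t {
            TensorKind::Embedding | TensorKind::OutputHead => true,
            TensorKind::Attention(layer) => layer >= first_hot && layer < num_layers,
            TensorKind::Other(_) => false,
        })
        .collect()
}
```

With three layers and `last_n = 2`, this pins the embeddings, the output head, and the attention weights of layers 1 and 2; everything else remains eligible for eviction.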
§Overview
- OffloadPolicy: declarative configuration (none, budget, or pinned hot-set).
- LayerPager: LRU weight pager with eviction, pinned tensors, and on-demand loads from a PagerSource.
- MemoryPressureProbe: lightweight OS-level memory-pressure monitor (Linux / macOS).
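To make the paging behaviour concrete, here is a minimal sketch of an LRU pager with pinned entries and a byte budget. It is not the `LayerPager` implementation: `Source`, `FixedSizeSource`, `Pager`, and all method names are assumptions made up for this example, and the real types may track residency quite differently.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical stand-in for a weight source such as a file on disk.
pub trait Source {
    fn load(&mut self, id: u32) -> Vec<u8>;
}

/// Toy source: every tensor has the same size.
pub struct FixedSizeSource {
    pub tensor_bytes: usize,
}

impl Source for FixedSizeSource {
    fn load(&mut self, id: u32) -> Vec<u8> {
        vec![id as u8; self.tensor_bytes]
    }
}

pub struct Pager<S: Source> {
    source: S,
    budget_bytes: usize,
    resident_bytes: usize,
    resident: HashMap<u32, Vec<u8>>,
    /// Unpinned resident ids, least recently used at the front.
    lru: VecDeque<u32>,
    pinned: HashSet<u32>,
}

impl<S: Source> Pager<S> {
    pub fn new(source: S, budget_bytes: usize) -> Self {
        Pager {
            source,
            budget_bytes,
            resident_bytes: 0,
            resident: HashMap::new(),
            lru: VecDeque::new(),
            pinned: HashSet::new(),
        }
    }

    fn ensure_loaded(&mut self, id: u32) {
        if !self.resident.contains_key(&id) {
            let data = self.source.load(id);
            self.resident_bytes += data.len();
            self.resident.insert(id, data);
        }
    }

    /// Loads `id` if needed and excludes it from eviction.
    pub fn pin(&mut self, id: u32) {
        self.pinned.insert(id);
        self.lru.retain(|&x| x != id);
        self.ensure_loaded(id);
        self.evict_to_budget(id);
    }

    /// Returns the tensor bytes, loading on demand and evicting
    /// least recently used unpinned tensors if over budget.
    pub fn get(&mut self, id: u32) -> &[u8] {
        self.ensure_loaded(id);
        if !self.pinned.contains(&id) {
            // Move `id` to the most recently used position.
            self.lru.retain(|&x| x != id);
            self.lru.push_back(id);
        }
        self.evict_to_budget(id);
        &self.resident[&id]
    }

    /// Evicts unpinned tensors, oldest first, never evicting `keep`.
    fn evict_to_budget(&mut self, keep: u32) {
        while self.resident_bytes > self.budget_bytes {
            let pos = match self.lru.iter().position(|&x| x != keep) {
                Some(p) => p,
                None => break, // only pinned tensors or `keep` remain
            };
            if let Some(victim) = self.lru.remove(pos) {
                if let Some(data) = self.resident.remove(&victim) {
                    self.resident_bytes -= data.len();
                }
            }
        }
    }

    pub fn is_resident(&self, id: u32) -> bool {
        self.resident.contains_key(&id)
    }

    pub fn resident_bytes(&self) -> usize {
        self.resident_bytes
    }
}
```

In this sketch, pinned tensors never enter the LRU queue, so the pager can exceed its budget if the pinned set alone is larger than the budget. A `VecDeque` keeps the example short at the cost of O(n) updates per access; a production pager would more likely use an intrusive list or an ordered map.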
Re-exports§
pub use pager::FilePagerSource;
pub use pager::LayerPager;
pub use pager::PagerSource;
pub use pager::ResidentTensor;
pub use pager::TensorEntry;
pub use pager::TensorId;
pub use policy::OffloadPolicy;
pub use pressure::MemoryPressureProbe;