Module offload

CPU/disk offload with a pinned hot-layer set.

This module provides the infrastructure for offloading model weights to disk and loading them on demand, evicting resident weights with an LRU policy. A pinned hot set keeps the embeddings, the output head, and the last N attention layers resident in RAM at all times.
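The interaction between LRU eviction and a pinned set can be illustrated with a minimal, self-contained sketch. It does not use this module's API: `ToyPager`, `Id`, and the `load` closure are hypothetical names invented for illustration, and the real `LayerPager` may track residency by bytes rather than by entry count. Unlike the hot set described above, pinned entries in this sketch are loaded on first access and then never evicted.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical stand-in for a tensor identifier (not the crate's `TensorId`).
type Id = u32;

/// Toy pager: keeps at most `capacity` unpinned tensors resident and evicts
/// the least recently used one when full. Pinned ids are never evicted.
struct ToyPager {
    capacity: usize,
    pinned: HashSet<Id>,
    resident: HashMap<Id, Vec<u8>>,
    /// Unpinned resident ids only; front = least recently used.
    lru: VecDeque<Id>,
    /// Number of loads from the backing source (cache misses).
    loads: usize,
}

impl ToyPager {
    fn new(capacity: usize, pinned: &[Id]) -> Self {
        assert!(capacity > 0, "sketch assumes room for at least one unpinned tensor");
        ToyPager {
            capacity,
            pinned: pinned.iter().copied().collect(),
            resident: HashMap::new(),
            lru: VecDeque::new(),
            loads: 0,
        }
    }

    /// Returns the tensor bytes, calling `load` only on a miss.
    fn get(&mut self, id: Id, load: impl FnOnce(Id) -> Vec<u8>) -> &[u8] {
        if self.resident.contains_key(&id) {
            self.touch(id);
        } else {
            if !self.pinned.contains(&id) {
                // Make room among the unpinned entries before admitting a new one.
                if self.lru.len() >= self.capacity {
                    if let Some(victim) = self.lru.pop_front() {
                        self.resident.remove(&victim);
                    }
                }
                self.lru.push_back(id);
            }
            self.resident.insert(id, load(id));
            self.loads += 1;
        }
        &self.resident[&id]
    }

    /// Marks an unpinned id as most recently used; pinned ids are not tracked.
    fn touch(&mut self, id: Id) {
        if let Some(pos) = self.lru.iter().position(|&x| x == id) {
            self.lru.remove(pos);
            self.lru.push_back(id);
        }
    }

    fn is_resident(&self, id: Id) -> bool {
        self.resident.contains_key(&id)
    }
}
```

With a capacity of two and id 0 pinned, accessing 0, 1, 2, 1, 3 in that order evicts 2 (the least recently used unpinned entry) and leaves 0, 1, and 3 resident. The linear scan in `touch` keeps the sketch short; a production pager would more likely use an ordered map or an intrusive list.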

§Overview

Re-exports§

pub use pager::FilePagerSource;
pub use pager::LayerPager;
pub use pager::PagerSource;
pub use pager::ResidentTensor;
pub use pager::TensorEntry;
pub use pager::TensorId;
pub use policy::OffloadPolicy;
pub use pressure::MemoryPressureProbe;

Modules§

pager
LRU weight pager — the core of the CPU/disk offload system.
policy
Offload policy configuration.
pressure
Memory pressure probe — lightweight OS-level RAM usage monitor.
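
As a rough illustration of what an OS-level RAM usage probe can look like, the sketch below reads `MemTotal` and `MemAvailable` from Linux's `/proc/meminfo`. This is an assumption for illustration only: the page does not say how `MemoryPressureProbe` obtains its readings or which platforms it supports, and `parse_meminfo`, `used_fraction`, and `probe` are hypothetical names.

```rust
/// Parses `MemTotal` and `MemAvailable` (both in kB) from text in the
/// Linux `/proc/meminfo` format. Returns `None` if either field is missing.
fn parse_meminfo(text: &str) -> Option<(u64, u64)> {
    let field = |name: &str| -> Option<u64> {
        text.lines()
            // Match "<name>:" exactly, so "MemTotal" does not match a longer key.
            .find_map(|line| line.strip_prefix(name)?.strip_prefix(':'))
            // The value is the first whitespace-separated token after the colon.
            .and_then(|rest| rest.split_whitespace().next()?.parse::<u64>().ok())
    };
    Some((field("MemTotal")?, field("MemAvailable")?))
}

/// Fraction of RAM in use, in the range 0.0 to 1.0.
fn used_fraction(total_kb: u64, available_kb: u64) -> f64 {
    if total_kb == 0 {
        return 0.0;
    }
    1.0 - (available_kb.min(total_kb) as f64 / total_kb as f64)
}

/// Reads the live value; returns `None` where `/proc/meminfo` is unavailable
/// (for example on non-Linux systems) or cannot be parsed.
fn probe() -> Option<f64> {
    let text = std::fs::read_to_string("/proc/meminfo").ok()?;
    let (total, available) = parse_meminfo(&text)?;
    Some(used_fraction(total, available))
}
```

A pager could poll such a probe and lower its resident budget when the used fraction crosses a threshold; how this module wires the probe into eviction is not shown here.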