Module offload

oxillama_runtime

CPU/disk offload with a pinned hot-layer set.

This module provides the infrastructure for offloading model weights to disk and loading them on demand, with an LRU policy deciding which resident weights to evict. A pinned hot-set keeps embeddings, the output head, and the last N attention layers always resident in RAM.
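The interaction between LRU eviction and a pinned hot-set can be sketched as follows. This is a minimal illustration, not the crate's `LayerPager`: `SketchPager`, `SketchId`, and the `load` closure (standing in for a disk read) are hypothetical names, and the byte-budget accounting is an assumption about how such a pager might bound residency.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical stand-in for a tensor identifier (not the crate's `TensorId`).
type SketchId = u32;

/// Illustrative LRU pager in which pinned ids are never evicted.
struct SketchPager {
    budget_bytes: usize,
    used_bytes: usize,
    pinned: HashSet<SketchId>,
    resident: HashMap<SketchId, Vec<u8>>,
    /// Unpinned ids only: least recently used at the front.
    lru: VecDeque<SketchId>,
}

impl SketchPager {
    fn new(budget_bytes: usize, pinned: &[SketchId]) -> Self {
        SketchPager {
            budget_bytes,
            used_bytes: 0,
            pinned: pinned.iter().copied().collect(),
            resident: HashMap::new(),
            lru: VecDeque::new(),
        }
    }

    /// Returns the tensor bytes, calling `load` only on a miss.
    fn get(&mut self, id: SketchId, load: impl FnOnce(SketchId) -> Vec<u8>) -> &[u8] {
        if self.resident.contains_key(&id) {
            self.touch(id);
        } else {
            let bytes = load(id);
            self.evict_until_fits(bytes.len());
            self.used_bytes += bytes.len();
            self.resident.insert(id, bytes);
            // Pinned ids stay out of the LRU queue, so they can never be chosen as victims.
            if !self.pinned.contains(&id) {
                self.lru.push_back(id);
            }
        }
        &self.resident[&id]
    }

    /// Moves an unpinned id to the most-recently-used position.
    fn touch(&mut self, id: SketchId) {
        if let Some(pos) = self.lru.iter().position(|&x| x == id) {
            self.lru.remove(pos);
            self.lru.push_back(id);
        }
    }

    /// Evicts least-recently-used unpinned tensors until `incoming` bytes fit,
    /// or until only pinned tensors remain.
    fn evict_until_fits(&mut self, incoming: usize) {
        while self.used_bytes + incoming > self.budget_bytes {
            let victim = match self.lru.pop_front() {
                Some(v) => v,
                None => break,
            };
            if let Some(bytes) = self.resident.remove(&victim) {
                self.used_bytes -= bytes.len();
            }
        }
    }

    fn is_resident(&self, id: SketchId) -> bool {
        self.resident.contains_key(&id)
    }
}
```

In this sketch the linear scan in `touch` keeps the code short; a production pager would more likely use an intrusive list or a generation counter to make touches constant-time.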

Overview

OffloadPolicy — declarative configuration: none, budget, or pinned hot-set.
LayerPager — LRU weight pager with eviction, pinned tensors, and on-demand loads from a PagerSource.
MemoryPressureProbe — lightweight OS-level pressure monitor (Linux / macOS).
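To make the three policy shapes concrete, here is a hedged sketch of how a declarative policy might decide which tensors are pinned. `PolicySketch`, `TensorKind`, the field names, and `is_pinned` are all hypothetical and are not taken from the crate's `OffloadPolicy`; treating the "none" policy as "everything stays resident" is also an assumption.

```rust
/// Hypothetical mirror of the three policy shapes named in the overview.
#[derive(Debug, Clone, PartialEq)]
enum PolicySketch {
    /// No offload: nothing is ever paged out (assumed meaning of "none").
    None,
    /// Page under a byte budget with no pinned tensors.
    Budget { max_resident_bytes: usize },
    /// Page under a byte budget, pinning embeddings, the output head,
    /// and the last `last_n_layers` attention layers.
    PinnedHotSet { max_resident_bytes: usize, last_n_layers: usize },
}

/// Hypothetical coarse classification of a tensor.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TensorKind {
    Embedding,
    OutputHead,
    /// Zero-based index of the attention layer the tensor belongs to.
    Attention { layer: usize },
}

impl PolicySketch {
    /// Returns true if a tensor of this kind must stay resident in RAM.
    fn is_pinned(&self, kind: TensorKind, total_layers: usize) -> bool {
        match self {
            // With offload disabled, every tensor is effectively pinned.
            PolicySketch::None => true,
            PolicySketch::Budget { .. } => false,
            PolicySketch::PinnedHotSet { last_n_layers, .. } => match kind {
                TensorKind::Embedding | TensorKind::OutputHead => true,
                TensorKind::Attention { layer } => {
                    // Layers [total - N, total) form the hot tail.
                    layer >= total_layers.saturating_sub(*last_n_layers)
                }
            },
        }
    }
}
```

With `last_n_layers: 2` on a 32-layer model, this sketch pins layers 30 and 31 and leaves layers 0 through 29 eligible for eviction.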

Re-exports

pub use pager::FilePagerSource;
pub use pager::LayerPager;
pub use pager::PagerSource;
pub use pager::ResidentTensor;
pub use pager::TensorEntry;
pub use pager::TensorId;
pub use policy::OffloadPolicy;
pub use pressure::MemoryPressureProbe;

Modules

pager: LRU weight pager — the core of the CPU/disk offload system.
policy: Offload policy configuration.
pressure: Memory pressure probe — lightweight OS-level RAM usage monitor.
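As an illustration of what an OS-level RAM usage check can look like on Linux, the sketch below derives a used-memory fraction from `/proc/meminfo`-style text using the kernel's `MemTotal` and `MemAvailable` fields. Whether `MemoryPressureProbe` reads this file, and how it handles macOS, is not stated here; `meminfo_kb` and `used_fraction` are hypothetical helpers, and the text is passed in as a string so the logic can be exercised on any platform.

```rust
/// Extracts a `kB` field such as `MemTotal` from `/proc/meminfo`-style text.
fn meminfo_kb(text: &str, key: &str) -> Option<u64> {
    text.lines().find_map(|line| {
        // Require an exact `key:` prefix so `MemTotal` does not match a longer field name.
        let rest = line.strip_prefix(key)?.strip_prefix(':')?;
        rest.trim().trim_end_matches("kB").trim().parse::<u64>().ok()
    })
}

/// Fraction of RAM in use, in [0, 1], or `None` if the fields are missing.
fn used_fraction(text: &str) -> Option<f64> {
    let total = meminfo_kb(text, "MemTotal")?;
    let available = meminfo_kb(text, "MemAvailable")?;
    if total == 0 {
        return None;
    }
    Some(1.0 - available as f64 / total as f64)
}
```

On a Linux host the input could come from `std::fs::read_to_string("/proc/meminfo")`; a pager might then evict more aggressively once the fraction crosses a chosen threshold.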