pub struct PressureMonitor { /* private fields */ }
Monitors rho = used_tokens / max_tokens and recommends compression actions.
All thresholds come from ContextConfig — no hardcoded constants.
Implementations
impl PressureMonitor
pub fn new(max_tokens: u32, config: ContextConfig) -> Self
pub fn max_tokens(&self) -> u32
pub fn pressure(
    &self,
    partitions: &ContextPartitions,
    engine: &ContextTokenEngine,
    observed_prompt_tokens: Option<u32>,
) -> f64
Current pressure rho ∈ [0, +∞). Uses provider-reported prompt tokens when available; otherwise estimates from partitions.
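The fallback rule above can be sketched as follows. This is a minimal illustration, not the crate's implementation: pressure_sketch and its estimated_tokens parameter are hypothetical stand-ins for the partition-based estimate, and the handling of a zero budget is an assumption.

```rust
/// Hypothetical stand-in for the pressure computation described above.
/// `estimated_tokens` plays the role of the estimate derived from partitions.
fn pressure_sketch(
    max_tokens: u32,
    estimated_tokens: u32,
    observed_prompt_tokens: Option<u32>,
) -> f64 {
    // Prefer the provider-reported count; fall back to the local estimate.
    let used = observed_prompt_tokens.unwrap_or(estimated_tokens);
    if max_tokens == 0 {
        // Assumption: a zero budget is reported as unbounded pressure
        // instead of dividing by zero.
        return f64::INFINITY;
    }
    // rho = used_tokens / max_tokens; it exceeds 1.0 when the budget is overrun.
    f64::from(used) / f64::from(max_tokens)
}
```

Because rho is a plain ratio with no clamping, a provider-reported count larger than max_tokens yields a value above 1.0, consistent with the stated range [0, +∞).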
This is the raw rho (full partition weight). Making rho paging-aware, i.e. subtracting
non-resident (Collapsed/SpooledOut/PagedOut) handle tokens so that paging immediately relieves
pressure, is not a drop-in change here: crate::context::manager::ContextManager::recompute_handle_residency
decides the Resident↔Collapsed projection from this very rho, so subtracting collapsed tokens
would drop rho below collapse_threshold and immediately un-collapse the handles (oscillation).
Avoiding that needs a deliberate split into raw rho (drives the collapse decision) and effective
rho (drives further compaction/renewal), tracked as remaining W1-1 design work; see ContextManager::rho.
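The raw/effective split described in this note could look roughly like the sketch below. Everything in it is hypothetical: the Residency enum, the Handle struct, and the three functions are illustrative names rather than the crate's types, the >= comparison against the threshold is an assumption, and max_tokens is assumed to be non-zero.

```rust
/// Hypothetical residency states, named after the ones mentioned in the note.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq)]
enum Residency {
    Resident,
    Collapsed,
    SpooledOut,
    PagedOut,
}

/// Hypothetical handle: a token count plus its current residency.
struct Handle {
    tokens: u32,
    residency: Residency,
}

/// Raw rho: every handle counts at full weight, resident or not.
fn raw_rho(handles: &[Handle], max_tokens: u32) -> f64 {
    let total: u64 = handles.iter().map(|h| u64::from(h.tokens)).sum();
    total as f64 / f64::from(max_tokens)
}

/// Effective rho: only resident handles count, so paging relieves pressure.
fn effective_rho(handles: &[Handle], max_tokens: u32) -> f64 {
    let resident: u64 = handles
        .iter()
        .filter(|h| h.residency == Residency::Resident)
        .map(|h| u64::from(h.tokens))
        .sum();
    resident as f64 / f64::from(max_tokens)
}

/// The collapse decision reads raw rho only, so collapsing a handle
/// cannot lower the value that triggered the collapse.
fn should_collapse(handles: &[Handle], max_tokens: u32, collapse_threshold: f64) -> bool {
    raw_rho(handles, max_tokens) >= collapse_threshold
}
```

With a 600-token Resident handle, a 300-token Collapsed handle, a budget of 1000, and a threshold of 0.8, raw rho is 0.9 and effective rho is 0.6. Feeding the effective value back into the collapse decision would fall below the threshold and un-collapse the handle, which is the oscillation the note warns about.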