#[non_exhaustive]
pub struct CheckpointParams {
    pub max_checkpoints: u32,
    pub every_n_tokens: i32,
    pub min_tokens: u32,
    pub min_gap: u32,
}
Tunable parameters for the in-memory state-checkpoint cache used to preserve KV/recurrent state across chat turns for hybrid models.
Hybrid architectures (Qwen 3.5, Jamba, etc.) interleave Mamba-style
recurrent layers with transformer layers. The recurrent state can’t be
rolled back to an arbitrary earlier position, so a partial KV trim
fails whenever the next prompt diverges deep into the conversation.
To work around this, we periodically snapshot the partial seq state
(recurrent + SWA, via LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY) during
prompt prefill and restore the closest snapshot when the next prompt
arrives. Mirrors the mechanism used by upstream llama-server.
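The restore step described above can be sketched as a small bounded cache. This is an illustrative sketch only, not the crate's implementation: `Checkpoint`, `CheckpointCache`, `push`, `closest`, and the oldest-first eviction policy are all assumptions made for the example, and the snapshot payload is modeled as opaque bytes.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for one saved snapshot: the token position it was
/// taken at, plus the opaque serialized partial sequence state.
#[derive(Debug, Clone, PartialEq)]
pub struct Checkpoint {
    pub n_tokens: usize,
    pub state: Vec<u8>,
}

/// Hypothetical bounded cache. Evicting the oldest entry first is an
/// assumption of this sketch, not a documented policy.
pub struct CheckpointCache {
    max_checkpoints: usize,
    entries: VecDeque<Checkpoint>,
}

impl CheckpointCache {
    pub fn new(max_checkpoints: usize) -> Self {
        Self { max_checkpoints, entries: VecDeque::new() }
    }

    /// Store a snapshot, dropping the oldest one when the cache is full.
    /// A capacity of 0 disables caching entirely.
    pub fn push(&mut self, cp: Checkpoint) {
        if self.max_checkpoints == 0 {
            return;
        }
        while self.entries.len() >= self.max_checkpoints {
            self.entries.pop_front();
        }
        self.entries.push_back(cp);
    }

    /// Return the snapshot taken at the largest position that does not
    /// exceed `common_prefix`, the number of leading tokens shared by the
    /// cached conversation and the new prompt.
    pub fn closest(&self, common_prefix: usize) -> Option<&Checkpoint> {
        self.entries
            .iter()
            .filter(|cp| cp.n_tokens <= common_prefix)
            .max_by_key(|cp| cp.n_tokens)
    }

    /// Number of snapshots currently held.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Under these assumptions, a prompt that diverges at token 250 would restore a snapshot taken at position 200 rather than one taken at 300, and only the tokens after position 200 would need to be decoded again.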
For non-hybrid models (Qwen 2.5, Llama 3, Gemma, …) checkpoints are created but never used because the cheaper partial-trim path succeeds.
Marked #[non_exhaustive]; build via Default::default() and chain the
with_* setters.
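A minimal sketch of that construction pattern follows. To keep it compilable on its own, it declares a local stand-in for `CheckpointParams` with the documented fields and setters; in real code the type would be imported from the crate instead. The default values shown are placeholders chosen for the example, not the crate's actual defaults, and `example_params` is a hypothetical helper.

```rust
/// Local stand-in mirroring the documented fields, so the sketch compiles
/// without the real crate.
#[derive(Debug, Clone, PartialEq)]
#[non_exhaustive]
pub struct CheckpointParams {
    pub max_checkpoints: u32,
    pub every_n_tokens: i32,
    pub min_tokens: u32,
    pub min_gap: u32,
}

impl Default for CheckpointParams {
    fn default() -> Self {
        // Placeholder values only: the real defaults are not stated here.
        Self { max_checkpoints: 8, every_n_tokens: 1024, min_tokens: 64, min_gap: 64 }
    }
}

impl CheckpointParams {
    pub fn with_max_checkpoints(mut self, max_checkpoints: u32) -> Self {
        self.max_checkpoints = max_checkpoints;
        self
    }

    pub fn with_every_n_tokens(mut self, every_n_tokens: i32) -> Self {
        self.every_n_tokens = every_n_tokens;
        self
    }

    pub fn with_min_tokens(mut self, min_tokens: u32) -> Self {
        self.min_tokens = min_tokens;
        self
    }

    pub fn with_min_gap(mut self, min_gap: u32) -> Self {
        self.min_gap = min_gap;
        self
    }
}

/// Hypothetical helper: start from the defaults and override selected fields.
pub fn example_params() -> CheckpointParams {
    CheckpointParams::default()
        .with_max_checkpoints(4)
        .with_every_n_tokens(2048)
        .with_min_gap(128)
}
```

Because each setter takes `self` by value and returns `Self`, the calls chain without intermediate bindings, and fields that are not overridden keep their default values.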
Fields (Non-exhaustive)
This struct is marked as non-exhaustive
Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
max_checkpoints: u32
Maximum number of checkpoints retained per persistent context.
0 disables checkpointing entirely. Each checkpoint is a few MB
for typical hybrid models.
every_n_tokens: i32
Approximate spacing between checkpoints during prompt prefill, in
tokens. The last 4..=4 + n_ubatch tokens always get a
checkpoint regardless. A value <= 0 means “only checkpoint near the end
of the prompt”.
min_tokens: u32
Don’t checkpoint the very start of a prompt. A checkpoint there costs space for no benefit, because we’d have to re-decode that prefix anyway if it’s the entire reuse window.
min_gap: u32
Don’t take two checkpoints closer than this many tokens apart.
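How the three spacing fields might interact can be sketched as a single decision function. This is a guess at the shape of the logic, not the crate's code: `Spacing`, `should_checkpoint`, and the `near_end` flag are hypothetical names, and the order of the checks is an assumption. In particular, this sketch lets the end-of-prompt rule override `every_n_tokens` but not `min_tokens` or `min_gap`.

```rust
/// Hypothetical copy of the three spacing-related fields.
#[derive(Debug, Clone, Copy)]
pub struct Spacing {
    pub every_n_tokens: i32,
    pub min_tokens: u32,
    pub min_gap: u32,
}

/// Decide whether to snapshot once `pos` prompt tokens have been decoded.
///
/// `last` is the position of the previous checkpoint, if any. `near_end` is
/// true when the caller has determined that `pos` falls inside the
/// end-of-prompt window that always gets a checkpoint.
pub fn should_checkpoint(s: Spacing, pos: u32, last: Option<u32>, near_end: bool) -> bool {
    // Skip the very start of the prompt.
    if pos < s.min_tokens {
        return false;
    }
    // Distance from the previous checkpoint, or from position 0 if none.
    let since_last = pos.saturating_sub(last.unwrap_or(0));
    // Never place two checkpoints closer than `min_gap` tokens apart.
    if last.is_some() && since_last < s.min_gap {
        return false;
    }
    // The end-of-prompt window is checkpointed regardless of spacing.
    if near_end {
        return true;
    }
    // A non-positive spacing means "only checkpoint near the end".
    if s.every_n_tokens <= 0 {
        return false;
    }
    since_last >= s.every_n_tokens as u32
}
```

With `every_n_tokens` set to 1000 and both minimums set to 64, this sketch would skip position 32, accept position 1000, and reject position 1500 after a checkpoint at 1000 unless that position lies in the end-of-prompt window.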
Implementations
impl CheckpointParams
pub fn with_max_checkpoints(self, max_checkpoints: u32) -> Self
Override the maximum number of checkpoints retained per context.
pub fn with_every_n_tokens(self, every_n_tokens: i32) -> Self
Override the approximate spacing between checkpoints (in tokens).
pub fn with_min_tokens(self, min_tokens: u32) -> Self
Override the minimum prompt length before checkpoints are taken.
pub fn with_min_gap(self, min_gap: u32) -> Self
Override the minimum spacing between two consecutive checkpoints.
Trait Implementations
impl Clone for CheckpointParams
fn clone(&self) -> CheckpointParams
1.0.0 (const: unstable) · fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.