#[non_exhaustive]
pub struct CheckpointParams {
    pub max_checkpoints: u32,
    pub every_n_tokens: i32,
    pub min_tokens: u32,
    pub min_gap: u32,
}
Tunable parameters for the in-memory state-checkpoint cache used to preserve KV/recurrent state across chat turns for hybrid models.
Hybrid architectures (Qwen 3.5, Jamba, etc.) interleave Mamba-style
recurrent layers with transformer layers. The recurrent state can’t be
rolled back to an arbitrary earlier position, so a partial KV trim
fails whenever the next prompt diverges deep into the conversation.
To work around this, we periodically snapshot the partial seq state
(recurrent + SWA, via LLAMA_STATE_SEQ_FLAGS_PARTIAL_ONLY) during
prompt prefill and restore the closest snapshot when the next prompt
arrives. Mirrors the mechanism used by upstream llama-server.
For non-hybrid models (Qwen 2.5, Llama 3, Gemma, …) checkpoints are created but never used because the cheaper partial-trim path succeeds.
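The restore step described above can be sketched as follows. This is a minimal illustration, not the crate's or llama-server's code: it assumes a checkpoint is usable only if it was taken at or before the point where the new prompt diverges from the cached one. The function names and the use of plain token-id slices are hypothetical.

```rust
/// Length of the shared prefix between the cached tokens and the next prompt.
/// (Hypothetical helper; token ids are modelled as plain `i32` values.)
pub fn common_prefix_len(cached: &[i32], next: &[i32]) -> usize {
    cached
        .iter()
        .zip(next)
        .take_while(|(a, b)| a == b)
        .count()
}

/// Picks the checkpoint position closest to, but not beyond, the reusable
/// prefix. Assumption: a snapshot taken past the divergence point contains
/// state from tokens that no longer apply, so it cannot be restored.
pub fn closest_checkpoint(checkpoint_positions: &[usize], reusable_len: usize) -> Option<usize> {
    checkpoint_positions
        .iter()
        .copied()
        .filter(|&pos| pos <= reusable_len)
        .max()
}

/// Combines the two steps: where should decoding resume for `next`?
/// `None` means no checkpoint qualifies and the prompt is re-decoded from 0.
pub fn restore_point(cached: &[i32], next: &[i32], checkpoint_positions: &[usize]) -> Option<usize> {
    closest_checkpoint(checkpoint_positions, common_prefix_len(cached, next))
}
```

Under this reading, everything between the chosen checkpoint and the divergence point is decoded again, which is why checkpoint spacing trades memory against re-decode time.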
Marked #[non_exhaustive]; build via Default::default() and chain the
with_* setters.
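A minimal sketch of that construction pattern follows. To stay self-contained it declares a local stand-in with the same fields and setter signatures as documented here; the Default values shown are placeholders chosen for illustration, since this page does not state the crate's actual defaults.

```rust
/// Local stand-in mirroring the documented struct (not the crate's type).
#[derive(Clone, Copy, Debug)]
#[non_exhaustive]
pub struct CheckpointParams {
    pub max_checkpoints: u32,
    pub every_n_tokens: i32,
    pub min_tokens: u32,
    pub min_gap: u32,
}

impl Default for CheckpointParams {
    fn default() -> Self {
        // Placeholder values, NOT the crate's real defaults.
        Self {
            max_checkpoints: 8,
            every_n_tokens: 1024,
            min_tokens: 64,
            min_gap: 64,
        }
    }
}

impl CheckpointParams {
    pub fn with_max_checkpoints(mut self, max_checkpoints: u32) -> Self {
        self.max_checkpoints = max_checkpoints;
        self
    }
    pub fn with_every_n_tokens(mut self, every_n_tokens: i32) -> Self {
        self.every_n_tokens = every_n_tokens;
        self
    }
    pub fn with_min_tokens(mut self, min_tokens: u32) -> Self {
        self.min_tokens = min_tokens;
        self
    }
    pub fn with_min_gap(mut self, min_gap: u32) -> Self {
        self.min_gap = min_gap;
        self
    }
}

/// Start from the defaults and override only what is needed.
pub fn example_params() -> CheckpointParams {
    CheckpointParams::default()
        .with_max_checkpoints(4) // keep at most four snapshots
        .with_every_n_tokens(0)  // <= 0: only checkpoint near the end of the prompt
        .with_min_gap(128)
}
```

Because the real struct is non-exhaustive, a struct literal such as CheckpointParams { max_checkpoints: 4, .. } does not compile outside the defining crate, so the chained setters are the supported way to customise it.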
Fields (Non-exhaustive)
This struct is marked as non-exhaustive.
Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
max_checkpoints: u32
Maximum number of checkpoints retained per persistent context. 0 disables checkpointing entirely. Each checkpoint is a few MB for typical hybrid models.
every_n_tokens: i32
Approximate spacing between checkpoints during prompt prefill, in tokens. The last 4..=4 + n_ubatch tokens always get a checkpoint regardless. A value <= 0 means “only checkpoint near the end of the prompt”.
min_tokens: u32
Don’t checkpoint the very start of a prompt: a checkpoint there costs space for no benefit, because that prefix would have to be re-decoded anyway if it is the entire reuse window.
min_gap: u32
Don’t take two checkpoints closer than this many tokens apart.
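One plausible reading of how the four fields interact during prefill is sketched below. This is an assumption-laden illustration, not the crate's implementation: the function name, its arguments, and the order of the checks are hypothetical. In particular it assumes that min_tokens is a position threshold and that the end-of-prompt rule overrides only the every_n_tokens spacing, not min_tokens or min_gap.

```rust
/// Local stand-in with the four documented fields (not the crate's type).
#[derive(Clone, Copy, Debug)]
pub struct Params {
    pub max_checkpoints: u32,
    pub every_n_tokens: i32,
    pub min_tokens: u32,
    pub min_gap: u32,
}

/// Hypothetical decision: should a checkpoint be taken after `pos` tokens
/// of a `prompt_len`-token prompt have been decoded?
pub fn should_checkpoint(
    p: &Params,
    pos: u32,
    prompt_len: u32,
    n_ubatch: u32,
    last_checkpoint: Option<u32>,
) -> bool {
    // 0 disables checkpointing entirely.
    if p.max_checkpoints == 0 {
        return false;
    }
    // Skip the very start of the prompt.
    if pos < p.min_tokens {
        return false;
    }
    // Never place two checkpoints closer than `min_gap` tokens.
    if let Some(last) = last_checkpoint {
        if pos.saturating_sub(last) < p.min_gap {
            return false;
        }
    }
    // Near the end of the prompt: 4..=4 + n_ubatch tokens remain.
    let remaining = prompt_len.saturating_sub(pos);
    if (4..=4u32.saturating_add(n_ubatch)).contains(&remaining) {
        return true;
    }
    // <= 0 means "only checkpoint near the end of the prompt".
    if p.every_n_tokens <= 0 {
        return false;
    }
    // Otherwise, checkpoint roughly every `every_n_tokens` tokens.
    let since_last = pos.saturating_sub(last_checkpoint.unwrap_or(0));
    since_last >= p.every_n_tokens as u32
}
```

The sketch leaves out eviction once max_checkpoints snapshots exist, since this page does not say which snapshot is dropped first.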
Implementations
impl CheckpointParams
pub fn with_max_checkpoints(self, max_checkpoints: u32) -> Self
Override the maximum number of checkpoints retained per context.
pub fn with_every_n_tokens(self, every_n_tokens: i32) -> Self
Override the approximate spacing between checkpoints (in tokens).
pub fn with_min_tokens(self, min_tokens: u32) -> Self
Override the minimum prompt length before checkpoints are taken.
pub fn with_min_gap(self, min_gap: u32) -> Self
Override the minimum spacing between two consecutive checkpoints.
Trait Implementations
impl Clone for CheckpointParams
fn clone(&self) -> CheckpointParams
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Copy for CheckpointParams
impl Debug for CheckpointParams
Auto Trait Implementations
impl Freeze for CheckpointParams
impl RefUnwindSafe for CheckpointParams
impl Send for CheckpointParams
impl Sync for CheckpointParams
impl Unpin for CheckpointParams
impl UnsafeUnpin for CheckpointParams
impl UnwindSafe for CheckpointParams
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.