SSM runtime bridge — polymorphic sequence-state pool.
§Overview
OxiLLaMa supports two categories of model architectures:
- Attention-based (LLaMA, Qwen3, Mistral, Gemma, Phi, …): per-sequence state is the KV cache (a contiguous K/V buffer per layer).
- SSM-based (Mamba-2, …): per-sequence state is a set of per-layer recurrent hidden vectors; there is no KV cache.
The SequencePool enum abstracts over both kinds via the
oxillama_arch::common::sequence_state::SequenceState trait. The engine
picks the right pool variant at load time by examining the loaded
architecture; both variants expose the same alloc / release / slot
interface so the rest of the engine stays arch-agnostic.
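
The dispatch pattern described above can be sketched as follows. This is a minimal illustration, not OxiLLaMa's actual code: `ArchKind`, `ToySequencePool`, `KvPool`, `SsmPool`, `for_arch`, and `claim_first_free` are hypothetical names, and the toy backends track only which slots are live rather than holding real state.

```rust
/// Hypothetical architecture tag; the real engine inspects the loaded model.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ArchKind {
    Attention,
    Ssm,
}

/// Toy backends (assumed shapes): each only records which slots are live.
struct KvPool {
    live: Vec<bool>,
}
struct SsmPool {
    live: Vec<bool>,
}

/// Marks the first free slot as live and returns its index.
fn claim_first_free(live: &mut [bool]) -> Option<usize> {
    let id = live.iter().position(|&in_use| !in_use)?;
    live[id] = true;
    Some(id)
}

/// Toy dispatch enum: one variant per backend, one shared interface.
enum ToySequencePool {
    Kv(KvPool),
    Ssm(SsmPool),
}

impl ToySequencePool {
    /// Picks the variant once, at "load time", from the architecture kind.
    fn for_arch(arch: ArchKind, capacity: usize) -> Self {
        let live = vec![false; capacity];
        match arch {
            ArchKind::Attention => ToySequencePool::Kv(KvPool { live }),
            ArchKind::Ssm => ToySequencePool::Ssm(SsmPool { live }),
        }
    }

    fn live_mut(&mut self) -> &mut Vec<bool> {
        match self {
            ToySequencePool::Kv(p) => &mut p.live,
            ToySequencePool::Ssm(p) => &mut p.live,
        }
    }

    /// Same call for both backends; returns None when the pool is full.
    fn alloc(&mut self) -> Option<usize> {
        claim_first_free(self.live_mut())
    }

    /// Returns true if the slot was live and has now been freed.
    fn release(&mut self, id: usize) -> bool {
        match self.live_mut().get_mut(id) {
            Some(flag) if *flag => {
                *flag = false;
                true
            }
            _ => false,
        }
    }

    fn is_ssm(&self) -> bool {
        matches!(self, ToySequencePool::Ssm(_))
    }
}
```

Callers only see `alloc` and `release`, so code above the pool never needs to match on the architecture itself.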
§Design notes
- Slots are identified by a usize index (same as Sequence::slot_id).
- A slot is “live” when it holds a Box<dyn SequenceState>.
- On release the state is reset (zeroed) and returned to the free pool.
- Neither variant interacts with the KV cache from kv_cache/mod.rs; the KV-based pool manages its own separate per-slot state.
- The SSM state pool owns the Box<dyn SequenceState> objects outright; the KV-based pool keeps a KvCachePool from which page indices are lent.
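
The slot lifecycle in these notes (usize slot ids, live slots holding a boxed state, reset on release) can be sketched as a small free-list pool. Everything here is an assumption made for illustration: the trait below is a stand-in for the real SequenceState, whose methods are not shown on this page, and `ToySsmState`, `ToyStatePool`, `step`, `is_zeroed`, and `slot_mut` are hypothetical names.

```rust
/// Hypothetical stand-in for the SequenceState trait (method set assumed).
trait SequenceState {
    fn reset(&mut self);
    fn step(&mut self, x: f32);
    fn is_zeroed(&self) -> bool;
}

/// Toy recurrent state: a single hidden vector.
struct ToySsmState {
    hidden: Vec<f32>,
}

impl SequenceState for ToySsmState {
    fn reset(&mut self) {
        self.hidden.iter_mut().for_each(|h| *h = 0.0);
    }
    fn step(&mut self, x: f32) {
        self.hidden.iter_mut().for_each(|h| *h += x);
    }
    fn is_zeroed(&self) -> bool {
        self.hidden.iter().all(|&h| h == 0.0)
    }
}

/// Toy pool that owns its boxed states outright.
struct ToyStatePool {
    /// A slot is live exactly when its entry is Some.
    slots: Vec<Option<Box<dyn SequenceState>>>,
    /// Free slot ids, used as a stack.
    free_ids: Vec<usize>,
    /// Zeroed states waiting for reuse; always one per free id.
    spare: Vec<Box<dyn SequenceState>>,
}

impl ToyStatePool {
    fn new(capacity: usize, dim: usize) -> Self {
        ToyStatePool {
            slots: (0..capacity).map(|_| None).collect(),
            // Reversed so that the first alloc hands out slot 0.
            free_ids: (0..capacity).rev().collect(),
            spare: (0..capacity)
                .map(|_| Box::new(ToySsmState { hidden: vec![0.0; dim] }) as Box<dyn SequenceState>)
                .collect(),
        }
    }

    /// Takes a free id and a zeroed state; returns None when the pool is full.
    fn alloc(&mut self) -> Option<usize> {
        let id = self.free_ids.pop()?;
        let state = self.spare.pop().expect("one spare state per free id");
        self.slots[id] = Some(state);
        Some(id)
    }

    /// Resets the state and returns it to the free pool; false if not live.
    fn release(&mut self, id: usize) -> bool {
        match self.slots.get_mut(id).and_then(|slot| slot.take()) {
            Some(mut state) => {
                state.reset();
                self.spare.push(state);
                self.free_ids.push(id);
                true
            }
            None => false,
        }
    }

    /// Mutable access to a live slot's state.
    fn slot_mut(&mut self, id: usize) -> Option<&mut Box<dyn SequenceState>> {
        self.slots.get_mut(id)?.as_mut()
    }
}
```

Resetting on release rather than on alloc means a freshly allocated slot never exposes values left over from the previous sequence, at the cost of doing the zeroing even if the slot is never reused.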
§Thread safety
SequencePool is not Send + Sync by itself; it is intended to be
owned by a single-threaded engine or wrapped in a Mutex by the caller.
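
A caller-side wrapper might look like the sketch below. `ToyPool` and `alloc_from_threads` are hypothetical, and the toy pool holds only plain data so that it is Send. Note that std::sync::Mutex<T> is Sync only when T: Send, so sharing a pool across threads this way assumes its contents are Send; a pool holding non-Send state would have to stay on one thread.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy pool (assumed shape): plain data only, so it is Send.
struct ToyPool {
    live: Vec<bool>,
}

impl ToyPool {
    /// Marks the first free slot as live and returns its index.
    fn alloc(&mut self) -> Option<usize> {
        let id = self.live.iter().position(|&in_use| !in_use)?;
        self.live[id] = true;
        Some(id)
    }
}

/// Spawns `workers` threads that each try to allocate one slot through a
/// shared Mutex, then returns the sorted ids that were handed out.
fn alloc_from_threads(capacity: usize, workers: usize) -> Vec<usize> {
    let pool = Arc::new(Mutex::new(ToyPool { live: vec![false; capacity] }));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let pool = Arc::clone(&pool);
            thread::spawn(move || {
                // The lock serialises access, so no two threads get the same id.
                let mut guard = pool.lock().unwrap();
                guard.alloc()
            })
        })
        .collect();

    let mut ids: Vec<usize> = handles
        .into_iter()
        .filter_map(|h| h.join().unwrap())
        .collect();
    ids.sort_unstable();
    ids
}
```

With more workers than slots, the surplus threads simply receive None; which thread gets which id depends on scheduling, which is why the result is sorted before being returned.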
Structs§
- SequenceSlot - A live sequence slot in the SsmStatePool.
- SsmStatePool - A free-list pool of SequenceSlots for SSM-based models.
Enums§
- PoolError - Errors produced by pool operations.
- SequencePool - Dispatch enum over the two pool backends.
Type Aliases§
- PoolResult - Convenience alias.