pub struct LlamaContext {
    pub backend: Arc<LlamaLib>,
    pub handle: *mut llama_context,
}
Inference context attached to a model.
Fields

backend: Arc<LlamaLib>
handle: *mut llama_context

Implementations
impl LlamaContext
pub fn new(model: &LlamaModel, params: llama_context_params) -> Result<Self, LlamaError>

Create a new inference context for model with the given params.

pub fn default_params(model: &LlamaModel) -> llama_context_params

Return a default set of context parameters for model.

pub fn decode(&mut self, batch: &LlamaBatch) -> Result<(), LlamaError>

Run the model on batch, updating the KV cache.
pub fn kv_cache_clear(&mut self)
Clear the KV cache for this context. Resets all cached key/value state, allowing the context to be reused for a fresh generation without reallocating.
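The reuse-without-reallocating behavior is analogous to Vec::clear, which drops the elements but keeps the underlying allocation. A minimal standalone sketch of that pattern (plain std Rust; the Vec here is only a stand-in for the context's KV buffers, not the real cache type):

```rust
fn main() {
    // Stand-in for a KV cache: clearing drops the entries but keeps
    // the underlying allocation, so the buffer can be refilled for a
    // fresh generation without reallocating.
    let mut cache: Vec<(i32, f32)> = Vec::with_capacity(1024);
    cache.extend((0..100).map(|i| (i, i as f32)));

    let cap_before = cache.capacity();
    cache.clear(); // cached entries gone...
    assert_eq!(cache.len(), 0);
    assert_eq!(cache.capacity(), cap_before); // ...allocation retained
}
```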
pub fn kv_cache_seq_rm(
    &mut self,
    seq_id: llama_seq_id,
    p0: llama_pos,
    p1: llama_pos,
) -> bool
Remove KV cache entries for sequence seq_id in position range [p0, p1).
If p0 < 0, removes from the beginning. If p1 < 0, removes to the end.
Returns true if the operation succeeded.
This is used for incremental prompt encoding: when the conversation diverges from the cached prefix, only the divergent suffix needs to be removed and re-decoded, avoiding a full KV cache clear.
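The token-comparison half of that incremental scheme is plain Rust and can be sketched standalone. The commented-out follow-up calls are assumptions based on the signatures above, not a tested usage of this crate:

```rust
/// Length of the common prefix between the cached token sequence and
/// the new prompt. Tokens past this point must be removed from the
/// KV cache and re-decoded.
fn common_prefix_len(cached: &[i32], prompt: &[i32]) -> usize {
    cached
        .iter()
        .zip(prompt.iter())
        .take_while(|(a, b)| a == b)
        .count()
}

fn main() {
    let cached = [1, 15043, 29892, 920, 526];
    let prompt = [1, 15043, 29892, 825, 338]; // diverges at index 3

    let keep = common_prefix_len(&cached, &prompt);
    println!("reusable prefix: {keep} tokens"); // prints "reusable prefix: 3 tokens"

    // With a real context, the follow-up would be roughly:
    // ctx.kv_cache_seq_rm(0, keep as llama_pos, -1); // drop divergent suffix
    // ctx.decode(&suffix_batch)?;                    // re-decode only new tokens
}
```

Only the divergent suffix is re-decoded, so appending one user turn to a long conversation costs a handful of tokens instead of the whole prompt.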
Trait Implementations

Auto Trait Implementations
impl Freeze for LlamaContext
impl RefUnwindSafe for LlamaContext
impl Unpin for LlamaContext
impl UnsafeUnpin for LlamaContext
impl UnwindSafe for LlamaContext
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.