Interface to llama.cpp’s attention KV cache for infinite context management.
Connects to llama-server’s /slots and /health HTTP endpoints to query and
control the running model’s KV cache state. Slot save/restore (file-based) is
used for inject/extract because the HTTP API does not expose raw tensor data.
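As a rough illustration of the file-based save/restore path described above, the sketch below builds the raw HTTP requests by hand using only the standard library. It assumes llama-server's documented `POST /slots/{id}?action=save|restore` form with a JSON `filename` body, and that the server was started with a slot save path configured. The names `SlotAction`, `slot_request`, `get_request`, and `send` are hypothetical helpers for this example and are not part of this crate's API.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Slot actions used for file-based extract (save) and inject (restore).
/// Hypothetical type for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SlotAction {
    Save,
    Restore,
}

impl SlotAction {
    fn as_str(self) -> &'static str {
        match self {
            SlotAction::Save => "save",
            SlotAction::Restore => "restore",
        }
    }
}

/// Build a raw HTTP/1.1 request for `POST /slots/{slot_id}?action=...`.
/// Assumption: `filename` contains no quotes or backslashes, so no JSON
/// escaping is performed here.
pub fn slot_request(host: &str, slot_id: u32, action: SlotAction, filename: &str) -> String {
    let body = format!("{{\"filename\":\"{}\"}}", filename);
    format!(
        "POST /slots/{}?action={} HTTP/1.1\r\n\
         Host: {}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {}\r\n\
         Connection: close\r\n\r\n{}",
        slot_id,
        action.as_str(),
        host,
        body.len(),
        body
    )
}

/// Build a raw GET request, e.g. for `/health` or `/slots`.
pub fn get_request(host: &str, path: &str) -> String {
    format!("GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n", path, host)
}

/// Send a prepared request and return the full raw response (headers and body).
pub fn send(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With a server listening on `127.0.0.1:8080` (an assumed address), `send("127.0.0.1:8080", &get_request("127.0.0.1:8080", "/slots"))` would return the raw slot listing, and `slot_request(..., SlotAction::Save, "ctx.bin")` would ask the server to write slot state to a file. A real client would more likely use an HTTP library and a JSON parser than hand-built requests.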
Structs
- LlamaKVCacheInterface - Interface for interacting with llama.cpp’s KV cache via the llama-server HTTP API.
- LlamaKVCacheState - Snapshot of llama-server’s KV cache state for a slot.