
Module llama_cache_interface


Interface to llama.cpp’s attention KV cache for infinite-context management.

Connects to llama-server’s `/slots` and `/health` HTTP endpoints to query and control the running model’s KV cache state. The HTTP API does not expose raw tensor data, so cache state is injected and extracted through file-based slot save/restore.
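As a minimal sketch of the HTTP traffic this involves, the Rust code below builds (but does not send) the four requests, using only the standard library. The endpoint shapes follow the llama-server README: `POST /slots/{id}?action=save` and `?action=restore` take a JSON body whose `filename` is resolved inside the directory given to the server's `--slot-save-path` flag. The names `Request`, `slot_file_action` and `to_http` are hypothetical and are not this module's API.

```rust
/// HTTP method of a request (only the two this sketch needs).
#[derive(Debug, PartialEq)]
enum Method {
    Get,
    Post,
}

/// Hypothetical description of a llama-server request; no I/O is performed.
#[derive(Debug, PartialEq)]
struct Request {
    method: Method,
    path: String,
    body: Option<String>,
}

/// Liveness check: llama-server reports whether the model is loaded.
fn health() -> Request {
    Request { method: Method::Get, path: "/health".to_string(), body: None }
}

/// Query the processing state of every slot.
fn list_slots() -> Request {
    Request { method: Method::Get, path: "/slots".to_string(), body: None }
}

/// Save ("save") or restore ("restore") one slot's KV cache through a file
/// stored in the server's --slot-save-path directory.
fn slot_file_action(slot_id: u32, action: &str, filename: &str) -> Request {
    Request {
        method: Method::Post,
        path: format!("/slots/{}?action={}", slot_id, action),
        // Naive JSON: assumes `filename` contains no quotes or backslashes.
        body: Some(format!("{{\"filename\": \"{}\"}}", filename)),
    }
}

/// Render a request as raw HTTP/1.1 text, e.g. for writing to a TcpStream.
fn to_http(req: &Request, host: &str) -> String {
    let method = match req.method {
        Method::Get => "GET",
        Method::Post => "POST",
    };
    let mut out = format!(
        "{} {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n",
        method, req.path, host
    );
    match &req.body {
        Some(body) => {
            out.push_str("Content-Type: application/json\r\n");
            out.push_str(&format!("Content-Length: {}\r\n\r\n{}", body.len(), body));
        }
        // No body: the blank line alone ends the header section.
        None => out.push_str("\r\n"),
    }
    out
}

fn main() {
    let host = "127.0.0.1:8080";
    for req in [
        health(),
        list_slots(),
        slot_file_action(0, "save", "slot0.bin"),
        slot_file_action(0, "restore", "slot0.bin"),
    ] {
        println!("{}\n---", to_http(&req, host));
    }
}
```

Keeping requests as plain values makes them testable without a running server; a real client would write the rendered text to a `std::net::TcpStream` or hand the method, path and body to an HTTP crate.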

Structs

LlamaKVCacheInterface
Interface for interacting with llama.cpp’s KV cache via the llama-server HTTP API.
LlamaKVCacheState
Snapshot of llama-server’s KV cache state for a single slot.
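The two structs might fit together roughly as follows. This is a hypothetical sketch. The fields (`base_url`, `slot_id`, `filename`) and the methods `extract`, `inject` and `valid_filename` are assumptions for illustration, not the crate's documented API. Extracting a slot yields the save request plus a snapshot recording which file will hold the cache. Injecting replays that file into the same slot with a restore request.

```rust
/// Hypothetical shape: the real struct's fields are not shown on this page.
#[derive(Debug, Clone, PartialEq)]
struct LlamaKVCacheState {
    slot_id: u32,
    /// File name relative to the server's --slot-save-path directory.
    filename: String,
}

/// Hypothetical shape: only the server base URL is kept here.
struct LlamaKVCacheInterface {
    base_url: String,
}

/// (URL, JSON body) pair for a POST request.
type Post = (String, String);

impl LlamaKVCacheInterface {
    fn new(base_url: &str) -> Self {
        // Drop trailing slashes so paths can be appended uniformly.
        Self { base_url: base_url.trim_end_matches('/').to_string() }
    }

    /// Client-side guard (an assumption of this sketch): accept only a bare file name.
    fn valid_filename(name: &str) -> bool {
        !name.is_empty()
            && !name.contains('/')
            && !name.contains('\\')
            && name != "."
            && name != ".."
    }

    /// Build the POST for a slot file action ("save" or "restore").
    fn post(&self, slot_id: u32, action: &str, filename: &str) -> Post {
        let url = format!("{}/slots/{}?action={}", self.base_url, slot_id, action);
        // Naive JSON: assumes the file name has no quotes or backslashes.
        let body = format!("{{\"filename\": \"{}\"}}", filename);
        (url, body)
    }

    /// Extract: build the save request and a snapshot recording which file will hold the cache.
    fn extract(&self, slot_id: u32, filename: &str) -> Result<(Post, LlamaKVCacheState), String> {
        if !Self::valid_filename(filename) {
            return Err(format!("invalid file name: {filename:?}"));
        }
        let state = LlamaKVCacheState { slot_id, filename: filename.to_string() };
        Ok((self.post(slot_id, "save", filename), state))
    }

    /// Inject: build the restore request that reloads a snapshot into its slot.
    fn inject(&self, state: &LlamaKVCacheState) -> Post {
        self.post(state.slot_id, "restore", &state.filename)
    }
}

fn main() {
    let iface = LlamaKVCacheInterface::new("http://127.0.0.1:8080/");
    let ((save_url, save_body), state) = iface.extract(0, "session.bin").expect("valid name");
    let (restore_url, restore_body) = iface.inject(&state);
    println!("POST {save_url} {save_body}");
    println!("POST {restore_url} {restore_body}");
}
```

Storing only the file name in the snapshot, rather than cache contents, mirrors the module description: the tensors stay on the server, and the client tracks which file to restore.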