Module llama_cache_interface

Interface to llama.cpp’s attention KV cache for infinite context management

Connects to llama-server’s /slots and /health HTTP endpoints to query and control the running model’s KV cache state. Because the HTTP API does not expose raw tensor data, cache contents are injected and extracted through llama-server’s file-based slot save/restore.
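
As a rough illustration of the file-based save/restore path, the sketch below builds the request path and JSON body for a slot operation. It assumes the endpoint shape described in the llama.cpp server README (`POST /slots/{id}?action=save` or `action=restore`, with a JSON body carrying a `filename`), and that llama-server was started with `--slot-save-path`. The names `SlotAction` and `slot_request` are hypothetical and are not taken from this module's actual API.

```rust
/// Hypothetical helper type for illustration only; not this module's real API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SlotAction {
    /// Write the slot's KV cache to a file under the server's slot save path.
    Save,
    /// Load a previously saved KV cache file back into the slot.
    Restore,
}

impl SlotAction {
    /// Value of the `action` query parameter (assumed from the llama.cpp server README).
    fn as_str(self) -> &'static str {
        match self {
            SlotAction::Save => "save",
            SlotAction::Restore => "restore",
        }
    }
}

/// Builds the request path and JSON body for a slot save/restore call.
/// The caller is expected to send this as a POST to the llama-server base URL.
pub fn slot_request(slot_id: u32, action: SlotAction, filename: &str) -> (String, String) {
    let path = format!("/slots/{}?action={}", slot_id, action.as_str());
    // Minimal JSON string escaping: backslashes first, then double quotes.
    let escaped = filename.replace('\\', "\\\\").replace('"', "\\\"");
    let body = format!("{{\"filename\":\"{}\"}}", escaped);
    (path, body)
}
```

For example, `slot_request(0, SlotAction::Save, "session.bin")` yields the path `/slots/0?action=save` and the body `{"filename":"session.bin"}`. Keeping request construction separate from the HTTP client makes the endpoint shape easy to check without a running server.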