KV-Cache alignment for commercial LLM prompt caching.
Claude's prompt caching stores KV tensors for byte-exact prefix matches; OpenAI's GPT models have a similar automatic prompt-caching mechanism. This module ensures lean-ctx outputs are structured to maximize cache hit rates.
Key strategies:
- Stable prefix: invariant content (instructions, tool defs) comes first
- Cache-block alignment: content segmented to match provider breakpoints
- Delta-only after cached prefix: send only the changes; the rest stays in the KV-cache
- Deterministic ordering: same inputs always produce byte-identical output
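The first and last strategies can be sketched together: put invariant content first and sort anything order-independent, so the same inputs always produce a byte-identical (and therefore cacheable) prefix. This is a minimal illustration, not the module's actual API; `assemble_prompt` is a hypothetical helper.

```rust
// Hypothetical sketch: build a prompt whose invariant content
// (instructions, tool defs) forms a stable, byte-identical prefix.
fn assemble_prompt(instructions: &str, mut tool_defs: Vec<String>, query: &str) -> String {
    // Deterministic ordering: sort tool defs so the same set of inputs
    // yields byte-identical output regardless of insertion order.
    tool_defs.sort();
    let mut out = String::new();
    out.push_str(instructions); // stable prefix: invariant content first
    for def in &tool_defs {
        out.push_str(def);
        out.push('\n');
    }
    out.push_str(query); // variable content goes last, after the cacheable prefix
    out
}
```

Because the prefix is byte-identical across calls, a provider that caches on exact prefix matches can reuse its stored KV tensors for everything before the query.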
Structs

Functions
- cache_order_code - Order file contents for maximum cache reuse across tool calls. Stable elements (imports, type defs) first, then variable elements (function bodies).
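A sketch of the kind of reordering `cache_order_code` describes, assuming a simple line-based classifier (the real implementation and its heuristics are not shown here; `order_for_cache` is a hypothetical stand-in):

```rust
// Hypothetical sketch: move stable lines (imports, type definitions)
// ahead of variable lines so repeated tool calls share a cached prefix.
fn order_for_cache(source: &str) -> String {
    let (mut stable, mut variable) = (Vec::new(), Vec::new());
    for line in source.lines() {
        let t = line.trim_start();
        // Crude classifier for illustration only: treat `use`, `struct`,
        // and `type` items as stable, everything else as variable.
        if t.starts_with("use ") || t.starts_with("struct ") || t.starts_with("type ") {
            stable.push(line);
        } else {
            variable.push(line);
        }
    }
    // Stable block first: edits to function bodies no longer invalidate
    // the cached prefix holding the imports and type defs.
    stable
        .into_iter()
        .chain(variable)
        .collect::<Vec<_>>()
        .join("\n")
}
```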
- compute_delta - Generate a delta between two versions of content for cache-efficient updates. Returns only the changed portions, prefixed with stable context identifiers.
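The core idea behind a cache-efficient delta can be sketched as a longest-common-prefix split: the shared prefix stays in the provider's KV-cache, and only the tail is re-sent. This is a simplified illustration, not `compute_delta`'s actual signature; `prefix_delta` is a hypothetical helper.

```rust
// Hypothetical sketch: split `new` into (cached-prefix length, changed tail).
// Only the tail needs to be transmitted; the prefix is already cached.
fn prefix_delta<'a>(old: &str, new: &'a str) -> (usize, &'a str) {
    let mut common = old
        .bytes()
        .zip(new.bytes())
        .take_while(|(a, b)| a == b)
        .count();
    // Back off to a UTF-8 char boundary so slicing `new` cannot panic.
    while !new.is_char_boundary(common) {
        common -= 1;
    }
    (common, &new[common..])
}
```

A real delta format would also carry the stable context identifiers the summary mentions, so the receiver knows which cached prefix the tail extends.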