KV Cache module for efficient autoregressive generation
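The KV cache stores the key and value states of tokens that have already been processed, so each generation step only computes them for the newest token instead of recomputing them for the whole sequence. This significantly improves inference speed during autoregressive generation.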
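The idea can be illustrated with a minimal sketch for a single attention head. Everything here is hypothetical: `ToyKvCache`, `append`, and `attend` are illustrative names, not this module's `KVCache` API, and the plain `Vec<f32>` layout is an assumption made for readability.

```rust
/// Hypothetical toy cache for one attention head.
/// This is NOT the module's `KVCache`; names and layout are assumptions.
#[derive(Debug, Default)]
pub struct ToyKvCache {
    keys: Vec<Vec<f32>>,   // one key vector per cached position
    values: Vec<Vec<f32>>, // one value vector per cached position
}

impl ToyKvCache {
    pub fn new() -> Self {
        Self::default()
    }

    /// Store the key/value states of the newest token only.
    /// Earlier positions stay cached and are never recomputed.
    pub fn append(&mut self, key: Vec<f32>, value: Vec<f32>) {
        self.keys.push(key);
        self.values.push(value);
    }

    /// Number of cached positions.
    pub fn len(&self) -> usize {
        self.keys.len()
    }

    pub fn is_empty(&self) -> bool {
        self.keys.is_empty()
    }

    /// Scaled dot-product attention of `query` over all cached positions.
    /// Returns an empty vector when the cache is empty.
    pub fn attend(&self, query: &[f32]) -> Vec<f32> {
        let scale = (query.len() as f32).sqrt();
        let scores: Vec<f32> = self
            .keys
            .iter()
            .map(|k| k.iter().zip(query).map(|(a, b)| a * b).sum::<f32>() / scale)
            .collect();

        // Numerically stable softmax over the scores.
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
        let total: f32 = exps.iter().sum();

        // Weighted sum of the cached value vectors.
        let dim = self.values.first().map_or(0, |v| v.len());
        let mut out = vec![0.0; dim];
        for (w, v) in exps.iter().zip(&self.values) {
            for (o, x) in out.iter_mut().zip(v) {
                *o += (w / total) * x;
            }
        }
        out
    }
}
```

In a generation loop built on this sketch, each step would call `append` once with the new token's key and value, then `attend` with the new token's query, so the per-step projection cost stays constant while only the attention itself grows with sequence length.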
Structs
- KVCache: KV cache for caching key and value states during generation
- PagedKVCache: Paged KV cache for vLLM-style memory management
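Paged memory management in the vLLM style generally means splitting cache storage into fixed-size blocks and giving each sequence a block table that maps logical positions to physical blocks. The following is a rough sketch of that bookkeeping only. `ToyBlockAllocator`, `ToySequence`, and their methods are hypothetical names chosen for illustration and do not reflect the actual `PagedKVCache` interface, which this page does not show.

```rust
/// Hypothetical per-sequence state; not part of this module's API.
#[derive(Debug, Default)]
pub struct ToySequence {
    block_table: Vec<usize>, // logical block index -> physical block id
    len: usize,              // number of tokens stored so far
}

/// Hypothetical allocator over a fixed pool of physical blocks.
#[derive(Debug)]
pub struct ToyBlockAllocator {
    block_size: usize,       // tokens per block (assumed fixed)
    free_blocks: Vec<usize>, // physical block ids currently unused
}

impl ToyBlockAllocator {
    pub fn new(num_blocks: usize, block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be positive");
        // Reverse so that `pop` hands out block 0 first.
        Self {
            block_size,
            free_blocks: (0..num_blocks).rev().collect(),
        }
    }

    /// Reserve a slot for one more token of `seq`.
    /// Returns `(physical_block, offset_in_block)`, or `None` if the pool is exhausted.
    pub fn append_token(&mut self, seq: &mut ToySequence) -> Option<(usize, usize)> {
        let offset = seq.len % self.block_size;
        if offset == 0 {
            // The sequence has no block yet, or its last block is full.
            let block = self.free_blocks.pop()?;
            seq.block_table.push(block);
        }
        seq.len += 1;
        Some((*seq.block_table.last()?, offset))
    }

    /// Return all blocks of a finished sequence to the pool.
    pub fn release(&mut self, seq: ToySequence) {
        self.free_blocks.extend(seq.block_table);
    }

    /// Number of physical blocks still available.
    pub fn free_count(&self) -> usize {
        self.free_blocks.len()
    }
}
```

Because blocks are allocated on demand, a sequence wastes at most one partially filled block, and blocks released by a finished sequence can be reused immediately by another one.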