KV cache for autoregressive transformer generation.
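Stores the key and value projections of every layer for all sequence positions processed so far, so each decoding step only has to compute projections for the newest token. Designed for single-sequence generation (batch size 1).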
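
The layout described above can be illustrated with a minimal sketch. It assumes a pure-Python cache that keeps one growing list of key vectors and one of value vectors per layer. The names `KVCache`, `append`, `get`, and `seq_len` are hypothetical and are not taken from the original implementation, and the projections are placeholder numbers rather than model outputs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vector = List[float]  # one key or value projection for a single position


@dataclass
class KVCache:
    """Hypothetical per-layer cache for a single sequence (batch=1)."""

    num_layers: int
    keys: List[List[Vector]] = field(init=False)
    values: List[List[Vector]] = field(init=False)

    def __post_init__(self) -> None:
        # One independent list per layer; each grows by one entry per position.
        self.keys = [[] for _ in range(self.num_layers)]
        self.values = [[] for _ in range(self.num_layers)]

    def append(self, layer: int, k: Vector, v: Vector) -> None:
        """Store the projections of the newest position for one layer."""
        self.keys[layer].append(k)
        self.values[layer].append(v)

    def get(self, layer: int) -> Tuple[List[Vector], List[Vector]]:
        """Return all cached keys and values for a layer, oldest first."""
        return self.keys[layer], self.values[layer]

    def seq_len(self) -> int:
        """Number of cached positions, read from layer 0."""
        return len(self.keys[0]) if self.keys else 0


# Simulate three decoding steps on a two-layer model.
cache = KVCache(num_layers=2)
for step in range(3):
    for layer in range(cache.num_layers):
        # Placeholder projections; a real model would compute these
        # from the hidden state of the newly generated token.
        cache.append(layer, k=[float(step)] * 4, v=[float(step) + 0.5] * 4)
```

At each step, attention in a given layer would read the full history through `cache.get(layer)`. A production cache would more likely preallocate tensors of a fixed maximum length instead of growing Python lists, but the indexing by layer and position stays the same.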