Model executor that uses a PagedAttention KV cache.
Unlike MockModelExecutor (which ignores KV cache), this executor:
- Writes K/V vectors to paged blocks during prefill and decode
- Reads K/V through block table indirection for attention
- Produces logits via the paged attention output
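The write and read paths listed above can be illustrated with a minimal sketch of block table indirection. This is not the crate's actual implementation: `PagedKv`, its fields, and its methods are hypothetical names chosen for illustration, and the flat `Vec<f32>` layout is an assumption.

```rust
/// Hypothetical paged K/V store (illustrative only, not this crate's types).
/// Physical storage is a flat pool of `num_blocks` fixed-size blocks.
pub struct PagedKv {
    block_size: usize, // tokens per block
    dim: usize,        // floats per K or V vector
    keys: Vec<f32>,
    values: Vec<f32>,
}

impl PagedKv {
    pub fn new(num_blocks: usize, block_size: usize, dim: usize) -> Self {
        let len = num_blocks * block_size * dim;
        Self { block_size, dim, keys: vec![0.0; len], values: vec![0.0; len] }
    }

    /// Translate a logical token position into a physical offset.
    /// `block_table[i]` is the physical block holding logical block `i`.
    fn offset(&self, block_table: &[usize], pos: usize) -> usize {
        let physical_block = block_table[pos / self.block_size];
        (physical_block * self.block_size + pos % self.block_size) * self.dim
    }

    /// Store the K/V vectors for token `pos` (used in prefill and decode).
    pub fn write(&mut self, block_table: &[usize], pos: usize, k: &[f32], v: &[f32]) {
        assert_eq!(k.len(), self.dim);
        assert_eq!(v.len(), self.dim);
        let o = self.offset(block_table, pos);
        self.keys[o..o + self.dim].copy_from_slice(k);
        self.values[o..o + self.dim].copy_from_slice(v);
    }

    /// Fetch the K/V vectors for token `pos` through the block table.
    pub fn read(&self, block_table: &[usize], pos: usize) -> (&[f32], &[f32]) {
        let o = self.offset(block_table, pos);
        (&self.keys[o..o + self.dim], &self.values[o..o + self.dim])
    }
}
```

In this sketch a sequence whose block table is `[3, 1]` stores tokens 0 and 1 in physical block 3 and tokens 2 and 3 in physical block 1, so logically contiguous tokens need not be contiguous in memory.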
Uses identity projections (Q=K=V=input embedding) for deterministic, verifiable behavior without model weights.
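To show why identity projections make the output verifiable, here is a rough sketch of single-head attention for the newest token with Q = K = V = embedding. The function name `identity_attention`, the softmax form, and the `1/sqrt(dim)` scaling are assumptions for illustration; the executor's actual arithmetic may differ.

```rust
/// Hypothetical helper (not part of this crate): attention output for the
/// last token when queries, keys, and values are all the raw embeddings.
pub fn identity_attention(embeddings: &[Vec<f32>]) -> Vec<f32> {
    let query = embeddings.last().expect("at least one token");
    let dim = query.len();
    // Assumed scaled dot-product attention; the scale is an assumption.
    let scale = 1.0 / (dim as f32).sqrt();

    // Score of the query against every cached key (here: every embedding).
    let scores: Vec<f32> = embeddings
        .iter()
        .map(|k| k.iter().zip(query).map(|(a, b)| a * b).sum::<f32>() * scale)
        .collect();

    // Numerically stable softmax over the scores.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Weighted sum of the values (again the embeddings themselves).
    let mut out = vec![0.0; dim];
    for (w, v) in exps.iter().zip(embeddings) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += (w / total) * x;
        }
    }
    out
}
```

Because no learned weights are involved, the result depends only on the input embeddings. With a single token, for instance, the softmax weight is 1 and the output equals that token's embedding, which makes the expected result easy to compute by hand in a test.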
Structs
- PagedAttentionExecutor - A model executor that actually uses paged KV cache for attention.
- PagedExecutorConfig - Configuration for the paged attention executor.