Module paged_executor

Model executor that uses a PagedAttention KV cache.

Unlike MockModelExecutor (which ignores KV cache), this executor:

  • Writes K/V vectors to paged blocks during prefill and decode
  • Reads K/V through block table indirection for attention
  • Produces logits via the paged attention output
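The write and read paths listed above can be pictured with a minimal sketch of a paged K/V store for a single sequence. Everything here (`PagedKv`, `BLOCK_SIZE`, `append`, `get`) is a hypothetical name chosen for illustration and is not the API of `PagedAttentionExecutor`; the block size of 4 tokens is likewise an assumption.

```rust
/// Tokens stored per physical block (assumed value, for illustration only).
const BLOCK_SIZE: usize = 4;

/// Hypothetical paged K/V store for one sequence.
struct PagedKv {
    dim: usize,
    /// Physical block pools; each block holds BLOCK_SIZE * dim floats.
    k_blocks: Vec<Vec<f32>>,
    v_blocks: Vec<Vec<f32>>,
    /// Block table: logical block index -> physical block id.
    block_table: Vec<usize>,
    /// Number of tokens written so far.
    len: usize,
}

impl PagedKv {
    fn new(dim: usize) -> Self {
        PagedKv {
            dim,
            k_blocks: Vec::new(),
            v_blocks: Vec::new(),
            block_table: Vec::new(),
            len: 0,
        }
    }

    /// Write one token's K/V vectors (used for both prefill and decode).
    /// A new physical block is allocated whenever the last one is full.
    fn append(&mut self, k: &[f32], v: &[f32]) {
        assert_eq!(k.len(), self.dim);
        assert_eq!(v.len(), self.dim);
        let slot = self.len % BLOCK_SIZE;
        if slot == 0 {
            self.k_blocks.push(vec![0.0; BLOCK_SIZE * self.dim]);
            self.v_blocks.push(vec![0.0; BLOCK_SIZE * self.dim]);
            self.block_table.push(self.k_blocks.len() - 1);
        }
        let phys = self.block_table[self.len / BLOCK_SIZE];
        let off = slot * self.dim;
        self.k_blocks[phys][off..off + self.dim].copy_from_slice(k);
        self.v_blocks[phys][off..off + self.dim].copy_from_slice(v);
        self.len += 1;
    }

    /// Read token `pos` by resolving its logical block through the block table.
    fn get(&self, pos: usize) -> (&[f32], &[f32]) {
        assert!(pos < self.len);
        let phys = self.block_table[pos / BLOCK_SIZE];
        let off = (pos % BLOCK_SIZE) * self.dim;
        (
            &self.k_blocks[phys][off..off + self.dim],
            &self.v_blocks[phys][off..off + self.dim],
        )
    }
}
```

With a single sequence the physical block ids happen to match the logical ones; with a shared pool serving several sequences they would generally differ, which is why reads go through the table instead of indexing the pool directly.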

Uses identity projections (Q=K=V=input embedding) for deterministic, verifiable behavior without model weights.
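To illustrate why identity projections make the output easy to verify, here is a minimal sketch of single-head scaled dot-product attention for the most recent token, where the query, keys, and values are all the raw input embeddings. The function name `identity_attention` and the 1/sqrt(dim) scaling are assumptions for this sketch; the actual executor may differ.

```rust
/// Attention output for the last token with identity projections
/// (Q = K = V = input embedding). Hypothetical helper, not the crate's API.
fn identity_attention(embeddings: &[Vec<f32>]) -> Vec<f32> {
    let q = embeddings.last().expect("at least one token");
    let dim = q.len();
    let scale = 1.0 / (dim as f32).sqrt();

    // Scores: scaled dot product of the query with every cached key.
    let scores: Vec<f32> = embeddings
        .iter()
        .map(|k| k.iter().zip(q).map(|(a, b)| a * b).sum::<f32>() * scale)
        .collect();

    // Numerically stable softmax over the scores.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let total: f32 = exps.iter().sum();

    // Output: softmax-weighted sum of the cached values.
    let mut out = vec![0.0; dim];
    for (w, v) in exps.iter().zip(embeddings) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += (w / total) * x;
        }
    }
    out
}
```

Because nothing depends on learned weights, the result can be checked by hand: a sequence with a single token returns that token's embedding unchanged, and repeated calls on the same input give identical output.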

Structs

PagedAttentionExecutor
A model executor that actually uses paged KV cache for attention.
PagedExecutorConfig
Configuration for the paged attention executor.