Module paged_executor

Model executor that uses a PagedAttention KV cache.

Unlike MockModelExecutor (which ignores the KV cache), this executor:

Writes K/V vectors to paged blocks during prefill and decode
Reads K/V through block table indirection for attention
Produces logits via the paged attention output
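
The write and read paths listed above can be illustrated with a minimal sketch. It is not this module's implementation: `PagedKvCache`, `Sequence`, `append`, and `read` are hypothetical names chosen for illustration, and the layout (one flat `Vec<f32>` per physical block) is an assumption.

```rust
/// Hypothetical paged KV cache: fixed-size physical blocks shared by all sequences.
struct PagedKvCache {
    block_size: usize,       // tokens per block
    head_dim: usize,         // floats per K (or V) vector
    k_blocks: Vec<Vec<f32>>, // each block holds block_size * head_dim floats
    v_blocks: Vec<Vec<f32>>,
    free: Vec<usize>,        // ids of unallocated physical blocks
}

/// Hypothetical per-sequence state: the block table maps a logical block
/// index to a physical block id.
#[derive(Default)]
struct Sequence {
    block_table: Vec<usize>,
    len: usize, // number of tokens written so far
}

impl PagedKvCache {
    fn new(num_blocks: usize, block_size: usize, head_dim: usize) -> Self {
        Self {
            block_size,
            head_dim,
            k_blocks: vec![vec![0.0; block_size * head_dim]; num_blocks],
            v_blocks: vec![vec![0.0; block_size * head_dim]; num_blocks],
            // Reversed so that pop() hands out block 0 first.
            free: (0..num_blocks).rev().collect(),
        }
    }

    /// Write one token's K/V (prefill or decode). Allocates a new block when
    /// the sequence's last block is full; returns None if no block is free.
    fn append(&mut self, seq: &mut Sequence, k: &[f32], v: &[f32]) -> Option<()> {
        assert_eq!(k.len(), self.head_dim);
        assert_eq!(v.len(), self.head_dim);
        let slot = seq.len % self.block_size;
        if slot == 0 {
            seq.block_table.push(self.free.pop()?);
        }
        let block = *seq.block_table.last()?;
        let off = slot * self.head_dim;
        self.k_blocks[block][off..off + self.head_dim].copy_from_slice(k);
        self.v_blocks[block][off..off + self.head_dim].copy_from_slice(v);
        seq.len += 1;
        Some(())
    }

    /// Read the K/V vectors at a logical position through the block table.
    fn read(&self, seq: &Sequence, pos: usize) -> (&[f32], &[f32]) {
        assert!(pos < seq.len);
        let block = seq.block_table[pos / self.block_size];
        let off = (pos % self.block_size) * self.head_dim;
        (
            &self.k_blocks[block][off..off + self.head_dim],
            &self.v_blocks[block][off..off + self.head_dim],
        )
    }
}
```

When two sequences append tokens in interleaved order, each sequence's physical blocks are no longer contiguous, yet `read` still resolves logical positions correctly because every lookup goes through `block_table`.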

Uses identity projections (Q=K=V=input embedding) for deterministic, verifiable behavior without model weights.
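
To see why identity projections make the output easy to verify, here is a small sketch of single-query scaled dot-product attention in which each token's embedding serves as Q, K, and V. The function names `dot`, `attend`, and `identity_attention` are hypothetical, and the `1/sqrt(d)` scaling is the standard attention convention rather than something this page confirms; the step from attention output to logits is not shown.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Scaled dot-product attention for one query over a set of keys and values.
fn attend(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    assert_eq!(keys.len(), values.len());
    assert!(!keys.is_empty());
    let scale = 1.0 / (q.len() as f32).sqrt();
    let scores: Vec<f32> = keys.iter().map(|k| dot(q, k) * scale).collect();

    // Numerically stable softmax: subtract the maximum score before exp.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();

    // Weighted sum of the value vectors.
    let mut out = vec![0.0; values[0].len()];
    for (w, v) in exps.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += (w / sum) * x;
        }
    }
    out
}

/// Identity projections (assumed form): the last token's embedding is the
/// query, and all embeddings so far act as both keys and values.
fn identity_attention(embeddings: &[Vec<f32>]) -> Vec<f32> {
    let q = embeddings.last().expect("at least one token");
    attend(q, embeddings, embeddings)
}
```

With a single cached token the softmax weight is exactly 1, so the attention output equals the input embedding. With several tokens the output is a convex combination of the embeddings that can be recomputed by hand, which is what makes the behavior checkable without model weights.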

Structs

PagedAttentionExecutor: A model executor that actually uses a paged KV cache for attention.
PagedExecutorConfig: Configuration for the paged attention executor.