pf_cache/
lib.rs

// SPDX-License-Identifier: MIT
//! # `pf-cache`
//!
//! Paged KV-cache capture, per-page content addressing (copy-on-write across
//! forks), and a [`CachePager`] trait that the per-engine adapters implement.
//!
//! See `agent_docs/cache-layer.md` for the spec and
//! `.claude/skills/kvcache-format/SKILL.md` for the page-out / page-in
//! pseudo-code. The on-disk format is `paged-batchinvariant-v1`.
//!
//! ## What ships in Phase 4 (this commit)
//!
//! - [`format::PageManifest`]: the wire-format struct mirrored from the spec.
//! - [`serialize::serialize_pages`] / [`serialize::deserialize_pages`]:
//!   portable round-trip via the [`pf_core::cas::BlobStore`] trait, no GPU.
//! - [`pager::CachePager`]: the engine-agnostic interface every adapter
//!   implements (vLLM, SGLang, …).
//! - [`pager::SyntheticCachePager`]: in-memory implementation used by every
//!   test in this crate. It lets us prove the serialize-and-restore round
//!   trip without booting an inference engine.
//! - [`capture::capture_cache`] / [`capture::restore_cache`]: high-level
//!   one-shot helpers that the snapshotter calls.
//!
//! ## Bit-exact replay
//!
//! Bit-exact restore requires batch-invariant kernels (vLLM
//! `--enforce-deterministic`, SGLang `--deterministic-mode`). The CUDA-host
//! integration test (`tests/cache_bit_exact_vllm.rs`) is gated behind
//! `$PF_HAS_GPU=1`. The in-process round-trip test in
//! `tests/cache_round_trip.rs` is the build-host proxy and runs everywhere.

#![deny(unsafe_code)]
#![allow(missing_docs)] // documented per-symbol in submodules

pub mod capture;
pub mod format;
pub mod pager;
pub mod serialize;

pub use capture::{capture_cache, restore_cache};
pub use format::{CacheMeta, Dtype, LAYOUT_V1, LogicalSeq, Page, PageManifest};
pub use pager::{CachePager, SyntheticCachePager};
pub use serialize::{deserialize_pages, serialize_pages};
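
The crate docs above describe per-page content addressing with copy-on-write across forks, plus a page-out / page-in round trip through a blob store. The following is a minimal standalone sketch of that idea, not this crate's implementation: `PageStore`, `PageKey`, `page_key`, `page_out`, and `page_in` are hypothetical names, the returned key list is only a toy stand-in for `PageManifest`, and `DefaultHasher` stands in for whatever content hash the real `paged-batchinvariant-v1` format uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical page key. `DefaultHasher` is only a stand-in here: it is
/// NOT collision-resistant, so a real store would use a cryptographic hash.
type PageKey = u64;

fn page_key(bytes: &[u8]) -> PageKey {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory blob store keyed by page content.
#[derive(Default)]
struct PageStore {
    blobs: HashMap<PageKey, Vec<u8>>,
}

impl PageStore {
    /// Store a page. Identical content maps to the same key and is kept once,
    /// which is what lets forks share their common prefix pages.
    fn put(&mut self, bytes: &[u8]) -> PageKey {
        let key = page_key(bytes);
        self.blobs.entry(key).or_insert_with(|| bytes.to_vec());
        key
    }

    fn get(&self, key: PageKey) -> Option<&[u8]> {
        self.blobs.get(&key).map(|v| v.as_slice())
    }

    /// Number of distinct pages currently stored.
    fn len(&self) -> usize {
        self.blobs.len()
    }
}

/// Page out a sequence: store every page and return the ordered key list.
fn page_out(store: &mut PageStore, pages: &[Vec<u8>]) -> Vec<PageKey> {
    pages.iter().map(|p| store.put(p)).collect()
}

/// Page in a sequence from its key list. Returns `None` if any page is missing.
fn page_in(store: &PageStore, manifest: &[PageKey]) -> Option<Vec<Vec<u8>>> {
    manifest
        .iter()
        .map(|k| store.get(*k).map(|b| b.to_vec()))
        .collect()
}
```

In this sketch, paging out a parent sequence and then a fork that appends one page stores only the new page, because the shared pages hash to keys the store already holds. Restoring a sequence needs nothing beyond its key list and the store, which is why a round trip of this shape can be tested without a GPU.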