//! Distributed KV cache and multi-GPU support for ultra-long contexts.
//!
//! This module provides infrastructure for distributing the KV cache across
//! multiple GPUs, enabling contexts of 2M+ tokens that exceed single-GPU memory.
//!
//! # Architecture
//!
//! ```text
//! ┌─────────────────────────────────────────────────────────────────────┐
//! │                      Distributed PagedKVCache                       │
//! │                                                                     │
//! │  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐            │
//! │  │     GPU 0     │  │     GPU 1     │  │     GPU 2     │  ...       │
//! │  │  ┌─────────┐  │  │  ┌─────────┐  │  │  ┌─────────┐  │            │
//! │  │  │ Shard 0 │  │  │  │ Shard 1 │  │  │  │ Shard 2 │  │            │
//! │  │  │  0-700K │  │  │  │700K-1.4M│  │  │  │ 1.4M-2M │  │            │
//! │  │  │ tokens  │  │  │  │ tokens  │  │  │  │ tokens  │  │            │
//! │  │  └─────────┘  │  │  └─────────┘  │  │  └─────────┘  │            │
//! │  └───────────────┘  └───────────────┘  └───────────────┘            │
//! │          │                  │                  │                    │
//! │          └──────────────────┼──────────────────┘                    │
//! │                             │                                       │
//! │                     ┌───────▼───────┐                               │
//! │                     │  Coordinator  │                               │
//! │                     │  - Routing    │                               │
//! │                     │  - Aggregation│                               │
//! │                     └───────────────┘                               │
//! └─────────────────────────────────────────────────────────────────────┘
//! ```
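//!
//! The diagram implies contiguous range partitioning: each shard owns a
//! fixed-size window of token positions. A minimal sketch of the routing
//! arithmetic follows. The function name `shard_for_token` and its parameters
//! are hypothetical, introduced only for illustration; they are not part of
//! this module's API, and the real routing logic may differ.
//!
//! ```rust
//! /// Hypothetical helper (not part of this module): maps a token position
//! /// to the index of the shard that would own it, assuming every shard
//! /// holds a contiguous window of `tokens_per_shard` positions.
//! /// Returns `None` if the position lies beyond the last shard or if the
//! /// configuration is degenerate.
//! fn shard_for_token(
//!     position: usize,
//!     tokens_per_shard: usize,
//!     num_shards: usize,
//! ) -> Option<usize> {
//!     if tokens_per_shard == 0 || num_shards == 0 {
//!         return None;
//!     }
//!     // Integer division selects the window that contains `position`.
//!     let shard = position / tokens_per_shard;
//!     if shard < num_shards {
//!         Some(shard)
//!     } else {
//!         None
//!     }
//! }
//! ```
//!
//! With the 700K-token windows shown in the diagram and three shards,
//! `shard_for_token(700_000, 700_000, 3)` would select shard 1, and any
//! position at or beyond 2.1M would fall outside the sharded range.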
//!
//! # Key Components
//!
//! - [`ShardManager`]: Manages partitioning of the KV cache across devices
//! - [`SequenceKV`]: Per-sequence isolated KV storage
//! - [`Coordinator`]: Handles routing and result aggregation
//!
//! # Design Principles
//!
//! 1. **Sequence isolation**: Each sequence has independent storage
//! 2. **No shared mutable state**: Concurrency through isolation
//! 3. **Deterministic aggregation**: Strict ordering for reproducibility
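//!
//! Strict ordering matters because floating-point addition is not
//! associative: combining per-shard partial results in arrival order can
//! change the low bits from run to run. The sketch below illustrates only the
//! ordering idea, under the assumption that each shard returns a partial
//! vector tagged with its shard index. The name `aggregate_in_shard_order` is
//! hypothetical, and a plain element-wise sum stands in for whatever
//! combination step the [`Coordinator`] actually performs.
//!
//! ```rust
//! /// Hypothetical helper (not part of this module): sums per-shard partial
//! /// vectors in ascending shard order, independent of arrival order.
//! /// Assumes all partial vectors have the same length.
//! fn aggregate_in_shard_order(mut partials: Vec<(usize, Vec<f32>)>) -> Vec<f32> {
//!     // Fix the reduction order by shard index. The sort is stable, so
//!     // entries with equal indices keep their relative order.
//!     partials.sort_by_key(|(shard_id, _)| *shard_id);
//!
//!     let width = partials.first().map_or(0, |(_, values)| values.len());
//!     let mut out = vec![0.0f32; width];
//!     for (_, values) in &partials {
//!         for (acc, x) in out.iter_mut().zip(values) {
//!             *acc += *x;
//!         }
//!     }
//!     out
//! }
//! ```
//!
//! Calling this helper with the same partial results in two different
//! arrival orders yields bit-identical output, which is the property the
//! third principle asks for.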
pub use ;
pub use ;
pub use ;