KV-Cache abstraction with handle semantics and block management
This module provides a sentence-handle based abstraction for KV cache management, supporting both contiguous and paged attention patterns with zero-copy operations.
Structs§
- AllocationRequest - KV cache allocation request
- BlockTable - Block table for mapping logical to physical cache blocks
- CacheConfig - Cache configuration
- CacheGcStats - Garbage collection statistics
- CacheHandleStats - Statistics for an individual cache handle
- CacheManagerStats - Cache manager statistics
- CompressionStats - Cache compression statistics
- LruEvictionPolicy - Least Recently Used eviction policy
- MemoryPressureThresholds - Memory pressure threshold configuration
- PrefixCacheConfig - Prefix caching configuration
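
To make the logical-to-physical mapping concrete, here is a minimal sketch of how a block table could translate a token position into a physical block and an offset. It is an illustration only: `SketchBlockTable`, its fields, and its methods are hypothetical names invented for this example and are not the fields or methods of the `BlockTable` struct listed above.

```rust
/// Hypothetical illustration of a block table; NOT this module's `BlockTable`.
#[derive(Debug)]
pub struct SketchBlockTable {
    /// Index is the logical block number, value is the physical block id.
    physical: Vec<usize>,
    /// Number of token slots per block (assumed fixed for the whole table).
    block_size: usize,
}

impl SketchBlockTable {
    /// Create an empty table. Panics if `block_size` is zero.
    pub fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { physical: Vec::new(), block_size }
    }

    /// Register a physical block as the next logical block of the sequence.
    pub fn push_block(&mut self, physical_id: usize) {
        self.physical.push(physical_id);
    }

    /// Translate a token position into (physical block id, offset in block).
    /// Returns `None` when the position lies beyond the mapped blocks.
    pub fn locate(&self, token_pos: usize) -> Option<(usize, usize)> {
        let logical = token_pos / self.block_size;
        let offset = token_pos % self.block_size;
        self.physical.get(logical).map(|&p| (p, offset))
    }
}
```

With a block size of 16 and physical blocks 7 and 2 pushed in that order, token 17 falls in logical block 1, so `locate(17)` resolves to physical block 2 at offset 1. Because logical order is independent of physical placement, a sequence can grow without its blocks being contiguous in memory.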
Enums§
- MemoryPressure - Memory pressure levels for adaptive management
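
As an illustration of how pressure levels and thresholds might work together, the sketch below classifies a usage ratio into discrete levels. The variant names, the threshold fields, and the `classify` method are all assumptions made for this example; the actual variants of `MemoryPressure` and fields of `MemoryPressureThresholds` are not shown on this page.

```rust
/// Hypothetical pressure levels; the real `MemoryPressure` variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SketchPressure {
    Low,
    Moderate,
    High,
    Critical,
}

/// Hypothetical thresholds, expressed as fractions of capacity in [0.0, 1.0].
#[derive(Debug, Clone, Copy)]
pub struct SketchThresholds {
    pub moderate: f64,
    pub high: f64,
    pub critical: f64,
}

impl SketchThresholds {
    /// Map current usage to a pressure level by comparing the used fraction
    /// against each threshold, highest first.
    pub fn classify(&self, used: usize, capacity: usize) -> SketchPressure {
        if capacity == 0 {
            // With no capacity at all, treat any state as critical.
            return SketchPressure::Critical;
        }
        let ratio = used as f64 / capacity as f64;
        if ratio >= self.critical {
            SketchPressure::Critical
        } else if ratio >= self.high {
            SketchPressure::High
        } else if ratio >= self.moderate {
            SketchPressure::Moderate
        } else {
            SketchPressure::Low
        }
    }
}
```

A manager built along these lines could poll `classify` after each allocation and trigger eviction or compression once the level reaches `High`. Deriving `Ord` on the enum allows comparisons such as `level >= SketchPressure::High`.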
Traits§
- AdvancedKvCacheManager - Advanced KV cache capabilities
- BlockAllocator - Block-based cache allocator
- CacheEvictionPolicy - Cache eviction strategies
- KvCacheHandle - KV cache handle providing access to cached key-value states
- KvCacheManager - KV cache manager for allocation and lifecycle management
- MultiDeviceCacheManager - Multi-device cache manager supporting GPU/CPU hierarchies
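
The relationship between an eviction-policy trait and a least-recently-used implementation can be sketched as follows. The trait `SketchEvictionPolicy`, its two methods, and the `SketchLru` type are hypothetical and exist only for this example; the real `CacheEvictionPolicy` trait and `LruEvictionPolicy` struct may expose a different interface.

```rust
use std::collections::VecDeque;

/// Hypothetical eviction interface; NOT this module's `CacheEvictionPolicy`.
pub trait SketchEvictionPolicy {
    /// Record that the handle with this id was just used.
    fn touch(&mut self, id: u64);
    /// Pick the next handle to evict and forget it, if any is tracked.
    fn evict(&mut self) -> Option<u64>;
}

/// Simple LRU ordering: front is least recently used, back is most recent.
#[derive(Debug, Default)]
pub struct SketchLru {
    order: VecDeque<u64>,
}

impl SketchEvictionPolicy for SketchLru {
    fn touch(&mut self, id: u64) {
        // Linear scan keeps the sketch short; a production policy would
        // likely pair a hash map with a linked list for O(1) updates.
        if let Some(pos) = self.order.iter().position(|&x| x == id) {
            self.order.remove(pos);
        }
        self.order.push_back(id);
    }

    fn evict(&mut self) -> Option<u64> {
        self.order.pop_front()
    }
}
```

After touching ids 1, 2, 3 and then 1 again, the first eviction returns 2, since it is now the least recently used entry. Keeping the policy behind a trait lets a manager swap strategies without changing its allocation code.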