Module kv_cache

KV-Cache abstraction with handle semantics and block management

This module provides a sentence-handle-based abstraction for KV cache management. It supports both contiguous and paged attention patterns and uses zero-copy operations.

Structs§

AllocationRequest
KV cache allocation request
BlockTable
Block table for mapping logical to physical cache blocks
CacheConfig
Cache configuration
CacheGcStats
Garbage collection statistics
CacheHandleStats
Statistics for an individual cache handle
CacheManagerStats
Cache manager statistics
CompressionStats
Cache compression statistics
LruEvictionPolicy
Least Recently Used eviction policy
MemoryPressureThresholds
Memory pressure threshold configuration
PrefixCacheConfig
Prefix caching configuration
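
As an illustration of the logical-to-physical mapping that BlockTable describes, here is a minimal sketch. It is not this crate's implementation: the type `SketchBlockTable`, its fields, and its methods are hypothetical names chosen for the example, and it assumes fixed-size blocks addressed by token position.

```rust
/// Hypothetical block table (not the crate's `BlockTable`).
/// The index into `physical` is the logical block number;
/// the stored value is the physical block id.
#[derive(Debug)]
pub struct SketchBlockTable {
    block_size: usize,
    physical: Vec<u32>,
}

impl SketchBlockTable {
    /// Create an empty table for blocks holding `block_size` tokens each.
    pub fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { block_size, physical: Vec::new() }
    }

    /// Register a physical block as the next logical block.
    pub fn push_block(&mut self, physical_id: u32) {
        self.physical.push(physical_id);
    }

    /// Translate a token position into (physical block id, offset in block).
    /// Returns `None` when no block has been mapped for that position.
    pub fn locate(&self, token_pos: usize) -> Option<(u32, usize)> {
        let logical = token_pos / self.block_size;
        let offset = token_pos % self.block_size;
        self.physical.get(logical).map(|&id| (id, offset))
    }
}
```

With a block size of 16 and physical blocks 7 and 2 pushed in that order, token position 17 falls in logical block 1, so `locate(17)` resolves to physical block 2 at offset 1. Because the indirection lives in the table, physical blocks need not be contiguous, which is the property paged attention relies on.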

Enums§

MemoryPressure
Memory pressure levels for adaptive management
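
The sketch below shows one way a utilization ratio could be mapped to a pressure level using configurable thresholds. The variant names, the `SketchThresholds` fields, and the `classify` function are all assumptions made for illustration; the actual variants of MemoryPressure and fields of MemoryPressureThresholds are not listed on this page.

```rust
/// Hypothetical pressure levels, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum SketchPressure {
    Low,
    Medium,
    High,
    Critical,
}

/// Hypothetical thresholds, expressed as fractions of total capacity.
/// Assumed to satisfy medium <= high <= critical.
#[derive(Debug, Clone, Copy)]
pub struct SketchThresholds {
    pub medium: f64,
    pub high: f64,
    pub critical: f64,
}

/// Classify current usage against the thresholds.
/// A zero-capacity cache is treated as critical, since nothing can be allocated.
pub fn classify(used: usize, capacity: usize, t: &SketchThresholds) -> SketchPressure {
    if capacity == 0 {
        return SketchPressure::Critical;
    }
    let ratio = used as f64 / capacity as f64;
    if ratio >= t.critical {
        SketchPressure::Critical
    } else if ratio >= t.high {
        SketchPressure::High
    } else if ratio >= t.medium {
        SketchPressure::Medium
    } else {
        SketchPressure::Low
    }
}
```

A manager could call such a function after each allocation and start evicting or compressing entries once the level reaches `High`. That policy is only one possible design and is not taken from the crate.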

Traits§

AdvancedKvCacheManager
Advanced KV cache capabilities
BlockAllocator
Block-based cache allocator
CacheEvictionPolicy
Cache eviction strategies
KvCacheHandle
KV cache handle providing access to cached key-value states
KvCacheManager
KV cache manager for allocation and lifecycle management
MultiDeviceCacheManager
Multi-device cache manager supporting GPU/CPU hierarchies
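
To make the relationship between an eviction trait and an LRU policy concrete, here is a small sketch. The trait `SketchEvictionPolicy`, its two methods, and the `u64` handle ids are hypothetical; the real method sets of CacheEvictionPolicy and LruEvictionPolicy are not shown on this page and may differ.

```rust
use std::collections::VecDeque;

/// Hypothetical eviction interface (not the crate's `CacheEvictionPolicy`).
pub trait SketchEvictionPolicy {
    /// Record that `handle` was just used.
    fn touch(&mut self, handle: u64);
    /// Choose and remove the next handle to evict, if any.
    fn evict(&mut self) -> Option<u64>;
}

/// Hypothetical LRU policy: front of the queue is least recently used.
#[derive(Debug, Default)]
pub struct SketchLru {
    order: VecDeque<u64>,
}

impl SketchEvictionPolicy for SketchLru {
    fn touch(&mut self, handle: u64) {
        // Remove any earlier occurrence so each handle appears once.
        // This linear scan is O(n); it keeps the sketch short.
        if let Some(pos) = self.order.iter().position(|&h| h == handle) {
            self.order.remove(pos);
        }
        self.order.push_back(handle);
    }

    fn evict(&mut self) -> Option<u64> {
        self.order.pop_front()
    }
}
```

After touching handles 1, 2, 3 and then 1 again, the eviction order is 2, 3, 1. A production policy would typically pair a hash map with a linked list to make `touch` constant time.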