Crate ferrum_kv

§Ferrum KV Cache

MVP KV cache management implementation for the Ferrum inference stack.

This crate provides block-based KV cache management and implements the interfaces defined in ferrum_interfaces::kv_cache.
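
To illustrate what block-based management means in general, here is a minimal sketch of a fixed-size block pool. It is not this crate's API: `BlockPool` and all of its methods are hypothetical names chosen for the example, and the real allocator may differ in layout and policy.

```rust
/// Hypothetical fixed-size block pool (illustration only, not the ferrum_kv API).
#[derive(Debug)]
pub struct BlockPool {
    /// Number of tokens each block can hold.
    block_size: usize,
    /// Physical block ids that are currently unused.
    free: Vec<usize>,
}

impl BlockPool {
    pub fn new(num_blocks: usize, block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        // Store ids in descending order so the lowest ids sit at the tail.
        Self { block_size, free: (0..num_blocks).rev().collect() }
    }

    /// Blocks needed to hold `num_tokens` tokens (ceiling division).
    pub fn blocks_needed(&self, num_tokens: usize) -> usize {
        (num_tokens + self.block_size - 1) / self.block_size
    }

    /// Reserve enough blocks for `num_tokens`, or return `None` if the pool
    /// does not have enough free blocks. Nothing is reserved on failure.
    pub fn allocate(&mut self, num_tokens: usize) -> Option<Vec<usize>> {
        let n = self.blocks_needed(num_tokens);
        if n > self.free.len() {
            return None;
        }
        // Take the last `n` entries of the free list.
        let mut blocks = self.free.split_off(self.free.len() - n);
        blocks.reverse();
        Some(blocks)
    }

    /// Return blocks to the pool so later requests can reuse them.
    pub fn release(&mut self, blocks: Vec<usize>) {
        self.free.extend(blocks);
    }

    pub fn free_blocks(&self) -> usize {
        self.free.len()
    }
}
```

With 4 blocks of 16 tokens each, a 20-token request reserves 2 blocks, and a following 40-token request is refused until those blocks are released.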

Re-exports§

pub use blocks::*;
pub use cache::*;
pub use managers::*;

Modules§

attention
CPU reference implementation of paged attention.
blocks
cache
managers
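
The idea behind a CPU reference for paged attention can be sketched for a single query vector and a single head. This is not the code in the `attention` module: the function name, the argument list, and the assumed cache layout (`[num_blocks][block_size][head_dim]`, flattened row-major) are all assumptions made for the example.

```rust
/// Hypothetical single-head paged attention for one query vector.
/// `k_cache` and `v_cache` are assumed to be laid out as
/// [num_blocks][block_size][head_dim], flattened row-major.
pub fn paged_attention_single_head(
    q: &[f32],
    k_cache: &[f32],
    v_cache: &[f32],
    block_table: &[usize],
    block_size: usize,
    seq_len: usize,
) -> Vec<f32> {
    let d = q.len();
    assert!(d > 0 && block_size > 0);
    assert!(seq_len <= block_table.len() * block_size);
    let scale = 1.0 / (d as f32).sqrt();

    // Start of the cache row for token `pos`, resolved through the block table.
    let row = |pos: usize| (block_table[pos / block_size] * block_size + pos % block_size) * d;

    // Scaled dot product of the query with every cached key.
    let scores: Vec<f32> = (0..seq_len)
        .map(|pos| {
            let k = &k_cache[row(pos)..row(pos) + d];
            q.iter().zip(k).map(|(a, b)| a * b).sum::<f32>() * scale
        })
        .collect();

    // Numerically stable softmax: subtract the maximum before exponentiating.
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let weights: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let denom: f32 = weights.iter().sum();

    // Weighted sum of the cached values.
    let mut out = vec![0.0; d];
    for (pos, w) in weights.iter().enumerate() {
        let v = &v_cache[row(pos)..row(pos) + d];
        for (o, x) in out.iter_mut().zip(v) {
            *o += (w / denom) * x;
        }
    }
    out
}
```

The only difference from contiguous attention is the `row` lookup: token positions are logical, and the block table decides where each token's keys and values physically live.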

Structs§

AllocationRequest
KV cache allocation request
BlockTable
Block table for mapping logical to physical cache blocks
CacheConfig
Cache configuration
CacheHandleStats
Statistics for an individual cache handle
CacheManagerStats
Cache manager statistics
CacheStats
Cache statistics
KvManagerConfig
Internal KV cache manager configuration
LruEvictionPolicy
Least Recently Used eviction policy
PrefixCacheConfig
Prefix caching configuration
RequestId
Request identifier
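
The logical-to-physical mapping that a block table provides can be sketched as follows. `SketchBlockTable` is a hypothetical type written for this example; it does not reflect the fields or methods of the crate's `BlockTable`.

```rust
/// Hypothetical block table (illustration only, not ferrum_kv::BlockTable).
pub struct SketchBlockTable {
    /// Number of tokens per block.
    block_size: usize,
    /// Index = logical block number, value = physical block id.
    physical: Vec<usize>,
}

impl SketchBlockTable {
    pub fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { block_size, physical: Vec::new() }
    }

    /// Append a physical block as the next logical block of the sequence.
    pub fn push_block(&mut self, physical_id: usize) {
        self.physical.push(physical_id);
    }

    /// Translate a token position into (physical block id, offset within block).
    /// Returns `None` if the position lies beyond the mapped blocks.
    pub fn locate(&self, token_pos: usize) -> Option<(usize, usize)> {
        let logical = token_pos / self.block_size;
        let offset = token_pos % self.block_size;
        self.physical.get(logical).map(|&p| (p, offset))
    }

    /// Total number of token slots covered by the mapped blocks.
    pub fn capacity_tokens(&self) -> usize {
        self.physical.len() * self.block_size
    }
}
```

Because a sequence only sees logical positions, its physical blocks do not need to be contiguous or ordered.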

Enums§

DataType
Data type for tensors
Device
Device type for computation
FerrumError
Main error type for Ferrum operations

Traits§

CacheEvictionPolicy
Cache eviction strategies
KvCacheHandleInterface
KV cache handle providing access to cached key-value states
KvCacheManagerInterface
KV cache manager for allocation and lifecycle management
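
Least-recently-used eviction, the strategy named by `LruEvictionPolicy`, can be sketched with a logical clock. The signature of the `CacheEvictionPolicy` trait is not shown on this page, so the example below does not implement it; `LruTracker` and its methods are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical LRU tracker (illustration only; does not implement
/// the crate's CacheEvictionPolicy trait).
#[derive(Default)]
pub struct LruTracker {
    /// Logical clock, incremented on every access.
    clock: u64,
    /// Entry key -> clock value at its most recent access.
    last_used: HashMap<u64, u64>,
}

impl LruTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record an access, inserting the key if it is not yet tracked.
    pub fn touch(&mut self, key: u64) {
        self.clock += 1;
        self.last_used.insert(key, self.clock);
    }

    /// Stop tracking a key; returns whether it was present.
    pub fn remove(&mut self, key: u64) -> bool {
        self.last_used.remove(&key).is_some()
    }

    /// Remove and return the key whose last access is oldest.
    /// This linear scan is O(n); it keeps the sketch short.
    pub fn evict(&mut self) -> Option<u64> {
        let victim = self
            .last_used
            .iter()
            .min_by_key(|entry| *entry.1)
            .map(|(k, _)| *k)?;
        self.last_used.remove(&victim);
        Some(victim)
    }
}
```

A production policy would more likely pair the map with an ordered structure to avoid the scan on every eviction.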

Functions§

default_manager
Default KV cache manager factory

Type Aliases§

Result
Result type used throughout Ferrum