cortex_rust 0.6.0

High-performance LLM inference with 4-bit quantization and Test-Time Training (TTT)
//! PagedAttention - Efficient KV Cache Management
//!
//! Based on vLLM's PagedAttention algorithm.
//! Reference: https://arxiv.org/abs/2309.06180

mod block_manager;
mod cache_engine;

pub use block_manager::BlockManager;
pub use cache_engine::{CacheConfig, CacheEngine, PagedKVCache};