//! # TurboQuant
//!
//! KV cache compression via PolarQuant + QJL — Rust port of
//! [turboquant_plus](https://github.com/TheTom/turboquant_plus).
//!
//! ## Overview
//!
//! TurboQuant is an experimental Rust implementation of KV cache compression
//! for LLM inference, based on the TurboQuant research paper (ICLR 2026). By
//! combining a Walsh-Hadamard rotation with PolarQuant, it reaches roughly
//! 3.8–6.4× compression, depending on the format.
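//!
//! The Walsh-Hadamard rotation step can be illustrated with a short sketch: a
//! normalized fast Walsh-Hadamard transform (WHT) applied after random ±1 sign
//! flips. This is a minimal sketch under stated assumptions, not this crate's
//! implementation. The helpers `fwht_normalized`, `random_signs`, `rotate`,
//! `unrotate`, and `l2_norm`, along with the xorshift-based sign generator, are
//! hypothetical, and the vector length is assumed to be a power of two.
//!
//! ```rust
//! // Hypothetical sketch, not this crate's API: a randomized Walsh-Hadamard
//! // rotation (random ±1 sign flips followed by a normalized fast WHT).
//!
//! // In-place fast WHT, scaled by 1/sqrt(n) so it is orthonormal
//! // (norm-preserving) and its own inverse. `x.len()` must be a power of two.
//! fn fwht_normalized(x: &mut [f64]) {
//!     let n = x.len();
//!     assert!(n.is_power_of_two(), "length must be a power of two");
//!     let mut h = 1;
//!     while h < n {
//!         // Butterflies between pairs (j, j + h) within blocks of size 2h.
//!         for start in (0..n).step_by(2 * h) {
//!             for j in start..start + h {
//!                 let (a, b) = (x[j], x[j + h]);
//!                 x[j] = a + b;
//!                 x[j + h] = a - b;
//!             }
//!         }
//!         h *= 2;
//!     }
//!     let scale = 1.0 / (n as f64).sqrt();
//!     for v in x.iter_mut() {
//!         *v *= scale;
//!     }
//! }
//!
//! // Assumed sign generator: deterministic ±1 values from a seed (xorshift64).
//! fn random_signs(n: usize, seed: u64) -> Vec<f64> {
//!     let mut state = seed.max(1); // xorshift state must be nonzero
//!     (0..n)
//!         .map(|_| {
//!             state ^= state << 13;
//!             state ^= state >> 7;
//!             state ^= state << 17;
//!             if (state & 1) == 0 { 1.0 } else { -1.0 }
//!         })
//!         .collect()
//! }
//!
//! // Rotate: y = H D x, where D = diag(signs) and H is the normalized WHT.
//! fn rotate(x: &mut [f64], signs: &[f64]) {
//!     for (v, s) in x.iter_mut().zip(signs) {
//!         *v *= *s;
//!     }
//!     fwht_normalized(x);
//! }
//!
//! // Inverse rotation: x = D H y, since both H and D are their own inverses.
//! fn unrotate(y: &mut [f64], signs: &[f64]) {
//!     fwht_normalized(y);
//!     for (v, s) in y.iter_mut().zip(signs) {
//!         *v *= *s;
//!     }
//! }
//!
//! fn l2_norm(v: &[f64]) -> f64 {
//!     v.iter().map(|a| a * a).sum::<f64>().sqrt()
//! }
//!
//! fn main() {
//!     let signs = random_signs(8, 42);
//!     let original: Vec<f64> = (0..8).map(|i| i as f64 / 8.0).collect();
//!     let mut x = original.clone();
//!     rotate(&mut x, &signs);
//!     // The rotation preserves the L2 norm (up to rounding).
//!     assert!((l2_norm(&x) - l2_norm(&original)).abs() < 1e-9);
//!     unrotate(&mut x, &signs);
//!     // Undoing the rotation recovers the input (up to rounding).
//!     assert!(x.iter().zip(&original).all(|(a, b)| (a - b).abs() < 1e-9));
//! }
//! ```
//!
//! Because the normalized WHT is orthogonal and its own inverse, the rotation
//! preserves vector norms and is undone by applying the transform again and
//! then reapplying the same signs. Spreading each vector's energy across all
//! coordinates this way is the usual motivation for rotating before
//! per-coordinate quantization.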
//!
//! ## Three compression formats
//!
//! - **turbo2**: 2-bit, ~6.4× compression
//! - **turbo3**: 3-bit, ~4.6–5.1× compression
//! - **turbo4**: 4-bit, ~3.8× compression
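//!
//! Compression ratios like these can be related to bit widths with simple
//! arithmetic. The sketch below assumes a hypothetical layout, not this crate's
//! documented format: each block of values stores `bits` bits per value plus
//! one 16-bit scale, measured against an fp16 baseline of 16 bits per value.
//! The function name `compression_ratio` and the block sizes are illustrative
//! assumptions.
//!
//! ```rust
//! // Hypothetical storage layout (an assumption for illustration only):
//! // every block of `block_size` values stores `bits` bits per value plus one
//! // 16-bit scale, compared against an fp16 baseline of 16 bits per value.
//! fn compression_ratio(bits: u32, block_size: u32) -> f64 {
//!     let baseline_bits = 16 * block_size;
//!     let compressed_bits = bits * block_size + 16; // codes + one fp16 scale
//!     baseline_bits as f64 / compressed_bits as f64
//! }
//!
//! fn main() {
//!     for (bits, block_size) in [(2u32, 32u32), (3, 32), (3, 128), (4, 128)] {
//!         let ratio = compression_ratio(bits, block_size);
//!         // Without the per-block scale the ratio would be exactly 16 / bits.
//!         let naive = 16.0 / bits as f64;
//!         println!("{bits}-bit, block {block_size}: {ratio:.2}x (naive {naive:.2}x)");
//!     }
//! }
//! ```
//!
//! Under these assumptions, the per-block scale is why the ratios fall below
//! the naive 16 / bits: 2-bit codes in blocks of 32 give exactly 6.4×, and
//! 3-bit codes give about 4.57× with blocks of 32 and 5.12× with blocks of 128.
//! This is arithmetic on an assumed layout; the crate's actual storage may differ.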
//!
//! ## Quick start
//!
//! ```rust
//! use turboquant_plus_rs::{TurboQuant, TurboQuantMSE, KVCacheCompressor};
//! use ndarray::Array1;
//!
//! // Single-vector quantization
//! let tq = TurboQuant::new(128, 3, 42, true);
//! let x = Array1::from_shape_fn(128, |i| (i as f64) / 128.0);
//! let compressed = tq.quantize(&x);
//! let x_hat = tq.dequantize(&compressed);
//!
//! // KV cache compression
//! let compressor = KVCacheCompressor::new(128, 3, 3, 42, true);
//! let stats = compressor.memory_stats(1024, 32, 32);
//! println!("Compression ratio: {:.1}×", stats.compression_ratio);
//! ```
// Re-export primary types at crate root
pub use ;
pub use ;
pub use PolarQuant;
pub use QJL;
pub use ;
pub use ;