pub struct KVCacheCompressor<B: Backend> {
    pub method: CompressionMethod,
    pub quant_bits: u8,
    /* private fields */
}
KV cache compressor for low-rank and quantized representations.
Fields
method: CompressionMethod
Compression method selection.
quant_bits: u8
Default quantization bits (INT4/INT8).
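The page does not say how `quant_bits` is applied internally. As a rough illustration only, here is a minimal sketch of symmetric per-tensor quantization at 4 or 8 bits on a plain `f32` slice. The `quantize` and `dequantize` helpers are hypothetical and not part of this crate, and the crate's real scheme (per-channel scales, zero points, bit packing) may differ.

```rust
/// Hypothetical helper, not part of `KVCacheCompressor`.
/// Symmetric per-tensor quantization; each value is stored in one `i8`
/// (INT4 values are not packed two-per-byte in this sketch).
fn quantize(values: &[f32], bits: u8) -> Result<(Vec<i8>, f32), &'static str> {
    if bits != 4 && bits != 8 {
        return Err("unsupported bit width");
    }
    // Largest representable magnitude: 7 for INT4, 127 for INT8.
    let qmax = ((1i32 << (bits - 1)) - 1) as f32;
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    // Avoid dividing by zero when the input is all zeros.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / qmax };
    let q = values
        .iter()
        .map(|v| (v / scale).round().clamp(-qmax, qmax) as i8)
        .collect();
    Ok((q, scale))
}

/// Hypothetical inverse: map the integer codes back to approximate floats.
fn dequantize(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&x| x as f32 * scale).collect()
}

fn main() {
    let values = [0.5f32, -1.0, 0.25, 0.0];
    for bits in [8u8, 4] {
        let (q, scale) = quantize(&values, bits).expect("supported bit width");
        let restored = dequantize(&q, scale);
        println!("INT{bits}: q = {q:?}, scale = {scale}, restored = {restored:?}");
    }
}
```

The sketch mirrors the `Result<_, &'static str>` error style used by the methods on this type. Lower bit widths shrink the stored cache further at the cost of larger reconstruction error.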
Implementations
impl<B: Backend> KVCacheCompressor<B>
pub fn new(method: CompressionMethod, quant_bits: u8) -> Self
Create a new KV cache compressor.
pub fn method(&self) -> &CompressionMethod
Access the compression method.
pub fn quant_bits(&self) -> u8
Return the default quantization bit width.
pub fn compress_kv(
    &self,
    k: Tensor<B, 4>,
    v: Tensor<B, 4>,
) -> Result<CompressedKV<B>, &'static str>
pub fn compress_kv_3d(
    &self,
    k: Tensor<B, 3>,
    v: Tensor<B, 3>,
) -> Result<CompressedKV<B>, &'static str>
pub fn decompress_kv(
    &self,
    compressed: CompressedKV<B>,
) -> Result<(Tensor<B, 4>, Tensor<B, 4>), &'static str>
pub fn decompress_kv_3d(
    &self,
    compressed: CompressedKV<B>,
) -> Result<(Tensor<B, 3>, Tensor<B, 3>), &'static str>
pub fn compress_paged_cache(
    &self,
    cache: &PagedKVCache<B>,
    layer: usize,
    seq_id: usize,
) -> Result<CompressedKV<B>, &'static str>
Compress a sequence from a paged KV cache.
pub fn decompress_to_paged_cache(
    &self,
    compressed: CompressedKV<B>,
    cache: &mut PagedKVCache<B>,
    layer: usize,
    seq_id: usize,
) -> Result<(), &'static str>
Decompress into a paged KV cache by appending tokens.
pub fn compress_mla_cache(
    &self,
    cache: &MlaCompressedKVCache<B>,
    layer: usize,
    seq_id: usize,
) -> Result<CompressedKV<B>, &'static str>
Compress a sequence from an MLA compressed cache.
pub fn decompress_to_mla_cache(
    &self,
    compressed: CompressedKV<B>,
    cache: &mut MlaCompressedKVCache<B>,
    layer: usize,
    seq_id: usize,
) -> Result<(), &'static str>
Decompress into an MLA compressed cache by appending tokens.
Trait Implementations
impl<B: Clone + Backend> Clone for KVCacheCompressor<B>
fn clone(&self) -> KVCacheCompressor<B>
Returns a duplicate of the value.
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations
impl<B> Freeze for KVCacheCompressor<B>
impl<B> RefUnwindSafe for KVCacheCompressor<B>
where
    B: RefUnwindSafe,
impl<B> Send for KVCacheCompressor<B>
impl<B> Sync for KVCacheCompressor<B>
impl<B> Unpin for KVCacheCompressor<B>
where
    B: Unpin,
impl<B> UnwindSafe for KVCacheCompressor<B>
where
    B: UnwindSafe,
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.