#[non_exhaustive]
pub enum KvCacheLevel {
    Fp16,
    Q8,
    Fp8,
    Q4,
}
KV cache precision tier.
Variants with a lower ordinal have higher precision and a larger memory footprint; variants with a higher ordinal have lower precision and a smaller memory footprint.
Compactness ordering (ordinal): Fp16 (0) < Q8 (1) < Fp8 (2) < Q4 (3).
Variants (Non-exhaustive)
This enum is marked as non-exhaustive
Fp16
FP16: full quality, baseline memory.
Q8
INT8 quantized: half the memory of FP16.
Fp8
FP8 quantized: half the memory of FP16. Same byte width as INT8, but the floating-point format preserves more dynamic range for attention activations.
Q4
INT4 quantized: quarter the memory of FP16.
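Because the enum is marked non-exhaustive, code in other crates that matches on it needs a wildcard arm to cover variants added later. The following is a minimal sketch of that pattern. It assumes a local stand-in for the enum so that it compiles on its own, and `bits_per_value` is a hypothetical helper, not part of the documented API.

```rust
// Local stand-in mirroring the documented enum; in real code you would
// import KvCacheLevel from the crate that defines it.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
#[non_exhaustive]
pub enum KvCacheLevel {
    Fp16,
    Q8,
    Fp8,
    Q4,
}

/// Hypothetical helper: bits stored per cached value, or `None` for a
/// variant this code does not know about.
// Inside the defining crate the wildcard arm is redundant, so the lint is
// silenced here; from a downstream crate the arm is mandatory.
#[allow(unreachable_patterns)]
pub fn bits_per_value(level: KvCacheLevel) -> Option<u32> {
    match level {
        KvCacheLevel::Fp16 => Some(16),
        KvCacheLevel::Q8 | KvCacheLevel::Fp8 => Some(8),
        KvCacheLevel::Q4 => Some(4),
        // Covers any variant added in a future release.
        _ => None,
    }
}

fn main() {
    for level in [KvCacheLevel::Fp16, KvCacheLevel::Q8, KvCacheLevel::Fp8, KvCacheLevel::Q4] {
        println!("{:?}: {:?} bits per value", level, bits_per_value(level));
    }
}
```

Returning `Option` rather than panicking in the wildcard arm lets callers decide how to handle a tier they were not written for.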
Implementations
impl KvCacheLevel
pub const fn memory_factor(self) -> f32
Memory factor relative to FP16.
| Level | Factor |
|---|---|
| Fp16 | 1.0 |
| Q8 | 0.5 |
| Fp8 | 0.5 |
| Q4 | 0.25 |
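As an illustration of how the factor might be used, the sketch below scales an FP16 cache size by the level's factor. It assumes a local stand-in for the enum whose `memory_factor` body simply mirrors the table above (the crate's actual implementation is not shown in this page), and `estimated_bytes` is a hypothetical helper.

```rust
// Local stand-in mirroring the documented enum and its factor table.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum KvCacheLevel {
    Fp16,
    Q8,
    Fp8,
    Q4,
}

impl KvCacheLevel {
    /// Memory factor relative to FP16 (values taken from the table above).
    pub const fn memory_factor(self) -> f32 {
        match self {
            KvCacheLevel::Fp16 => 1.0,
            KvCacheLevel::Q8 | KvCacheLevel::Fp8 => 0.5,
            KvCacheLevel::Q4 => 0.25,
        }
    }
}

/// Hypothetical helper: scales an FP16 KV cache size in bytes by the
/// level's memory factor.
pub fn estimated_bytes(fp16_bytes: u64, level: KvCacheLevel) -> u64 {
    (fp16_bytes as f64 * f64::from(level.memory_factor())) as u64
}

fn main() {
    // Assumed baseline for illustration: an 8 GiB cache at FP16.
    let fp16_bytes: u64 = 8 * 1024 * 1024 * 1024;
    for level in [KvCacheLevel::Fp16, KvCacheLevel::Q8, KvCacheLevel::Fp8, KvCacheLevel::Q4] {
        println!("{:?}: {} bytes", level, estimated_bytes(fp16_bytes, level));
    }
}
```

The estimate covers only the stored values; any per-block scales or zero points a quantization scheme keeps alongside them would add to it.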
pub const fn ordinal(self) -> u8
Compactness order: higher = more compact (more aggressive).
Ordering: Fp16=0 < Q8=1 < Fp8=2 < Q4=3.
Fp8 and Q8 both use 1 byte per value, so their memory footprint is the same.
Fp8 is placed after Q8 because FP8's floating-point format makes it preferable
to INT8 for KV cache activations, and before Q4 because INT4 is more compact.
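The trait list on this page does not include `Ord` or `PartialOrd`, so `ordinal()` is the way to compare or sort levels. The sketch below shows one way to do that. It assumes a local stand-in for the enum whose `ordinal` body mirrors the documented ordering, and `more_compact` is a hypothetical helper.

```rust
// Local stand-in mirroring the documented enum and its ordinal values.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum KvCacheLevel {
    Fp16,
    Q8,
    Fp8,
    Q4,
}

impl KvCacheLevel {
    /// Compactness order: Fp16=0 < Q8=1 < Fp8=2 < Q4=3.
    pub const fn ordinal(self) -> u8 {
        match self {
            KvCacheLevel::Fp16 => 0,
            KvCacheLevel::Q8 => 1,
            KvCacheLevel::Fp8 => 2,
            KvCacheLevel::Q4 => 3,
        }
    }
}

/// Hypothetical helper: returns whichever level has the higher ordinal.
pub fn more_compact(a: KvCacheLevel, b: KvCacheLevel) -> KvCacheLevel {
    if b.ordinal() > a.ordinal() { b } else { a }
}

fn main() {
    let mut levels = vec![
        KvCacheLevel::Q4,
        KvCacheLevel::Fp16,
        KvCacheLevel::Fp8,
        KvCacheLevel::Q8,
    ];
    // Sort from highest precision to most compact.
    levels.sort_by_key(|level| level.ordinal());
    println!("{:?}", levels); // prints [Fp16, Q8, Fp8, Q4]
    println!("{:?}", more_compact(KvCacheLevel::Q8, KvCacheLevel::Fp8)); // prints Fp8
}
```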
Trait Implementations
impl Clone for KvCacheLevel
fn clone(&self) -> KvCacheLevel
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for KvCacheLevel
impl Hash for KvCacheLevel
impl PartialEq for KvCacheLevel
fn eq(&self, other: &KvCacheLevel) -> bool
Tests for self and other values to be equal, and is used by ==.
impl Copy for KvCacheLevel
impl Eq for KvCacheLevel
impl StructuralPartialEq for KvCacheLevel
Auto Trait Implementations
impl Freeze for KvCacheLevel
impl RefUnwindSafe for KvCacheLevel
impl Send for KvCacheLevel
impl Sync for KvCacheLevel
impl Unpin for KvCacheLevel
impl UnsafeUnpin for KvCacheLevel
impl UnwindSafe for KvCacheLevel
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.