#[non_exhaustive]
pub enum KvCacheType {
    F32,
    F16,
    BF16,
    F64,
    Q4_0,
    Q4_1,
    Q5_0,
    Q5_1,
    Q8_0,
    Q8_1,
    Q2_K,
    Q3_K,
    Q4_K,
    Q5_K,
    Q6_K,
    Q8_K,
    IQ2_XXS,
    IQ2_XS,
    IQ2_S,
    IQ3_XXS,
    IQ3_S,
    IQ1_S,
    IQ1_M,
    IQ4_XS,
    IQ4_NL,
    I8,
    I16,
    I32,
    I64,
    TQ1_0,
    TQ2_0,
    MXFP4,
}
Data type used for an entry in the attention KV cache.
Mirrors the subset of ggml_type values that llama.cpp accepts as KV
cache element types. The F16 default preserves full attention quality;
quantizing (e.g. Q8_0 ≈ ½ size, Q4_0 ≈ ¼ size) trades a small amount
of accuracy for a large VRAM reduction at long n_ctx.
This is a local shim around llama_cpp_2::context::params::KvCacheType
so a future llama-cpp-2 update doesn’t force a breaking release of
rig-llama-cpp. Marked #[non_exhaustive]: when llama.cpp adds a new
ggml_type, we add a corresponding variant in a minor (0.1.x) release.
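The size trade-off described above can be sketched numerically. A minimal sketch, assuming a model without grouped-query attention and using ggml's nominal block layouts (Q8_0 stores 32 quants plus one f16 scale per block, ≈ 8.5 bits/element; Q4_0 ≈ 4.5 bits/element); the three-variant enum below is a local stand-in for illustration, not this crate's type:

```rust
/// Local stand-in for a few cache element types (illustration only).
#[derive(Clone, Copy)]
enum CacheElem {
    F16,
    Q8_0,
    Q4_0,
}

/// Nominal storage cost per element, including per-block scale overhead.
fn bits_per_element(t: CacheElem) -> f64 {
    match t {
        CacheElem::F16 => 16.0,
        CacheElem::Q8_0 => 8.5, // 32 x 8-bit quants + one f16 scale per block
        CacheElem::Q4_0 => 4.5, // 32 x 4-bit quants + one f16 scale per block
    }
}

/// Rough KV cache size in bytes: K and V each hold
/// n_ctx * n_layer * n_embd elements (ignores GQA, which shrinks this).
fn kv_cache_bytes(t: CacheElem, n_ctx: u64, n_layer: u64, n_embd: u64) -> u64 {
    let elems = 2 * n_ctx * n_layer * n_embd; // 2 = K tensor + V tensor
    (elems as f64 * bits_per_element(t) / 8.0) as u64
}
```

For a hypothetical 32-layer, 4096-dim model at n_ctx = 4096, this gives ~2 GiB at F16 versus ~576 MiB at Q4_0, which is where the "≈ ¼ size" figure comes from (the small excess over ¼ is the per-block scale overhead).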
Variants (Non-exhaustive)
F32
IEEE 754 single precision.
F16
IEEE 754 half precision (llama.cpp’s default for both K and V).
BF16
Brain floating-point 16, common on newer NVIDIA / AMD GPUs.
F64
IEEE 754 double precision.
Q4_0
4-bit block quantization, type 0.
Q4_1
4-bit block quantization, type 1.
Q5_0
5-bit block quantization, type 0.
Q5_1
5-bit block quantization, type 1.
Q8_0
8-bit block quantization, type 0.
Q8_1
8-bit block quantization, type 1.
Q2_K
2-bit K-quant.
Q3_K
3-bit K-quant.
Q4_K
4-bit K-quant.
Q5_K
5-bit K-quant.
Q6_K
6-bit K-quant.
Q8_K
8-bit K-quant.
IQ2_XXS
Importance-weighted 2-bit, extra-extra-small.
IQ2_XS
Importance-weighted 2-bit, extra-small.
IQ2_S
Importance-weighted 2-bit, small.
IQ3_XXS
Importance-weighted 3-bit, extra-extra-small.
IQ3_S
Importance-weighted 3-bit, small.
IQ1_S
Importance-weighted 1-bit, small.
IQ1_M
Importance-weighted 1-bit, medium.
IQ4_XS
Importance-weighted 4-bit, extra-small.
IQ4_NL
Importance-weighted 4-bit, non-linear.
I8
Signed 8-bit integer.
I16
Signed 16-bit integer.
I32
Signed 32-bit integer.
I64
Signed 64-bit integer.
TQ1_0
Ternary 1-bit, type 0.
TQ2_0
Ternary 2-bit, type 0.
MXFP4
Microscaling FP4.
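Because the enum is #[non_exhaustive], downstream crates must include a wildcard arm when matching on it; that wildcard is what lets new ggml_type variants land in a 0.1.x release without breaking builds. A minimal sketch with a local three-variant stand-in (the wildcard requirement only applies across crate boundaries, so here the otherwise-redundant arm carries #[allow(unreachable_patterns)]):

```rust
/// Local stand-in mirroring the shape of the real enum (illustration only).
#[non_exhaustive]
#[derive(Clone, Copy)]
enum KvCacheType {
    F16,
    Q8_0,
    Q4_0,
}

fn describe(t: KvCacheType) -> &'static str {
    match t {
        KvCacheType::F16 => "full-quality default",
        KvCacheType::Q8_0 => "~1/2 size",
        KvCacheType::Q4_0 => "~1/4 size",
        // Downstream code needs this arm; new variants fall through here
        // instead of turning a minor release into a compile error.
        #[allow(unreachable_patterns)]
        _ => "other quantization",
    }
}
```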
Trait Implementations

impl Clone for KvCacheType
    fn clone(&self) -> KvCacheType
    fn clone_from(&mut self, source: &Self)
        Performs copy-assignment from source.
impl Debug for KvCacheType
impl From<KvCacheType> for KvCacheType
    fn from(value: KvCacheType) -> Self
impl Hash for KvCacheType
impl PartialEq for KvCacheType
    fn eq(&self, other: &KvCacheType) -> bool
        Tests for self and other values to be equal, and is used by ==.
impl Copy for KvCacheType
impl Eq for KvCacheType
impl StructuralPartialEq for KvCacheType
Auto Trait Implementations
impl Freeze for KvCacheType
impl RefUnwindSafe for KvCacheType
impl Send for KvCacheType
impl Sync for KvCacheType
impl Unpin for KvCacheType
impl UnsafeUnpin for KvCacheType
impl UnwindSafe for KvCacheType
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> CloneToUninit for Twhere
T: Clone,
impl<T> CloneToUninit for Twhere
T: Clone,
Source§impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
Source§impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
Source§fn equivalent(&self, key: &K) -> bool
fn equivalent(&self, key: &K) -> bool
key and return true if they are equal.