
KvCacheType

Enum KvCacheType 

#[non_exhaustive]
pub enum KvCacheType {
    F32, F16, BF16, F64,
    Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1,
    Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, Q8_K,
    IQ2_XXS, IQ2_XS, IQ2_S, IQ3_XXS, IQ3_S, IQ1_S, IQ1_M, IQ4_XS, IQ4_NL,
    I8, I16, I32, I64,
    TQ1_0, TQ2_0, MXFP4,
}

Data type used for an entry in the attention KV cache.

Mirrors the subset of ggml_type values that llama.cpp accepts as KV cache element types. The default, F16, preserves full attention quality; quantizing the cache (e.g. Q8_0 at ≈ ½ the size of F16, Q4_0 at ≈ ¼) trades a small amount of accuracy for a large VRAM reduction at long n_ctx.
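The ½ and ¼ figures follow from ggml's storage layouts: F16 uses 2 bytes per element, Q8_0 packs 32 elements into 34 bytes (an fp16 scale plus 32 int8 values), and Q4_0 packs 32 elements into 18 bytes (an fp16 scale plus 32 4-bit values). The sketch below estimates total cache size as 2 (K and V) × layers × n_ctx × KV heads × head dimension × bytes per element. The `CacheElem` enum, the `ModelShape` struct and the example shape (32 layers, 8 KV heads, head dimension 128) are hypothetical stand-ins, not part of rig-llama-cpp, and the estimate ignores padding and backend overhead.

```rust
/// Hypothetical stand-in for three KvCacheType variants (not the crate's enum).
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug)]
pub enum CacheElem {
    F16,
    Q8_0,
    Q4_0,
}

impl CacheElem {
    /// (elements per block, bytes per block), following ggml's storage layouts.
    pub fn block_layout(self) -> (u64, u64) {
        match self {
            CacheElem::F16 => (1, 2),    // one 16-bit float per element
            CacheElem::Q8_0 => (32, 34), // fp16 scale + 32 x int8
            CacheElem::Q4_0 => (32, 18), // fp16 scale + 32 x 4-bit
        }
    }
}

/// Hypothetical model shape; real values come from the model's GGUF metadata.
pub struct ModelShape {
    pub n_layer: u64,
    pub n_kv_head: u64,
    pub head_dim: u64,
}

/// Estimated KV cache size in bytes: one K and one V tensor per layer,
/// each holding n_ctx * n_kv_head * head_dim elements.
pub fn kv_cache_bytes(shape: &ModelShape, n_ctx: u64, elem: CacheElem) -> u64 {
    let elements = 2 * shape.n_layer * n_ctx * shape.n_kv_head * shape.head_dim;
    let (block_elems, block_bytes) = elem.block_layout();
    // Assumes a whole number of blocks (true when head_dim is a multiple of 32).
    elements / block_elems * block_bytes
}
```

With n_ctx = 32768 and that shape, this gives 4 GiB for F16, about 2.1 GiB for Q8_0 and about 1.1 GiB for Q4_0.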

This is a local shim around llama_cpp_2::context::params::KvCacheType, so a future llama-cpp-2 update doesn't force a breaking release of rig-llama-cpp. The enum is marked #[non_exhaustive]: when llama.cpp adds a new ggml_type, we add a corresponding variant in a semver-compatible (0.1.x) release.
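A minimal sketch of the shim pattern described above. The `upstream` module is a hypothetical stand-in for llama_cpp_2::context::params (its variant set is invented for illustration), only three variants are shown, and the conversion direction (local type into upstream type, as when building context parameters) is an assumption.

```rust
/// Hypothetical stand-in for the upstream crate's enum.
pub mod upstream {
    #[allow(non_camel_case_types)]
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum KvCacheType {
        F16,
        Q8_0,
        Q4_0,
    }
}

/// Local mirror that the public API exposes instead of the upstream type.
#[allow(non_camel_case_types)]
#[non_exhaustive]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum KvCacheType {
    F16,
    Q8_0,
    Q4_0,
}

/// The only code that names the upstream type. If llama-cpp-2 renames a
/// variant, only this impl needs updating; the public enum stays the same.
impl From<KvCacheType> for upstream::KvCacheType {
    fn from(value: KvCacheType) -> Self {
        match value {
            KvCacheType::F16 => upstream::KvCacheType::F16,
            KvCacheType::Q8_0 => upstream::KvCacheType::Q8_0,
            KvCacheType::Q4_0 => upstream::KvCacheType::Q4_0,
        }
    }
}
```

Inside the defining crate the match can stay exhaustive; #[non_exhaustive] only forces a wildcard arm on downstream crates.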

Variants (Non-exhaustive)

This enum is marked as non-exhaustive.
Non-exhaustive enums may gain variants in future releases. Code outside the defining crate that matches on them must therefore include a wildcard arm to handle variants added later.
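For example, downstream code that only cares about the plain floating-point variants can match those explicitly and route everything else, including variants added later, to a wildcard arm. The enum below is a local subset standing in for the real one so the sketch compiles on its own, and `float_bits` is a hypothetical helper, not part of the crate.

```rust
/// Local subset standing in for the crate's KvCacheType.
#[allow(non_camel_case_types, dead_code)]
#[non_exhaustive]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum KvCacheType {
    F32,
    F16,
    BF16,
    F64,
    Q8_0,
    Q4_0,
}

/// Storage width in bits of the unblocked floating-point types; None otherwise.
pub fn float_bits(t: KvCacheType) -> Option<u32> {
    match t {
        KvCacheType::F64 => Some(64),
        KvCacheType::F32 => Some(32),
        KvCacheType::F16 | KvCacheType::BF16 => Some(16),
        // Mandatory outside the defining crate because of #[non_exhaustive];
        // here it also covers every quantized type.
        _ => None,
    }
}
```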

F32

IEEE 754 single precision.

F16

IEEE 754 half precision (llama.cpp’s default for both K and V).

BF16

Brain floating point (bfloat16): F32's 8-bit exponent with a 7-bit mantissa; natively supported on newer NVIDIA and AMD GPUs.

F64

IEEE 754 double precision.

Q4_0

4-bit block quantization, type 0.

Q4_1

4-bit block quantization, type 1.

Q5_0

5-bit block quantization, type 0.

Q5_1

5-bit block quantization, type 1.

Q8_0

8-bit block quantization, type 0.

Q8_1

8-bit block quantization, type 1.

Q2_K

2-bit K-quant.

Q3_K

3-bit K-quant.

Q4_K

4-bit K-quant.

Q5_K

5-bit K-quant.

Q6_K

6-bit K-quant.

Q8_K

8-bit K-quant.

IQ2_XXS

Importance-weighted 2-bit, extra-extra-small.

IQ2_XS

Importance-weighted 2-bit, extra-small.

IQ2_S

Importance-weighted 2-bit, small.

IQ3_XXS

Importance-weighted 3-bit, extra-extra-small.

IQ3_S

Importance-weighted 3-bit, small.

IQ1_S

Importance-weighted 1-bit, small.

IQ1_M

Importance-weighted 1-bit, medium.

IQ4_XS

Importance-weighted 4-bit, extra-small.

IQ4_NL

Importance-weighted 4-bit, non-linear.

I8

Signed 8-bit integer.

I16

Signed 16-bit integer.

I32

Signed 32-bit integer.

I64

Signed 64-bit integer.

TQ1_0

Ternary 1-bit, type 0.

TQ2_0

Ternary 2-bit, type 0.

MXFP4

Microscaling FP4.
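The "type 0" and "type 1" labels on the 32-element block formats refer to the per-block header: type 0 stores one fp16 scale, while type 1 stores a second fp16 field as well (a minimum, for Q4_1 and Q5_1). The sketch below derives bytes per block and bits per element from that header plus the packed quantized values. The layout figures follow ggml's block_q* struct definitions as we understand them, and the `BlockFormat` table is illustrative rather than part of this crate.

```rust
/// Illustrative description of a 32-element ggml block format.
pub struct BlockFormat {
    pub name: &'static str,
    pub header_bytes: u32, // fp16 scale (type 0) or scale + second fp16 (type 1)
    pub quant_bits: u32,   // bits stored per element
}

pub const BLOCK_ELEMS: u32 = 32;

pub const LEGACY_FORMATS: [BlockFormat; 6] = [
    BlockFormat { name: "Q4_0", header_bytes: 2, quant_bits: 4 },
    BlockFormat { name: "Q4_1", header_bytes: 4, quant_bits: 4 },
    BlockFormat { name: "Q5_0", header_bytes: 2, quant_bits: 5 },
    BlockFormat { name: "Q5_1", header_bytes: 4, quant_bits: 5 },
    BlockFormat { name: "Q8_0", header_bytes: 2, quant_bits: 8 },
    BlockFormat { name: "Q8_1", header_bytes: 4, quant_bits: 8 },
];

impl BlockFormat {
    /// Header plus the packed quantized values for one block.
    pub fn bytes_per_block(&self) -> u32 {
        self.header_bytes + BLOCK_ELEMS * self.quant_bits / 8
    }

    /// Effective storage cost per element, header included.
    pub fn bits_per_element(&self) -> f64 {
        f64::from(self.bytes_per_block() * 8) / f64::from(BLOCK_ELEMS)
    }
}
```

This yields 18, 20, 22, 24, 34 and 36 bytes per block, i.e. 4.5 to 9.0 bits per element, compared with 16 for F16.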

Trait Implementations

impl Clone for KvCacheType

fn clone(&self) -> KvCacheType

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for KvCacheType

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl From<KvCacheType> for KvCacheType

fn from(value: KvCacheType) -> Self

Converts to this type from the input type. This is the conversion between this enum and llama_cpp_2::context::params::KvCacheType; rustdoc prints both with their short name.

impl Hash for KvCacheType

fn hash<__H: Hasher>(&self, state: &mut __H)

Feeds this value into the given Hasher.

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher.

impl PartialEq for KvCacheType

fn eq(&self, other: &KvCacheType) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Copy for KvCacheType

impl Eq for KvCacheType

impl StructuralPartialEq for KvCacheType
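Because the enum implements Copy, Eq and Hash, values can be passed around by value and used as HashMap or HashSet keys. The sketch below uses a local subset of the enum and hypothetical (K, V) cache-type pairs to collect the distinct types a set of configurations would need; `distinct_types` is an illustrative helper, not part of the crate.

```rust
use std::collections::HashSet;

/// Local subset standing in for the crate's KvCacheType.
#[allow(non_camel_case_types, dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum KvCacheType {
    F16,
    Q8_0,
    Q4_0,
}

/// Distinct cache types across hypothetical (type_k, type_v) pairs.
pub fn distinct_types(configs: &[(KvCacheType, KvCacheType)]) -> HashSet<KvCacheType> {
    // Copy lets the closure pull both fields out of each borrowed tuple by value.
    configs.iter().flat_map(|&(k, v)| [k, v]).collect()
}
```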

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> DynClone for T
where T: Clone,

fn __clone_box(&self, _: Private) -> *mut ()

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

Checks if this value is equivalent to the given key.

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

Compare self to key and return true if they are equal.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> WasmCompatSend for T
where T: Send,

impl<T> WasmCompatSync for T
where T: Sync,