#[non_exhaustive]
pub enum KvCacheType {
    F32,
    F16,
    BF16,
    F64,
    Q4_0,
    Q4_1,
    Q5_0,
    Q5_1,
    Q8_0,
    Q8_1,
    Q2_K,
    Q3_K,
    Q4_K,
    Q5_K,
    Q6_K,
    Q8_K,
    IQ2_XXS,
    IQ2_XS,
    IQ2_S,
    IQ3_XXS,
    IQ3_S,
    IQ1_S,
    IQ1_M,
    IQ4_XS,
    IQ4_NL,
    I8,
    I16,
    I32,
    I64,
    TQ1_0,
    TQ2_0,
    MXFP4,
}
Data type used for an entry in the attention KV cache.
Mirrors the subset of ggml_type values that llama.cpp accepts as KV
cache element types. The F16 default preserves full attention quality;
quantizing (e.g. Q8_0 ≈ ½ size, Q4_0 ≈ ¼ size) trades a small amount
of accuracy for a large VRAM reduction at long n_ctx.
This is a local shim around llama_cpp_2::context::params::KvCacheType
so a future llama-cpp-2 update doesn’t force a breaking release of
rig-llama-cpp. Marked #[non_exhaustive]: when llama.cpp adds a new
ggml_type, we add a corresponding variant in a minor (0.1.x) release.
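The size trade-off described above can be sketched numerically. A minimal sketch, assuming a model without grouped-query attention and using ggml's nominal block layouts (Q8_0 stores 32 quants plus one f16 scale per block, ≈ 8.5 bits/element; Q4_0 ≈ 4.5 bits/element); the three-variant enum below is a local stand-in for illustration, not this crate's type:

```rust
/// Local stand-in for a few cache element types (illustration only).
#[derive(Clone, Copy)]
enum CacheElem {
    F16,
    Q8_0,
    Q4_0,
}

/// Nominal storage cost per element, including per-block scale overhead.
fn bits_per_element(t: CacheElem) -> f64 {
    match t {
        CacheElem::F16 => 16.0,
        CacheElem::Q8_0 => 8.5, // 32 x 8-bit quants + one f16 scale per block
        CacheElem::Q4_0 => 4.5, // 32 x 4-bit quants + one f16 scale per block
    }
}

/// Rough KV cache size in bytes: K and V each hold
/// n_ctx * n_layer * n_embd elements (ignores GQA, which shrinks this).
fn kv_cache_bytes(t: CacheElem, n_ctx: u64, n_layer: u64, n_embd: u64) -> u64 {
    let elems = 2 * n_ctx * n_layer * n_embd; // 2 = K tensor + V tensor
    (elems as f64 * bits_per_element(t) / 8.0) as u64
}
```

For a hypothetical 32-layer, 4096-dim model at n_ctx = 4096, this gives ~2 GiB at F16 versus ~576 MiB at Q4_0, which is where the "≈ ¼ size" figure comes from (the small excess over ¼ is the per-block scale overhead).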
Variants (Non-exhaustive)
F32
IEEE 754 single precision.
F16
IEEE 754 half precision (llama.cpp’s default for both K and V).
BF16
Brain floating-point 16, common on newer NVIDIA / AMD GPUs.
F64
IEEE 754 double precision.
Q4_0
4-bit block quantization, type 0.
Q4_1
4-bit block quantization, type 1.
Q5_0
5-bit block quantization, type 0.
Q5_1
5-bit block quantization, type 1.
Q8_0
8-bit block quantization, type 0.
Q8_1
8-bit block quantization, type 1.
Q2_K
2-bit K-quant.
Q3_K
3-bit K-quant.
Q4_K
4-bit K-quant.
Q5_K
5-bit K-quant.
Q6_K
6-bit K-quant.
Q8_K
8-bit K-quant.
IQ2_XXS
Importance-weighted 2-bit, extra-extra-small.
IQ2_XS
Importance-weighted 2-bit, extra-small.
IQ2_S
Importance-weighted 2-bit, small.
IQ3_XXS
Importance-weighted 3-bit, extra-extra-small.
IQ3_S
Importance-weighted 3-bit, small.
IQ1_S
Importance-weighted 1-bit, small.
IQ1_M
Importance-weighted 1-bit, medium.
IQ4_XS
Importance-weighted 4-bit, extra-small.
IQ4_NL
Importance-weighted 4-bit, non-linear.
I8
Signed 8-bit integer.
I16
Signed 16-bit integer.
I32
Signed 32-bit integer.
I64
Signed 64-bit integer.
TQ1_0
Ternary 1-bit, type 0.
TQ2_0
Ternary 2-bit, type 0.
MXFP4
Microscaling FP4.
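Because the enum is #[non_exhaustive], downstream crates must include a wildcard arm when matching on it; that wildcard is what lets new ggml_type variants land in a 0.1.x release without breaking builds. A minimal sketch with a local three-variant stand-in (the wildcard requirement only applies across crate boundaries, so here the otherwise-redundant arm carries #[allow(unreachable_patterns)]):

```rust
/// Local stand-in mirroring the shape of the real enum (illustration only).
#[non_exhaustive]
#[derive(Clone, Copy)]
enum KvCacheType {
    F16,
    Q8_0,
    Q4_0,
}

fn describe(t: KvCacheType) -> &'static str {
    match t {
        KvCacheType::F16 => "full-quality default",
        KvCacheType::Q8_0 => "~1/2 size",
        KvCacheType::Q4_0 => "~1/4 size",
        // Downstream code needs this arm; new variants fall through here
        // instead of turning a minor release into a compile error.
        #[allow(unreachable_patterns)]
        _ => "other quantization",
    }
}
```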
Trait Implementations

impl Clone for KvCacheType
    fn clone(&self) -> KvCacheType
    fn clone_from(&mut self, source: &Self)
        Performs copy-assignment from source.
impl Debug for KvCacheType
impl From<KvCacheType> for KvCacheType
    fn from(value: KvCacheType) -> Self
impl Hash for KvCacheType
impl PartialEq for KvCacheType
    fn eq(&self, other: &KvCacheType) -> bool
        Tests for self and other values to be equal, and is used by ==.
impl Copy for KvCacheType
impl Eq for KvCacheType
impl StructuralPartialEq for KvCacheType
Auto Trait Implementations
impl Freeze for KvCacheType
impl RefUnwindSafe for KvCacheType
impl Send for KvCacheType
impl Sync for KvCacheType
impl Unpin for KvCacheType
impl UnsafeUnpin for KvCacheType
impl UnwindSafe for KvCacheType
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> CloneToUninit for Twhere
T: Clone,
impl<T> CloneToUninit for Twhere
T: Clone,
Source§impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
Source§impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
Source§fn equivalent(&self, key: &K) -> bool
fn equivalent(&self, key: &K) -> bool
key and return true if they are equal.