Enum LlamaFtype 

#[non_exhaustive]
pub enum LlamaFtype {
    AllF32 = 0,
    MostlyF16 = 1,
    MostlyQ4_0 = 2,
    MostlyQ4_1 = 3,
    MostlyQ8_0 = 7,
    MostlyQ5_0 = 8,
    MostlyQ5_1 = 9,
    MostlyQ2K = 10,
    MostlyQ3KS = 11,
    MostlyQ3KM = 12,
    MostlyQ3KL = 13,
    MostlyQ4KS = 14,
    MostlyQ4KM = 15,
    MostlyQ5KS = 16,
    MostlyQ5KM = 17,
    MostlyQ6K = 18,
    MostlyIQ2XXS = 19,
    MostlyIQ2XS = 20,
    MostlyQ2KS = 21,
    MostlyIQ3XS = 22,
    MostlyIQ3XXS = 23,
    MostlyIQ1S = 24,
    MostlyIQ4NL = 25,
    MostlyIQ3S = 26,
    MostlyIQ3M = 27,
    MostlyIQ2S = 28,
    MostlyIQ2M = 29,
    MostlyIQ4XS = 30,
    MostlyIQ1M = 31,
    MostlyBF16 = 32,
    MostlyTQ1_0 = 36,
    MostlyTQ2_0 = 37,
    MostlyMXFP4Moe = 38,
    MostlyNVFP4 = 39,
}

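The quantization type used for the bulk of a model file (maps to llama_ftype). The "Mostly" prefix reflects that some tensors, such as 1-D tensors, are kept at a different precision than the named type.

Pass one of these variants to QuantizeParams::new to choose the target precision. In the variant descriptions below, sizes are approximate file sizes for a model of the stated parameter count ("@ 8B", "@ 7B"), "ppl" is the change in perplexity relative to the unquantized model, and "bpw" is bits per weight.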

Variants (Non-exhaustive)

This enum is marked as non-exhaustive.
Non-exhaustive enums could have additional variants added in future. Therefore, when matching against variants of non-exhaustive enums, an extra wildcard arm must be added to account for any future variants.
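A minimal sketch of the wildcard rule, using a hypothetical local stand-in for LlamaFtype that mirrors only five of its variants (it is not the crate's definition). The match handles the variants it cares about and sends everything else, including variants added in later releases, to the wildcard (_) arm. Because the stand-in is defined in the same crate, the compiler does not actually enforce the wildcard here; against llama_cpp_4::quantize::LlamaFtype it is mandatory.

```rust
// Hypothetical stand-in mirroring a few LlamaFtype variants; the real enum
// lives in llama_cpp_4::quantize and has many more.
#[allow(dead_code)]
#[non_exhaustive]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LlamaFtype {
    AllF32 = 0,
    MostlyF16 = 1,
    MostlyQ8_0 = 7,
    MostlyQ4KM = 15,
    MostlyIQ1S = 24,
}

/// Rough size class for a quantization type.
fn size_class(ftype: LlamaFtype) -> &'static str {
    match ftype {
        LlamaFtype::AllF32 | LlamaFtype::MostlyF16 => "full precision",
        LlamaFtype::MostlyQ8_0 => "8-bit",
        LlamaFtype::MostlyQ4KM => "4-bit",
        // Required when the non-exhaustive enum comes from another crate;
        // it also catches any variant added in a future release.
        _ => "other",
    }
}

fn main() {
    println!("{}", size_class(LlamaFtype::MostlyQ4KM));
}
```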
AllF32 = 0
All tensors stored as full F32 (very large; for reference only).

MostlyF16 = 1
F16 – 14 GB @ 7B, +0.0020 ppl (measured on Mistral-7B)

MostlyQ4_0 = 2
Q4_0 – 4.34 GB @ 8B, +0.4685 ppl

MostlyQ4_1 = 3
Q4_1 – 4.78 GB @ 8B, +0.4511 ppl

MostlyQ8_0 = 7
Q8_0 – 7.96 GB @ 8B, +0.0026 ppl

MostlyQ5_0 = 8
Q5_0 – 5.21 GB @ 8B, +0.1316 ppl

MostlyQ5_1 = 9
Q5_1 – 5.65 GB @ 8B, +0.1062 ppl

MostlyQ2K = 10
Q2_K – 2.96 GB @ 8B, +3.5199 ppl

MostlyQ3KS = 11
Q3_K small – 3.41 GB @ 8B, +1.6321 ppl

MostlyQ3KM = 12
Q3_K medium – 3.74 GB @ 8B, +0.6569 ppl

MostlyQ3KL = 13
Q3_K large – 4.03 GB @ 8B, +0.5562 ppl

MostlyQ4KS = 14
Q4_K small – 4.37 GB @ 8B, +0.2689 ppl

MostlyQ4KM = 15
Q4_K medium – 4.58 GB @ 8B, +0.1754 ppl (recommended default)

MostlyQ5KS = 16
Q5_K small – 5.21 GB @ 8B, +0.1049 ppl

MostlyQ5KM = 17
Q5_K medium – 5.33 GB @ 8B, +0.0569 ppl

MostlyQ6K = 18
Q6_K – 6.14 GB @ 8B, +0.0217 ppl

MostlyIQ2XXS = 19
IQ2_XXS – 2.06 bpw

MostlyIQ2XS = 20
IQ2_XS – 2.31 bpw

MostlyQ2KS = 21
Q2_K small

MostlyIQ3XS = 22
IQ3_XS – 3.3 bpw

MostlyIQ3XXS = 23
IQ3_XXS – 3.06 bpw

MostlyIQ1S = 24
IQ1_S – 1.56 bpw (extremely small, high loss)

MostlyIQ4NL = 25
IQ4_NL – 4.50 bpw, non-linear

MostlyIQ3S = 26
IQ3_S – 3.44 bpw

MostlyIQ3M = 27
IQ3_M – 3.66 bpw

MostlyIQ2S = 28
IQ2_S – 2.5 bpw

MostlyIQ2M = 29
IQ2_M – 2.7 bpw

MostlyIQ4XS = 30
IQ4_XS – 4.25 bpw, non-linear

MostlyIQ1M = 31
IQ1_M – 1.75 bpw

MostlyBF16 = 32
BF16 – 14 GB @ 7B, −0.0050 ppl (measured on Mistral-7B)

MostlyTQ1_0 = 36
TQ1_0 – 1.69 bpw, ternary

MostlyTQ2_0 = 37
TQ2_0 – 2.06 bpw, ternary

MostlyMXFP4Moe = 38
MXFP4 (MoE layers)

MostlyNVFP4 = 39
NVFP4
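The descriptions mix two units: approximate file sizes at a given parameter count, and bits per weight (bpw). A rough conversion between them is bytes ≈ parameters × bpw / 8. The helper below is a hypothetical illustration, not part of the crate; it ignores file metadata and any tensors kept at a different precision, so real files will come out somewhat larger or smaller.

```rust
/// Approximate tensor-data size in bytes for `params` weights stored at
/// `bpw` bits per weight. Hypothetical helper: ignores GGUF metadata and
/// tensors kept at a precision other than the bulk type.
fn approx_size_bytes(params: u64, bpw: f64) -> u64 {
    (params as f64 * bpw / 8.0).round() as u64
}

fn main() {
    let params = 8_000_000_000u64; // assumed nominal "8B" parameter count
    for (name, bpw) in [("IQ1_S", 1.56), ("IQ2_XXS", 2.06), ("IQ4_XS", 4.25)] {
        let gb = approx_size_bytes(params, bpw) as f64 / 1e9; // decimal GB
        println!("{name}: ~{gb:.2} GB");
    }
}
```

For a nominal 8B model the two units line up numerically: 8 × 10^9 × bpw / 8 bytes is bpw × 10^9 bytes, so IQ2_XXS at 2.06 bpw lands near 2.06 GB (decimal).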

Implementations

impl LlamaFtype

pub fn name(self) -> &'static str

Short name suitable for filenames (e.g. "Q4_K_M").
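Because name() is meant for filenames, a typical use is deriving an output path from the input model. The sketch uses a hypothetical stand-in enum and a hypothetical output_path helper; only the "Q4_K_M" spelling is confirmed by these docs, and the "Q8_0" spelling is an assumption.

```rust
use std::path::{Path, PathBuf};

// Hypothetical stand-in for llama_cpp_4::quantize::LlamaFtype.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LlamaFtype {
    MostlyQ8_0 = 7,
    MostlyQ4KM = 15,
}

impl LlamaFtype {
    fn name(self) -> &'static str {
        match self {
            LlamaFtype::MostlyQ8_0 => "Q8_0", // assumed spelling
            LlamaFtype::MostlyQ4KM => "Q4_K_M",
        }
    }
}

/// Build "<dir>/<stem>-<NAME>.gguf" next to the input model.
fn output_path(input: &Path, ftype: LlamaFtype) -> PathBuf {
    let stem = input.file_stem().and_then(|s| s.to_str()).unwrap_or("model");
    input.with_file_name(format!("{stem}-{}.gguf", ftype.name()))
}

fn main() {
    let out = output_path(Path::new("models/llama-8b-f16.gguf"), LlamaFtype::MostlyQ4KM);
    println!("{}", out.display());
}
```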

pub fn description(self) -> &'static str

Human-readable description with approximate size and PPL delta.

pub fn from_name(name: &str) -> Option<Self>

Look up a variant by its short name (case-insensitive).

use llama_cpp_4::quantize::LlamaFtype;
assert_eq!(LlamaFtype::from_name("Q4_K_M"), Some(LlamaFtype::MostlyQ4KM));
assert_eq!(LlamaFtype::from_name("q4_k_m"), Some(LlamaFtype::MostlyQ4KM));
assert_eq!(LlamaFtype::from_name("bogus"), None);
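from_name returns an Option, so a command-line front end usually converts a miss into an error that lists the accepted names. This sketch assumes a three-variant stand-in whose from_name mirrors the documented case-insensitive lookup; parse_ftype and the error wording are hypothetical.

```rust
// Hypothetical stand-in; names assumed to follow the "Q4_K_M" style above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LlamaFtype {
    MostlyQ8_0,
    MostlyQ5KM,
    MostlyQ4KM,
}

impl LlamaFtype {
    fn all() -> &'static [Self] {
        &[Self::MostlyQ8_0, Self::MostlyQ5KM, Self::MostlyQ4KM]
    }

    fn name(self) -> &'static str {
        match self {
            Self::MostlyQ8_0 => "Q8_0",
            Self::MostlyQ5KM => "Q5_K_M",
            Self::MostlyQ4KM => "Q4_K_M",
        }
    }

    /// Case-insensitive lookup by short name.
    fn from_name(name: &str) -> Option<Self> {
        Self::all().iter().copied().find(|t| t.name().eq_ignore_ascii_case(name))
    }
}

/// Turn the Option from from_name into a user-facing error message.
fn parse_ftype(arg: &str) -> Result<LlamaFtype, String> {
    LlamaFtype::from_name(arg).ok_or_else(|| {
        let valid: Vec<&str> = LlamaFtype::all().iter().map(|t| t.name()).collect();
        format!("unknown quantization type {arg:?}; expected one of: {}", valid.join(", "))
    })
}

fn main() {
    match parse_ftype("q5_k_m") {
        Ok(t) => println!("using {}", t.name()),
        Err(e) => eprintln!("{e}"),
    }
}
```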

pub fn all() -> &'static [Self]

All available types, ordered roughly from largest to smallest.
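all() pairs naturally with name() and description() for a help listing. In this sketch the stand-in's description strings are copied from the variant docs on this page and may not match what the crate's description() actually returns; help_text is a hypothetical helper.

```rust
// Hypothetical stand-in; description strings copied from the variant docs.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LlamaFtype {
    MostlyQ8_0,
    MostlyQ6K,
    MostlyQ4KM,
}

impl LlamaFtype {
    /// Ordered roughly from largest to smallest, as documented for all().
    fn all() -> &'static [Self] {
        &[Self::MostlyQ8_0, Self::MostlyQ6K, Self::MostlyQ4KM]
    }

    fn name(self) -> &'static str {
        match self {
            Self::MostlyQ8_0 => "Q8_0",
            Self::MostlyQ6K => "Q6_K",
            Self::MostlyQ4KM => "Q4_K_M",
        }
    }

    fn description(self) -> &'static str {
        match self {
            Self::MostlyQ8_0 => "7.96 GB @ 8B, +0.0026 ppl",
            Self::MostlyQ6K => "6.14 GB @ 8B, +0.0217 ppl",
            Self::MostlyQ4KM => "4.58 GB @ 8B, +0.1754 ppl (recommended default)",
        }
    }
}

/// One aligned "NAME  description" line per type, e.g. for --help output.
fn help_text() -> String {
    let width = LlamaFtype::all().iter().map(|t| t.name().len()).max().unwrap_or(0);
    LlamaFtype::all()
        .iter()
        .map(|t| format!("  {:<width$}  {}", t.name(), t.description()))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    println!("Available quantization types:\n{}", help_text());
}
```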

Trait Implementations

impl Clone for LlamaFtype

fn clone(&self) -> LlamaFtype

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Copy for LlamaFtype

impl Debug for LlamaFtype

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for LlamaFtype

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Eq for LlamaFtype

impl From<LlamaFtype> for llama_ftype

fn from(t: LlamaFtype) -> Self

Converts to this type from the input type.
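The From impl lets a LlamaFtype be passed wherever the raw llama_ftype binding is expected, typically via .into(). The sketch assumes llama_ftype is a plain u32 alias, as bindgen commonly generates for C enums, and that the explicit discriminants listed above are the raw values; neither detail is confirmed here, and the stand-in enum covers only four variants.

```rust
// Assumption: the raw binding is a plain unsigned-integer alias.
// The real llama_ftype may be defined differently.
#[allow(non_camel_case_types)]
type llama_ftype = u32;

// Hypothetical stand-in with four variants; discriminants copied from above.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LlamaFtype {
    AllF32 = 0,
    MostlyQ8_0 = 7,
    MostlyQ4KM = 15,
    MostlyBF16 = 32,
}

impl From<LlamaFtype> for llama_ftype {
    fn from(t: LlamaFtype) -> Self {
        // The discriminants are the raw values, so a plain cast is enough.
        t as llama_ftype
    }
}

fn main() {
    // .into() picks up the From impl above.
    let raw: llama_ftype = LlamaFtype::MostlyQ4KM.into();
    println!("MostlyQ4KM -> {raw}");
}
```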

impl Hash for LlamaFtype

fn hash<__H: Hasher>(&self, state: &mut __H)

Feeds this value into the given Hasher.

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher.

impl PartialEq for LlamaFtype

fn eq(&self, other: &LlamaFtype) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl StructuralPartialEq for LlamaFtype
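Since LlamaFtype is Copy, Eq and Hash, it can key a HashMap or HashSet directly. The sketch deduplicates a list of requested types while keeping their first-seen order, then builds a status table keyed by the enum; the stand-in type and the dedup_keep_order helper are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in deriving the traits listed on this page.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum LlamaFtype {
    MostlyQ8_0,
    MostlyQ5KM,
    MostlyQ4KM,
}

/// Drop repeated requests while keeping the first-seen order.
fn dedup_keep_order(requested: &[LlamaFtype]) -> Vec<LlamaFtype> {
    let mut seen = HashSet::new();
    // HashSet::insert returns false for a value that is already present.
    requested.iter().copied().filter(|t| seen.insert(*t)).collect()
}

fn main() {
    let requested = [LlamaFtype::MostlyQ4KM, LlamaFtype::MostlyQ8_0, LlamaFtype::MostlyQ4KM];
    let jobs = dedup_keep_order(&requested);
    // Per-type status table keyed by the enum itself (values are placeholders).
    let mut status: HashMap<LlamaFtype, &str> = HashMap::new();
    for t in &jobs {
        status.insert(*t, "pending");
    }
    println!("{} unique jobs, {} status entries", jobs.len(), status.len());
}
```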

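Auto Trait Implementations

impl Freeze for LlamaFtype
impl RefUnwindSafe for LlamaFtype
impl Send for LlamaFtype
impl Sync for LlamaFtype
impl Unpin for LlamaFtype
impl UnwindSafe for LlamaFtype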

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

This is a nightly-only experimental API (clone_to_uninit).
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.