Enum LlamaFtype 

#[non_exhaustive]
pub enum LlamaFtype {
    AllF32 = 0,
    MostlyF16 = 1,
    MostlyQ4_0 = 2,
    MostlyQ4_1 = 3,
    MostlyQ8_0 = 7,
    MostlyQ5_0 = 8,
    MostlyQ5_1 = 9,
    MostlyQ2K = 10,
    MostlyQ3KS = 11,
    MostlyQ3KM = 12,
    MostlyQ3KL = 13,
    MostlyQ4KS = 14,
    MostlyQ4KM = 15,
    MostlyQ5KS = 16,
    MostlyQ5KM = 17,
    MostlyQ6K = 18,
    MostlyIQ2XXS = 19,
    MostlyIQ2XS = 20,
    MostlyQ2KS = 21,
    MostlyIQ3XS = 22,
    MostlyIQ3XXS = 23,
    MostlyIQ1S = 24,
    MostlyIQ4NL = 25,
    MostlyIQ3S = 26,
    MostlyIQ3M = 27,
    MostlyIQ2S = 28,
    MostlyIQ2M = 29,
    MostlyIQ4XS = 30,
    MostlyIQ1M = 31,
    MostlyBF16 = 32,
    MostlyTQ1_0 = 36,
    MostlyTQ2_0 = 37,
    MostlyMXFP4Moe = 38,
    MostlyNVFP4 = 39,
}

The quantization type used for the bulk of a model file (maps to llama_ftype).

Pass one of these variants to QuantizeParams::new to choose the target precision.

Variants (Non-exhaustive)

This enum is marked as non-exhaustive
Non-exhaustive enums could have additional variants added in future. Therefore, when matching against variants of non-exhaustive enums, an extra wildcard arm must be added to account for any future variants.
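As a minimal sketch of the wildcard requirement, the snippet below matches on a small stand-in enum. `Ftype` and `is_k_quant` are hypothetical names used only for illustration and are not part of this crate. With the real `LlamaFtype`, which lives in another crate, the compiler rejects a `match` that lacks the `_` arm.

```rust
// Stand-in for LlamaFtype so the sketch compiles on its own (hypothetical).
// Note: #[non_exhaustive] is only enforced for code in *other* crates.
#[non_exhaustive]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ftype {
    MostlyF16,
    MostlyQ4KM,
    MostlyQ6K,
    MostlyIQ2XS,
}

/// Returns true for the K-quant variants this sketch knows about.
pub fn is_k_quant(t: Ftype) -> bool {
    match t {
        Ftype::MostlyQ4KM | Ftype::MostlyQ6K => true,
        // Wildcard arm: also covers variants a later release may add.
        _ => false,
    }
}
```

Treating unknown variants conservatively in the wildcard arm (here, as "not a K-quant") keeps downstream code compiling when new quantization types appear.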

AllF32 = 0

All tensors stored as full F32 (very large, for reference only)

MostlyF16 = 1

F16 – 14 GB @ 7B, +0.0020 ppl (measured on Mistral-7B)

MostlyQ4_0 = 2

Q4_0 – 4.34 GB @ 8B, +0.4685 ppl

MostlyQ4_1 = 3

Q4_1 – 4.78 GB @ 8B, +0.4511 ppl

MostlyQ8_0 = 7

Q8_0 – 7.96 GB @ 8B, +0.0026 ppl

MostlyQ5_0 = 8

Q5_0 – 5.21 GB @ 8B, +0.1316 ppl

MostlyQ5_1 = 9

Q5_1 – 5.65 GB @ 8B, +0.1062 ppl

MostlyQ2K = 10

Q2_K – 2.96 GB @ 8B, +3.5199 ppl

MostlyQ3KS = 11

Q3_K small – 3.41 GB @ 8B, +1.6321 ppl

MostlyQ3KM = 12

Q3_K medium – 3.74 GB @ 8B, +0.6569 ppl

MostlyQ3KL = 13

Q3_K large – 4.03 GB @ 8B, +0.5562 ppl

MostlyQ4KS = 14

Q4_K small – 4.37 GB @ 8B, +0.2689 ppl

MostlyQ4KM = 15

Q4_K medium – 4.58 GB @ 8B, +0.1754 ppl (recommended default)

MostlyQ5KS = 16

Q5_K small – 5.21 GB @ 8B, +0.1049 ppl

MostlyQ5KM = 17

Q5_K medium – 5.33 GB @ 8B, +0.0569 ppl

MostlyQ6K = 18

Q6_K – 6.14 GB @ 8B, +0.0217 ppl

MostlyIQ2XXS = 19

IQ2_XXS – 2.06 bpw

MostlyIQ2XS = 20

IQ2_XS – 2.31 bpw

MostlyQ2KS = 21

Q2_K small

MostlyIQ3XS = 22

IQ3_XS – 3.3 bpw

MostlyIQ3XXS = 23

IQ3_XXS – 3.06 bpw

MostlyIQ1S = 24

IQ1_S – 1.56 bpw (extremely small, high loss)

MostlyIQ4NL = 25

IQ4_NL – 4.50 bpw non-linear

MostlyIQ3S = 26

IQ3_S – 3.44 bpw

MostlyIQ3M = 27

IQ3_M – 3.66 bpw

MostlyIQ2S = 28

IQ2_S – 2.5 bpw

MostlyIQ2M = 29

IQ2_M – 2.7 bpw

MostlyIQ4XS = 30

IQ4_XS – 4.25 bpw non-linear

MostlyIQ1M = 31

IQ1_M – 1.75 bpw

MostlyBF16 = 32

BF16 – 14 GB @ 7B, −0.0050 ppl (measured on Mistral-7B)

MostlyTQ1_0 = 36

TQ1_0 – 1.69 bpw ternary

MostlyTQ2_0 = 37

TQ2_0 – 2.06 bpw ternary

MostlyMXFP4Moe = 38

MXFP4 (MoE layers)

MostlyNVFP4 = 39

NVFP4
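
For the variants documented only in bits per weight (bpw), a rough file size follows from parameter count × bpw ÷ 8. The helper below is a back-of-the-envelope sketch and not part of this crate: `approx_size_gb` is a hypothetical name, a gigabyte is taken as 10^9 bytes, and the estimate ignores file metadata and any tensors kept at a different precision.

```rust
/// Rough size in GB (10^9 bytes) of `n_params` weights stored at `bpw` bits each.
/// Hypothetical helper for illustration only.
pub fn approx_size_gb(n_params: f64, bpw: f64) -> f64 {
    // bits -> bytes -> gigabytes
    n_params * bpw / 8.0 / 1e9
}
```

For example, `approx_size_gb(8.0e9, 4.25)` evaluates to 4.25, so an 8B-parameter model at the IQ4_XS rate would come to roughly 4.25 GB under these assumptions.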

Implementations

impl LlamaFtype

pub fn name(self) -> &'static str

Short name suitable for filenames (e.g. "Q4_K_M").
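
One way the short name might be used is to derive an output filename. The sketch below assumes a `<stem>-<NAME>.gguf` naming scheme, which is a common convention rather than something this crate prescribes. `quantized_filename` is a hypothetical helper, and its `&str` argument stands in for the value returned by `name()`.

```rust
use std::path::{Path, PathBuf};

/// Builds `<stem>-<ftype_name>.gguf` next to the input file.
/// `ftype_name` stands in for `LlamaFtype::name()` (e.g. "Q4_K_M").
pub fn quantized_filename(input: &Path, ftype_name: &str) -> PathBuf {
    // Fall back to "model" if the input has no usable file stem.
    let stem = input
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("model");
    // Replace only the final path component, keeping the directory.
    input.with_file_name(format!("{stem}-{ftype_name}.gguf"))
}
```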

pub fn description(self) -> &'static str

Human-readable description with approximate size and PPL delta.

pub fn from_name(name: &str) -> Option<Self>

Look up a variant by its short name (case-insensitive).

use llama_cpp_4::quantize::LlamaFtype;
assert_eq!(LlamaFtype::from_name("Q4_K_M"), Some(LlamaFtype::MostlyQ4KM));
assert_eq!(LlamaFtype::from_name("q4_k_m"), Some(LlamaFtype::MostlyQ4KM));
assert_eq!(LlamaFtype::from_name("bogus"), None);

pub fn all() -> &'static [Self]

All available types, ordered roughly from largest to smallest.
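
The lookup and listing methods combine naturally when parsing user input. The sketch below is self-contained: `Ftype`, its three variants, and `parse_ftype` are stand-ins for illustration, and the `from_name` shown here is only one plausible way to get case-insensitive matching, not necessarily how this crate implements it.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ftype {
    MostlyF16,
    MostlyQ8_0,
    MostlyQ4KM,
}

impl Ftype {
    pub fn name(self) -> &'static str {
        match self {
            Ftype::MostlyF16 => "F16",
            Ftype::MostlyQ8_0 => "Q8_0",
            Ftype::MostlyQ4KM => "Q4_K_M",
        }
    }

    /// Listed from largest to smallest, mirroring the ordering described above.
    pub fn all() -> &'static [Self] {
        &[Ftype::MostlyF16, Ftype::MostlyQ8_0, Ftype::MostlyQ4KM]
    }

    /// Case-insensitive lookup by scanning `all()`.
    pub fn from_name(name: &str) -> Option<Self> {
        Self::all()
            .iter()
            .copied()
            .find(|t| t.name().eq_ignore_ascii_case(name))
    }
}

/// Parses a user-supplied name, listing the valid choices on failure.
pub fn parse_ftype(arg: &str) -> Result<Ftype, String> {
    Ftype::from_name(arg).ok_or_else(|| {
        let valid: Vec<&str> = Ftype::all().iter().map(|t| t.name()).collect();
        format!("unknown type {arg:?}; expected one of: {}", valid.join(", "))
    })
}
```

Building the error message from `all()` keeps the list of valid names in sync with the enum without maintaining a second table.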

Trait Implementations

impl Clone for LlamaFtype

fn clone(&self) -> LlamaFtype

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for LlamaFtype

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for LlamaFtype

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl From<LlamaFtype> for llama_ftype

fn from(t: LlamaFtype) -> Self

Converts to this type from the input type.
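
The conversion to the raw `llama_ftype` value presumably carries the discriminant shown in the variant list. Below is a minimal sketch under that assumption, with `RawFtype` (a plain `u32`) and the two-variant `Ftype` standing in for the real binding type and enum; neither name exists in this crate.

```rust
/// Stand-in for the C-side `llama_ftype`; the real binding type may differ.
pub type RawFtype = u32;

#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ftype {
    MostlyF16 = 1,
    MostlyQ4KM = 15,
}

impl From<Ftype> for RawFtype {
    fn from(t: Ftype) -> Self {
        // A fieldless enum with explicit discriminants casts directly.
        t as RawFtype
    }
}
```

Because `Into` is provided automatically for any `From` implementation, call sites can write either `RawFtype::from(t)` or `t.into()`.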

impl Hash for LlamaFtype

fn hash<__H: Hasher>(&self, state: &mut __H)

Feeds this value into the given Hasher.

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher.

impl PartialEq for LlamaFtype

fn eq(&self, other: &LlamaFtype) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Copy for LlamaFtype

impl Eq for LlamaFtype

impl StructuralPartialEq for LlamaFtype
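
Because the enum is `Copy`, `Eq`, and `Hash`, its values can serve directly as keys in hashed collections. The sketch below uses a three-variant stand-in enum (`Ftype`, hypothetical) and the 8B sizes quoted in the variant descriptions to build a lookup table; `size_table` and `largest_within` are likewise illustrative names.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Ftype {
    MostlyQ4KM,
    MostlyQ5KM,
    MostlyQ6K,
}

/// Approximate file sizes in GB at 8B parameters, taken from the variant docs.
pub fn size_table() -> HashMap<Ftype, f64> {
    HashMap::from([
        (Ftype::MostlyQ4KM, 4.58),
        (Ftype::MostlyQ5KM, 5.33),
        (Ftype::MostlyQ6K, 6.14),
    ])
}

/// Largest listed type whose size fits within `budget_gb`, if any.
pub fn largest_within(budget_gb: f64) -> Option<Ftype> {
    size_table()
        .into_iter()
        .filter(|(_, gb)| *gb <= budget_gb)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(t, _)| t)
}
```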

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a String.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.