#[non_exhaustive]
pub enum LlamaFtype {
AllF32 = 0,
MostlyF16 = 1,
MostlyQ4_0 = 2,
MostlyQ4_1 = 3,
MostlyQ8_0 = 7,
MostlyQ5_0 = 8,
MostlyQ5_1 = 9,
MostlyQ2K = 10,
MostlyQ3KS = 11,
MostlyQ3KM = 12,
MostlyQ3KL = 13,
MostlyQ4KS = 14,
MostlyQ4KM = 15,
MostlyQ5KS = 16,
MostlyQ5KM = 17,
MostlyQ6K = 18,
MostlyIQ2XXS = 19,
MostlyIQ2XS = 20,
MostlyQ2KS = 21,
MostlyIQ3XS = 22,
MostlyIQ3XXS = 23,
MostlyIQ1S = 24,
MostlyIQ4NL = 25,
MostlyIQ3S = 26,
MostlyIQ3M = 27,
MostlyIQ2S = 28,
MostlyIQ2M = 29,
MostlyIQ4XS = 30,
MostlyIQ1M = 31,
MostlyBF16 = 32,
MostlyTQ1_0 = 36,
MostlyTQ2_0 = 37,
MostlyMXFP4Moe = 38,
MostlyNVFP4 = 39,
}
The quantization type used for the bulk of a model file (maps to llama_ftype).
Pass one of these variants to QuantizeParams::new to choose the target precision.
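The per-variant notes below quote approximate file sizes at a given parameter count (e.g. "4.58 GB @ 8B"). As a rough illustration of where those numbers come from, not anything in this crate, file size scales with bits per weight:

```rust
// Rough size estimate: params (billions) * bits-per-weight / 8 bits-per-byte
// gives gigabytes (decimal GB). Illustrative only -- real GGUF files add
// metadata and may keep some tensors (e.g. embeddings) at higher precision,
// so actual sizes differ slightly from this back-of-the-envelope figure.
fn approx_size_gb(params_billions: f64, bits_per_weight: f64) -> f64 {
    params_billions * bits_per_weight / 8.0
}

fn main() {
    // Q4_K medium is roughly 4.5 bpw, so an 8B model lands near the
    // 4.58 GB listed for MostlyQ4KM below.
    println!("{:.2} GB", approx_size_gb(8.0, 4.5));
    // F16 is 16 bpw: a 7B model is about the 14 GB listed for MostlyF16.
    println!("{:.2} GB", approx_size_gb(7.0, 16.0));
}
```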
Variants (Non-exhaustive)
This enum is marked as non-exhaustive.
AllF32 = 0
All tensors stored as full F32 (very large, for reference only)
MostlyF16 = 1
F16 – 14 GB @ 7B, +0.0020 ppl vs Mistral-7B
MostlyQ4_0 = 2
Q4_0 – 4.34 GB @ 8B, +0.4685 ppl
MostlyQ4_1 = 3
Q4_1 – 4.78 GB @ 8B, +0.4511 ppl
MostlyQ8_0 = 7
Q8_0 – 7.96 GB @ 8B, +0.0026 ppl
MostlyQ5_0 = 8
Q5_0 – 5.21 GB @ 8B, +0.1316 ppl
MostlyQ5_1 = 9
Q5_1 – 5.65 GB @ 8B, +0.1062 ppl
MostlyQ2K = 10
Q2_K – 2.96 GB @ 8B, +3.5199 ppl
MostlyQ3KS = 11
Q3_K small – 3.41 GB @ 8B, +1.6321 ppl
MostlyQ3KM = 12
Q3_K medium – 3.74 GB @ 8B, +0.6569 ppl
MostlyQ3KL = 13
Q3_K large – 4.03 GB @ 8B, +0.5562 ppl
MostlyQ4KS = 14
Q4_K small – 4.37 GB @ 8B, +0.2689 ppl
MostlyQ4KM = 15
Q4_K medium – 4.58 GB @ 8B, +0.1754 ppl (recommended default)
MostlyQ5KS = 16
Q5_K small – 5.21 GB @ 8B, +0.1049 ppl
MostlyQ5KM = 17
Q5_K medium – 5.33 GB @ 8B, +0.0569 ppl
MostlyQ6K = 18
Q6_K – 6.14 GB @ 8B, +0.0217 ppl
MostlyIQ2XXS = 19
IQ2_XXS – 2.06 bpw
MostlyIQ2XS = 20
IQ2_XS – 2.31 bpw
MostlyQ2KS = 21
Q2_K small
MostlyIQ3XS = 22
IQ3_XS – 3.3 bpw
MostlyIQ3XXS = 23
IQ3_XXS – 3.06 bpw
MostlyIQ1S = 24
IQ1_S – 1.56 bpw (extremely small, high loss)
MostlyIQ4NL = 25
IQ4_NL – 4.50 bpw non-linear
MostlyIQ3S = 26
IQ3_S – 3.44 bpw
MostlyIQ3M = 27
IQ3_M – 3.66 bpw
MostlyIQ2S = 28
IQ2_S – 2.5 bpw
MostlyIQ2M = 29
IQ2_M – 2.7 bpw
MostlyIQ4XS = 30
IQ4_XS – 4.25 bpw non-linear
MostlyIQ1M = 31
IQ1_M – 1.75 bpw
MostlyBF16 = 32
BF16 – 14 GB @ 7B, −0.0050 ppl vs Mistral-7B
MostlyTQ1_0 = 36
TQ1_0 – 1.69 bpw ternary
MostlyTQ2_0 = 37
TQ2_0 – 2.06 bpw ternary
MostlyMXFP4Moe = 38
MXFP4 (MoE layers)
MostlyNVFP4 = 39
NVFP4
Implementations
impl LlamaFtype

pub fn description(self) -> &'static str
Returns a human-readable description of the quantization type, including its approximate file size and perplexity (PPL) delta where known.
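Returning a `&'static str` works here because every variant can map to a string literal baked into the binary. The shape of such a method can be pictured with a self-contained sketch (a hypothetical mini-enum for illustration, not the crate's actual implementation):

```rust
// Hypothetical two-variant miniature of LlamaFtype, for illustration only.
#[derive(Clone, Copy)]
enum Ftype {
    MostlyQ4KM,
    MostlyQ8_0,
}

impl Ftype {
    // Each arm returns a string literal, so the 'static lifetime is free:
    // no allocation, no borrowing from self.
    fn description(self) -> &'static str {
        match self {
            Ftype::MostlyQ4KM => "Q4_K medium - 4.58 GB @ 8B, +0.1754 ppl",
            Ftype::MostlyQ8_0 => "Q8_0 - 7.96 GB @ 8B, +0.0026 ppl",
        }
    }
}

fn main() {
    println!("{}", Ftype::MostlyQ4KM.description());
}
```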
pub fn from_name(name: &str) -> Option<Self>
Look up a variant by its short name (case-insensitive).
use llama_cpp_4::quantize::LlamaFtype;
assert_eq!(LlamaFtype::from_name("Q4_K_M"), Some(LlamaFtype::MostlyQ4KM));
assert_eq!(LlamaFtype::from_name("q4_k_m"), Some(LlamaFtype::MostlyQ4KM));
assert_eq!(LlamaFtype::from_name("bogus"), None);
Trait Implementations
impl Clone for LlamaFtype

fn clone(&self) -> LlamaFtype
Returns a copy of the value.

fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.