#[non_exhaustive]
pub enum LlamaFtype {
AllF32 = 0,
MostlyF16 = 1,
MostlyQ4_0 = 2,
MostlyQ4_1 = 3,
MostlyQ8_0 = 7,
MostlyQ5_0 = 8,
MostlyQ5_1 = 9,
MostlyQ2K = 10,
MostlyQ3KS = 11,
MostlyQ3KM = 12,
MostlyQ3KL = 13,
MostlyQ4KS = 14,
MostlyQ4KM = 15,
MostlyQ5KS = 16,
MostlyQ5KM = 17,
MostlyQ6K = 18,
MostlyIQ2XXS = 19,
MostlyIQ2XS = 20,
MostlyQ2KS = 21,
MostlyIQ3XS = 22,
MostlyIQ3XXS = 23,
MostlyIQ1S = 24,
MostlyIQ4NL = 25,
MostlyIQ3S = 26,
MostlyIQ3M = 27,
MostlyIQ2S = 28,
MostlyIQ2M = 29,
MostlyIQ4XS = 30,
MostlyIQ1M = 31,
MostlyBF16 = 32,
MostlyTQ1_0 = 36,
MostlyTQ2_0 = 37,
MostlyMXFP4Moe = 38,
MostlyNVFP4 = 39,
}
The quantization type used for the bulk of a model file (maps to llama_ftype).
Pass one of these variants to QuantizeParams::new to choose the target precision.
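The per-variant notes below quote approximate file sizes at a given parameter count (e.g. "4.58 GB @ 8B"). As a rough illustration of where those numbers come from, not anything in this crate, file size scales with bits per weight:

```rust
// Rough size estimate: params (billions) * bits-per-weight / 8 bits-per-byte
// gives gigabytes (decimal GB). Illustrative only -- real GGUF files add
// metadata and may keep some tensors (e.g. embeddings) at higher precision,
// so actual sizes differ slightly from this back-of-the-envelope figure.
fn approx_size_gb(params_billions: f64, bits_per_weight: f64) -> f64 {
    params_billions * bits_per_weight / 8.0
}

fn main() {
    // Q4_K medium is roughly 4.5 bpw, so an 8B model lands near the
    // 4.58 GB listed for MostlyQ4KM below.
    println!("{:.2} GB", approx_size_gb(8.0, 4.5));
    // F16 is 16 bpw: a 7B model is about the 14 GB listed for MostlyF16.
    println!("{:.2} GB", approx_size_gb(7.0, 16.0));
}
```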
Variants (Non-exhaustive)
This enum is marked as non-exhaustive.
AllF32 = 0
All tensors stored as full F32 (very large, for reference only)
MostlyF16 = 1
F16 – 14 GB @ 7B, +0.0020 ppl vs Mistral-7B
MostlyQ4_0 = 2
Q4_0 – 4.34 GB @ 8B, +0.4685 ppl
MostlyQ4_1 = 3
Q4_1 – 4.78 GB @ 8B, +0.4511 ppl
MostlyQ8_0 = 7
Q8_0 – 7.96 GB @ 8B, +0.0026 ppl
MostlyQ5_0 = 8
Q5_0 – 5.21 GB @ 8B, +0.1316 ppl
MostlyQ5_1 = 9
Q5_1 – 5.65 GB @ 8B, +0.1062 ppl
MostlyQ2K = 10
Q2_K – 2.96 GB @ 8B, +3.5199 ppl
MostlyQ3KS = 11
Q3_K small – 3.41 GB @ 8B, +1.6321 ppl
MostlyQ3KM = 12
Q3_K medium – 3.74 GB @ 8B, +0.6569 ppl
MostlyQ3KL = 13
Q3_K large – 4.03 GB @ 8B, +0.5562 ppl
MostlyQ4KS = 14
Q4_K small – 4.37 GB @ 8B, +0.2689 ppl
MostlyQ4KM = 15
Q4_K medium – 4.58 GB @ 8B, +0.1754 ppl (recommended default)
MostlyQ5KS = 16
Q5_K small – 5.21 GB @ 8B, +0.1049 ppl
MostlyQ5KM = 17
Q5_K medium – 5.33 GB @ 8B, +0.0569 ppl
MostlyQ6K = 18
Q6_K – 6.14 GB @ 8B, +0.0217 ppl
MostlyIQ2XXS = 19
IQ2_XXS – 2.06 bpw
MostlyIQ2XS = 20
IQ2_XS – 2.31 bpw
MostlyQ2KS = 21
Q2_K small
MostlyIQ3XS = 22
IQ3_XS – 3.3 bpw
MostlyIQ3XXS = 23
IQ3_XXS – 3.06 bpw
MostlyIQ1S = 24
IQ1_S – 1.56 bpw (extremely small, high loss)
MostlyIQ4NL = 25
IQ4_NL – 4.50 bpw non-linear
MostlyIQ3S = 26
IQ3_S – 3.44 bpw
MostlyIQ3M = 27
IQ3_M – 3.66 bpw
MostlyIQ2S = 28
IQ2_S – 2.5 bpw
MostlyIQ2M = 29
IQ2_M – 2.7 bpw
MostlyIQ4XS = 30
IQ4_XS – 4.25 bpw non-linear
MostlyIQ1M = 31
IQ1_M – 1.75 bpw
MostlyBF16 = 32
BF16 – 14 GB @ 7B, −0.0050 ppl vs Mistral-7B
MostlyTQ1_0 = 36
TQ1_0 – 1.69 bpw ternary
MostlyTQ2_0 = 37
TQ2_0 – 2.06 bpw ternary
MostlyMXFP4Moe = 38
MXFP4 (MoE layers)
MostlyNVFP4 = 39
NVFP4
Implementations
impl LlamaFtype

pub fn description(self) -> &'static str
Returns a human-readable description of the quantization type, including its approximate file size and perplexity (PPL) delta where known.
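Returning a `&'static str` works here because every variant can map to a string literal baked into the binary. The shape of such a method can be pictured with a self-contained sketch (a hypothetical mini-enum for illustration, not the crate's actual implementation):

```rust
// Hypothetical two-variant miniature of LlamaFtype, for illustration only.
#[derive(Clone, Copy)]
enum Ftype {
    MostlyQ4KM,
    MostlyQ8_0,
}

impl Ftype {
    // Each arm returns a string literal, so the 'static lifetime is free:
    // no allocation, no borrowing from self.
    fn description(self) -> &'static str {
        match self {
            Ftype::MostlyQ4KM => "Q4_K medium - 4.58 GB @ 8B, +0.1754 ppl",
            Ftype::MostlyQ8_0 => "Q8_0 - 7.96 GB @ 8B, +0.0026 ppl",
        }
    }
}

fn main() {
    println!("{}", Ftype::MostlyQ4KM.description());
}
```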
pub fn from_name(name: &str) -> Option<Self>
Look up a variant by its short name (case-insensitive).
use llama_cpp_4::quantize::LlamaFtype;
assert_eq!(LlamaFtype::from_name("Q4_K_M"), Some(LlamaFtype::MostlyQ4KM));
assert_eq!(LlamaFtype::from_name("q4_k_m"), Some(LlamaFtype::MostlyQ4KM));
assert_eq!(LlamaFtype::from_name("bogus"), None);
Trait Implementations
impl Clone for LlamaFtype

fn clone(&self) -> LlamaFtype
Returns a copy of the value.

fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.