pub enum Quantization {
Q4_K_S,
Q4_K_M,
Q5_K_S,
Q5_K_M,
Q6_K,
Q8_0,
F16,
F32,
}
Quantization levels for GGUF models.
Quantization reduces model size and memory usage at the cost of some quality. Fewer bits (e.g. Q4) yield a smaller, faster, less accurate model; more bits (e.g. Q8_0 or F16) yield a larger, slower, more accurate one.
Variants
Q4_K_S
4-bit quantization, small variant (smallest, fastest).
Q4_K_M
4-bit quantization, medium variant (recommended for most use cases).
Q5_K_S
5-bit quantization, small variant.
Q5_K_M
5-bit quantization, medium variant (balanced quality/size).
Q6_K
6-bit quantization.
Q8_0
8-bit quantization (high quality).
F16
16-bit floating point (full precision).
F32
32-bit floating point (maximum precision, rarely used).
Implementations
impl Quantization
pub const fn short_name(&self) -> &'static str
Short name without description.
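A minimal sketch of how `short_name` might be used, for example to build a GGUF filename. The enum body, the returned strings, and the file-naming pattern are assumptions for illustration, not the crate's actual implementation.

```rust
// Sketch: `short_name` returns the level's identifier as a string,
// which matches the naming convention commonly used for GGUF files.
#[allow(non_camel_case_types, dead_code)]
#[derive(Clone, Copy)]
enum Quantization {
    Q4_K_M,
    Q8_0,
}

impl Quantization {
    const fn short_name(&self) -> &'static str {
        match self {
            Quantization::Q4_K_M => "Q4_K_M",
            Quantization::Q8_0 => "Q8_0",
        }
    }
}

fn main() {
    // e.g. select the model file matching the desired quantization level
    let file = format!("llama-7b.{}.gguf", Quantization::Q4_K_M.short_name());
    assert_eq!(file, "llama-7b.Q4_K_M.gguf");
    println!("{file}");
}
```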
pub const fn memory_multiplier(&self) -> f32
Approximate memory multiplier (bytes per parameter).
Use this to estimate model memory requirements:
memory_gb = param_billions * multiplier
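The estimation formula above can be sketched as follows. The multiplier values here are rough assumptions (approximate bytes per parameter), not the crate's actual constants.

```rust
// Sketch of estimating model memory from `memory_multiplier`.
#[allow(non_camel_case_types, dead_code)]
#[derive(Clone, Copy)]
enum Quantization {
    Q4_K_M,
    Q8_0,
    F16,
}

impl Quantization {
    const fn memory_multiplier(&self) -> f32 {
        match self {
            Quantization::Q4_K_M => 0.6, // ~4.8 bits/param (assumed)
            Quantization::Q8_0 => 1.1,   // ~8.5 bits/param (assumed)
            Quantization::F16 => 2.0,    // 16 bits/param
        }
    }
}

// memory_gb = param_billions * multiplier
fn estimate_memory_gb(param_billions: f32, q: Quantization) -> f32 {
    param_billions * q.memory_multiplier()
}

fn main() {
    // A 7B model at F16 needs roughly 7.0 * 2.0 = 14 GB.
    let gb = estimate_memory_gb(7.0, Quantization::F16);
    assert!((gb - 14.0).abs() < 1e-3);
    println!("{gb:.1} GB");
}
```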
pub const fn all() -> &'static [Quantization]
Returns all quantization levels in order from smallest to largest.
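The smallest-to-largest ordering of `all()` lends itself to picking the largest level that fits a memory budget. The enum body, multipliers, and `best_fit` helper below are illustrative assumptions mirroring the documented ordering, not part of this API.

```rust
// Sketch: walk `all()` in reverse (largest first) and keep the first
// quantization level whose estimated memory fits the budget.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Quantization {
    Q4_K_M,
    Q8_0,
    F16,
}

impl Quantization {
    const fn memory_multiplier(&self) -> f32 {
        match self {
            Quantization::Q4_K_M => 0.6, // assumed values
            Quantization::Q8_0 => 1.1,
            Quantization::F16 => 2.0,
        }
    }

    // Smallest to largest, as documented.
    const fn all() -> &'static [Quantization] {
        &[Quantization::Q4_K_M, Quantization::Q8_0, Quantization::F16]
    }
}

fn best_fit(param_billions: f32, budget_gb: f32) -> Option<Quantization> {
    Quantization::all()
        .iter()
        .rev()
        .copied()
        .find(|q| param_billions * q.memory_multiplier() <= budget_gb)
}

fn main() {
    // With a 10 GB budget, a 7B model fits at Q8_0 (~7.7 GB) but not F16 (14 GB).
    assert_eq!(best_fit(7.0, 10.0), Some(Quantization::Q8_0));
}
```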
Trait Implementations
impl Clone for Quantization
fn clone(&self) -> Quantization
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.