pub enum Quantization {
Q4_K_S,
Q4_K_M,
Q5_K_S,
Q5_K_M,
Q6_K,
Q8_0,
F16,
F32,
}
Quantization levels for GGUF models.
Quantization reduces model size and memory usage at the cost of some quality. Lower bit widths (e.g. Q4) are smaller and faster but less accurate; higher bit widths (e.g. F16) are larger and slower but more accurate.
Variants
Q4_K_S
4-bit quantization, small variant (smallest, fastest).
Q4_K_M
4-bit quantization, medium variant (recommended for most use cases).
Q5_K_S
5-bit quantization, small variant.
Q5_K_M
5-bit quantization, medium variant (balanced quality/size).
Q6_K
6-bit quantization.
Q8_0
8-bit quantization (high quality).
F16
16-bit floating point (full precision).
F32
32-bit floating point (maximum precision, rarely used).
Implementations
impl Quantization
pub const fn short_name(&self) -> &'static str
Short name without description.
pub const fn memory_multiplier(&self) -> f32
Approximate memory multiplier (bytes per parameter).
Use this to estimate model memory requirements:
memory_gb = param_billions * multiplier
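A minimal sketch of that estimate. The multiplier values below are illustrative assumptions (roughly bits per weight divided by 8), not this crate's actual constants:

```rust
// Sketch of the memory estimate above; the multipliers are assumed
// approximations (bits per weight / 8), not the crate's real values.
fn memory_multiplier(short_name: &str) -> f32 {
    match short_name {
        "Q4_K_M" => 0.60, // ~4.8 bits/param (assumed)
        "Q8_0" => 1.06,   // ~8.5 bits/param (assumed)
        "F16" => 2.00,    // 16 bits/param
        _ => 4.00,        // F32 fallback
    }
}

/// memory_gb = param_billions * multiplier
fn estimate_memory_gb(param_billions: f32, short_name: &str) -> f32 {
    param_billions * memory_multiplier(short_name)
}

fn main() {
    // A 7B model at F16 needs roughly 7 * 2.0 = 14 GB.
    assert_eq!(estimate_memory_gb(7.0, "F16"), 14.0);
    println!("7B @ F16 ≈ {} GB", estimate_memory_gb(7.0, "F16"));
}
```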
pub const fn all() -> &'static [Quantization]
Returns all quantization levels in order from smallest to largest.
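Because the slice is ordered smallest to largest, walking it in reverse yields a simple "highest quality that fits" search. A sketch with a trimmed variant set and assumed multiplier values (not the crate's actual constants):

```rust
// Sketch: pick the highest-quality quantization that fits a memory budget.
// Assumes `all()` is ordered smallest to largest, as documented; the
// multiplier values are illustrative assumptions.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Quantization { Q4_K_M, Q8_0, F16 } // trimmed variant set for the sketch

impl Quantization {
    const fn memory_multiplier(&self) -> f32 {
        match self {
            Quantization::Q4_K_M => 0.60, // assumed
            Quantization::Q8_0 => 1.06,   // assumed
            Quantization::F16 => 2.00,
        }
    }
    const fn all() -> &'static [Quantization] {
        &[Quantization::Q4_K_M, Quantization::Q8_0, Quantization::F16]
    }
}

/// Highest-quality level whose memory estimate fits in `budget_gb`, if any.
fn best_fit(param_billions: f32, budget_gb: f32) -> Option<Quantization> {
    Quantization::all()
        .iter()
        .rev() // walk largest to smallest
        .copied()
        .find(|q| param_billions * q.memory_multiplier() <= budget_gb)
}

fn main() {
    // 7B with an 8 GB budget: Q8_0 (~7.4 GB) fits, F16 (~14 GB) does not.
    assert_eq!(best_fit(7.0, 8.0), Some(Quantization::Q8_0));
    assert_eq!(best_fit(70.0, 1.0), None);
}
```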
Trait Implementations
impl Clone for Quantization
fn clone(&self) -> Quantization
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for Quantization
impl Default for Quantization
fn default() -> Quantization
Defaults to Q4_K_M, as it provides the best balance of size and quality.
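A minimal sketch of that documented default, using the `#[default]` variant attribute (Rust 1.62+); the variant set is trimmed for illustration:

```rust
// Sketch mirroring the documented Default impl: Q4_K_M.
#[allow(non_camel_case_types)]
#[derive(Debug, Default, PartialEq)]
enum Quantization {
    #[default]
    Q4_K_M,
    Q8_0,
    F16,
}

fn main() {
    assert_eq!(Quantization::default(), Quantization::Q4_K_M);
}
```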
impl Display for Quantization
impl Hash for Quantization
impl PartialEq for Quantization
impl Copy for Quantization
impl Eq for Quantization
impl StructuralPartialEq for Quantization
Auto Trait Implementations
impl Freeze for Quantization
impl RefUnwindSafe for Quantization
impl Send for Quantization
impl Sync for Quantization
impl Unpin for Quantization
impl UnsafeUnpin for Quantization
impl UnwindSafe for Quantization
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> PolicyExt for T where T: ?Sized
impl<T> ToStringFallible for T where T: Display
fn try_to_string(&self) -> Result<String, TryReserveError>
ToString::to_string, but without panic on OOM.