pub enum QuantType {
F32,
F16,
BF16,
Q8_0,
Q8_K,
Q6_K,
Q5_K_S,
Q5_K_M,
Q5_0,
Q5_1,
Q4_K_S,
Q4_K_M,
Q4_0,
Q4_1,
Q3_K_S,
Q3_K_M,
Q3_K_L,
Q2_K_S,
Q2_K,
IQ4_NL,
IQ4_XS,
IQ3_S,
IQ3_M,
IQ3_XS,
IQ3_XXS,
IQ2_S,
IQ2_XS,
IQ2_XXS,
IQ1_S,
IQ1_M,
}
Common quantization types (GGUF spec naming convention)
Variants
F32
Full precision (FP32)
F16
Half precision (FP16)
BF16
Brain floating point (BF16)
Q8_0
8-bit integer (one scale per block)
Q8_K
8-bit K-quants
Q6_K
6-bit K-quants
Q5_K_S
5-bit K-quants (small)
Q5_K_M
5-bit K-quants (medium)
Q5_0
5-bit (legacy)
Q5_1
5-bit with per-block minimum (legacy)
Q4_K_S
4-bit K-quants (small)
Q4_K_M
4-bit K-quants (medium)
Q4_0
4-bit (legacy)
Q4_1
4-bit with per-block minimum (legacy)
Q3_K_S
3-bit K-quants (small)
Q3_K_M
3-bit K-quants (medium)
Q3_K_L
3-bit K-quants (large)
Q2_K_S
2-bit K-quants (small)
Q2_K
2-bit K-quants
IQ4_NL
Importance-weighted 4-bit (non-linear)
IQ4_XS
Importance-weighted 4-bit (extra small)
IQ3_S
Importance-weighted 3-bit (small)
IQ3_M
Importance-weighted 3-bit (medium)
IQ3_XS
Importance-weighted 3-bit (extra small)
IQ3_XXS
Importance-weighted 3-bit (extra extra small)
IQ2_S
Importance-weighted 2-bit (small)
IQ2_XS
Importance-weighted 2-bit (extra small)
IQ2_XXS
Importance-weighted 2-bit (extra extra small)
IQ1_S
Importance-weighted 1-bit (small)
IQ1_M
Importance-weighted 1-bit (medium)
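The variant names encode the format family: plain floating-point types, the legacy _0/_1 block formats, the _K K-quants, and the IQ importance-weighted types. A minimal sketch of grouping variants by family, assuming a hypothetical, trimmed-down mirror of the enum (only a few variants, so the snippet compiles on its own) and a hypothetical Family enum that is not part of this API:

```rust
// Hypothetical, trimmed-down mirror of QuantType so this sketch compiles on
// its own; the real enum has 30 variants.
#[allow(non_camel_case_types, dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum QuantType {
    F32,
    F16,
    BF16,
    Q5_0,
    Q4_1,
    Q6_K,
    Q4_K_M,
    IQ4_XS,
    IQ2_XXS,
}

// Hypothetical grouping, following the variant descriptions above.
#[derive(Debug, PartialEq, Eq)]
enum Family {
    Float,  // F32, F16, BF16
    Legacy, // *_0 and *_1 block formats
    KQuant, // *_K, *_K_S, *_K_M, *_K_L
    IQuant, // IQ* importance-weighted formats
}

fn family(q: QuantType) -> Family {
    // Exhaustive match: adding a variant without a family fails to compile.
    match q {
        QuantType::F32 | QuantType::F16 | QuantType::BF16 => Family::Float,
        QuantType::Q5_0 | QuantType::Q4_1 => Family::Legacy,
        QuantType::Q6_K | QuantType::Q4_K_M => Family::KQuant,
        QuantType::IQ4_XS | QuantType::IQ2_XXS => Family::IQuant,
    }
}
```

Matching exhaustively rather than parsing the variant name as a string means the compiler flags any new variant that has not yet been assigned a family.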
Implementations
impl QuantType
pub const fn bits_per_weight(&self) -> f32
Returns the number of bits used to store each weight.
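For a block-quantized format, bits per weight is the storage of one block (the quantized values plus the per-block scale and, for the _1 variants, a minimum) divided by the number of weights in the block. The sketch below uses the legacy ggml/llama.cpp block layouts (32 weights per block, fp16 scale); bits_per_weight, legacy_bpw and LEGACY_LAYOUTS are hypothetical helpers, and this is not necessarily how this crate computes its values, especially for K-quant mixes such as Q4_K_M, which may store some tensors at a different precision.

```rust
// Bits per weight of a block-quantized format: bits stored per block divided
// by the number of weights each block encodes.
fn bits_per_weight(bytes_per_block: u32, weights_per_block: u32) -> f32 {
    (bytes_per_block * 8) as f32 / weights_per_block as f32
}

// Assumed legacy layouts (ggml/llama.cpp): (name, bytes per block, weights per block).
// "2" is one fp16 value (scale or minimum).
const LEGACY_LAYOUTS: [(&str, u32, u32); 5] = [
    ("Q4_0", 2 + 16, 32),         // fp16 scale + 32 packed 4-bit values
    ("Q4_1", 2 + 2 + 16, 32),     // fp16 scale + fp16 minimum + 4-bit values
    ("Q5_0", 2 + 4 + 16, 32),     // fp16 scale + 32 high bits + low 4 bits
    ("Q5_1", 2 + 2 + 4 + 16, 32), // as Q5_0, plus an fp16 minimum
    ("Q8_0", 2 + 32, 32),         // fp16 scale + 32 signed 8-bit values
];

// Looks up a legacy layout by name; returns None for any other format.
fn legacy_bpw(name: &str) -> Option<f32> {
    LEGACY_LAYOUTS
        .iter()
        .find(|(n, _, _)| *n == name)
        .map(|&(_, bytes, weights)| bits_per_weight(bytes, weights))
}
```

With these layouts Q4_0 comes out at 4.5 bits per weight rather than 4, because the fp16 scale adds 16 bits to every 32 weights.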
pub fn estimate_size(&self, parameters: u64) -> u64
Estimates the file size for the given parameter count.
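A first-order size estimate is parameters × bits per weight ÷ 8. The sketch below assumes the result is in bytes and ignores GGUF metadata, alignment padding, and tensors kept at other precisions, so real files will differ; estimate_size_bytes is a hypothetical name, not this crate's implementation.

```rust
// Hypothetical sketch: raw weight payload in bytes, assuming every weight is
// stored at `bits_per_weight` and ignoring headers, metadata and padding.
fn estimate_size_bytes(parameters: u64, bits_per_weight: f64) -> u64 {
    (parameters as f64 * bits_per_weight / 8.0).round() as u64
}
```

For example, 7,000,000,000 parameters at 4.5 bits per weight gives 3,937,500,000 bytes, roughly 3.9 GB.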
pub const fn quality_tier(&self) -> u8
Returns a quality tier from 1 to 5; higher values indicate better quality.
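The quality tier and the size estimate combine naturally to pick the best quantization that fits a storage or memory budget. A sketch under assumptions: Candidate and pick_best are hypothetical, and the caller is expected to fill tier and size from quality_tier and estimate_size.

```rust
// Hypothetical candidate: a quantization name plus the values a caller would
// obtain from quality_tier() and estimate_size().
struct Candidate {
    name: &'static str,
    tier: u8,  // 1..=5, higher is better quality
    size: u64, // estimated size, same unit as the budget
}

// Picks the highest-tier candidate whose size fits the budget; among equal
// tiers, prefers the smaller file. Returns None if nothing fits.
fn pick_best(candidates: &[Candidate], budget: u64) -> Option<&Candidate> {
    candidates
        .iter()
        .filter(|c| c.size <= budget)
        .max_by(|a, b| a.tier.cmp(&b.tier).then(b.size.cmp(&a.size)))
}
```

Breaking ties on size means two variants in the same tier resolve to the smaller file.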
pub fn vram_requirement(&self, parameters: u64) -> f64
Returns the recommended VRAM, in GB, for the given parameter count.
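How vram_requirement is derived is not documented here. One plausible shape, shown purely as an assumption, is the weight size in GB scaled by an overhead factor for the KV cache, activations and runtime buffers; vram_gb and its overhead parameter are hypothetical, 1 GB is taken as 10^9 bytes, and a realistic factor depends on context length and backend.

```rust
// Hypothetical sketch: weight bytes at `bits_per_weight`, converted to GB
// (assumed 1 GB = 1e9 bytes) and scaled by an assumed overhead factor.
fn vram_gb(parameters: u64, bits_per_weight: f64, overhead: f64) -> f64 {
    let weight_bytes = parameters as f64 * bits_per_weight / 8.0;
    weight_bytes / 1e9 * overhead
}
```

With 7,000,000,000 parameters, 4.5 bits per weight and an assumed overhead of 1.2, this gives about 4.7 GB.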
Trait Implementations
impl Copy for QuantType
impl<'de> Deserialize<'de> for QuantType
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
impl Eq for QuantType
impl StructuralPartialEq for QuantType
Auto Trait Implementations
impl Freeze for QuantType
impl RefUnwindSafe for QuantType
impl Send for QuantType
impl Sync for QuantType
impl Unpin for QuantType
impl UnsafeUnpin for QuantType
impl UnwindSafe for QuantType
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> DeserializeOwned for T
where
    T: for<'de> Deserialize<'de>,
impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.
impl<A, B, T> HttpServerConnExec<A, B> for T
where
    B: Body,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> PolicyExt for T
where
    T: ?Sized,
impl<T> ToStringFallible for T
where
    T: Display,
fn try_to_string(&self) -> Result<String, TryReserveError>
Like ToString::to_string, but without panicking on OOM.