pub enum Precision {
    INT4,
    INT8,
    BF16,
    FP16,
    FP32,
}
Precision levels for inference
Variants§
INT4
4-bit integer (most aggressive compression)
INT8
8-bit integer
BF16
16-bit brain floating point (bfloat16: 8 exponent bits, 7 mantissa bits)
FP16
16-bit floating point (IEEE 754 half precision)
FP32
32-bit floating point (full precision)
Implementations§
impl Precision
pub fn vram_ratio(&self) -> f32
VRAM usage relative to FP32 (0.0 - 1.0)
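As a rough illustration of how such a ratio might be used, the sketch below scales a known FP32 memory footprint by `vram_ratio()`. The enum here is a stand-in so the example compiles on its own, the ratio values are an assumption (bit width divided by 32), and `estimate_vram_bytes` is a hypothetical helper, not part of this crate.

```rust
/// Stand-in for the crate's `Precision` enum so this sketch compiles alone.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    INT4,
    INT8,
    BF16,
    FP16,
    FP32,
}

impl Precision {
    /// ASSUMPTION: ratio = bit width / 32. The crate's real values may differ.
    pub fn vram_ratio(&self) -> f32 {
        match self {
            Precision::INT4 => 0.125,
            Precision::INT8 => 0.25,
            Precision::BF16 | Precision::FP16 => 0.5,
            Precision::FP32 => 1.0,
        }
    }
}

/// Hypothetical helper: scale a known FP32 footprint by the precision's ratio.
pub fn estimate_vram_bytes(fp32_bytes: u64, precision: Precision) -> u64 {
    (fp32_bytes as f64 * f64::from(precision.vram_ratio())).round() as u64
}
```

Under these assumed ratios, a model that needs 8 GB at FP32 would be estimated at 4 GB for FP16 and 1 GB for INT4. Real usage also includes activations and runtime overhead, which this sketch ignores.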
pub fn speedup_factor(&self) -> f32
Approximate speedup factor relative to FP32
pub fn quality_factor(&self) -> f32
Quality impact (lower is more lossy). This is an approximation; actual impact depends on model and content.
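One way a quality factor like this could be used is to pick the most compressed precision that still meets a minimum quality bar. The sketch below assumes a stand-in enum, and the numbers returned by `quality_factor()` are placeholders chosen only for illustration, not the crate's actual values. `ALL` and `most_compressed_with_quality` are hypothetical names.

```rust
/// Stand-in for the crate's `Precision` enum so this sketch compiles alone.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    INT4,
    INT8,
    BF16,
    FP16,
    FP32,
}

impl Precision {
    /// Hypothetical constant: variants from most to least compressed.
    pub const ALL: [Precision; 5] = [
        Precision::INT4,
        Precision::INT8,
        Precision::BF16,
        Precision::FP16,
        Precision::FP32,
    ];

    /// PLACEHOLDER values for illustration only (lower is more lossy).
    pub fn quality_factor(&self) -> f32 {
        match self {
            Precision::INT4 => 0.85,
            Precision::INT8 => 0.95,
            Precision::BF16 => 0.99,
            Precision::FP16 => 0.995,
            Precision::FP32 => 1.0,
        }
    }
}

/// Return the first (most compressed) precision whose quality factor
/// is at least `min_quality`, or `None` if no variant qualifies.
pub fn most_compressed_with_quality(min_quality: f32) -> Option<Precision> {
    Precision::ALL
        .iter()
        .copied()
        .find(|p| p.quality_factor() >= min_quality)
}
```

Because the factor is documented as an approximation, a selection like this is best treated as a starting point to validate against the actual model and content.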
pub fn is_lossless(&self) -> bool
Whether this precision is lossless (or nearly so)
Trait Implementations§
impl<'de> Deserialize<'de> for Precision
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
Deserialize this value from the given Serde deserializer.
impl Ord for Precision
impl PartialOrd for Precision
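The page does not say how `Ord` orders the variants. If the implementation is derived (an assumption here), comparison follows declaration order, so `INT4 < INT8 < BF16 < FP16 < FP32`. The sketch below uses a stand-in enum with derived ordering; `lowest_precision` and `clamp_to_ceiling` are hypothetical helpers.

```rust
/// Stand-in enum. ASSUMPTION: ordering is derived, so it follows
/// declaration order (INT4 is the smallest, FP32 the largest).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Precision {
    INT4,
    INT8,
    BF16,
    FP16,
    FP32,
}

/// Hypothetical helper: the lowest precision among the candidates,
/// or `None` for an empty slice.
pub fn lowest_precision(candidates: &[Precision]) -> Option<Precision> {
    candidates.iter().copied().min()
}

/// Hypothetical helper: cap a requested precision at a ceiling.
pub fn clamp_to_ceiling(requested: Precision, ceiling: Precision) -> Precision {
    requested.min(ceiling)
}
```

With a total order available, the variants can also be sorted or used as keys in a `BTreeMap`.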
impl Copy for Precision
impl Eq for Precision
impl StructuralPartialEq for Precision
Auto Trait Implementations§
impl Freeze for Precision
impl RefUnwindSafe for Precision
impl Send for Precision
impl Sync for Precision
impl Unpin for Precision
impl UnwindSafe for Precision
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.