pub enum QuantKind {
    Gptq {
        bits: u32,
        group_size: usize,
        desc_act: bool,
    },
    Awq {
        bits: u32,
        group_size: usize,
    },
    Gguf {
        quant_type: GgufQuantType,
    },
}
Quantization flavour discriminator for Backend::gemm_quant.
Distinct schemes need distinct kernels. The scheme is carried as a parameter so that the Backend trait does not need one method per quantization type.
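The dispatch-by-parameter idea described above can be sketched as follows. This is a minimal illustration, not the crate's actual code: the enum is mirrored locally so the snippet compiles on its own, the `GgufQuantType` variants (`Q4K`, `Q6K`), the `select_kernel` helper, the `CpuBackend` type and the simplified `gemm_quant` signature are all hypothetical, and the real method presumably also takes tensor arguments that this page does not show.

```rust
// Local mirror of the documented types so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GgufQuantType {
    // Hypothetical variants; the real GgufQuantType is not shown on this page.
    Q4K,
    Q6K,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantKind {
    Gptq { bits: u32, group_size: usize, desc_act: bool },
    Awq { bits: u32, group_size: usize },
    Gguf { quant_type: GgufQuantType },
}

/// Hypothetical helper: returns the name of the kernel a backend might pick.
/// A single `match` replaces what would otherwise be one trait method per scheme.
pub fn select_kernel(kind: &QuantKind) -> String {
    match *kind {
        QuantKind::Gptq { bits, group_size, desc_act } => {
            // Assumption: desc_act (activation ordering) implies a g_idx lookup.
            let order = if desc_act { "g_idx" } else { "sequential" };
            format!("gptq_int{bits}_g{group_size}_{order}")
        }
        QuantKind::Awq { bits, group_size } => format!("awq_int{bits}_g{group_size}"),
        QuantKind::Gguf { quant_type } => format!("gguf_{quant_type:?}"),
    }
}

pub trait Backend {
    /// Hypothetical, simplified signature: one entry point for every scheme.
    fn gemm_quant(&self, kind: &QuantKind) -> String;
}

/// Hypothetical backend used only to demonstrate the dispatch.
pub struct CpuBackend;

impl Backend for CpuBackend {
    fn gemm_quant(&self, kind: &QuantKind) -> String {
        select_kernel(kind)
    }
}
```

With this shape, supporting a new scheme means adding an enum variant and a `match` arm, and the compiler flags every backend whose `match` is no longer exhaustive.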
Variants§
Gptq
GPTQ: group-wise int4/int8 with scales + zeros (asymmetric) + optional g_idx.
Awq
AWQ: activation-aware int4 with scales + zeros, different packing from GPTQ.
Gguf
GGUF: one of k-quants / legacy quants, fully specified by the inner type.
Fields
quant_type: GgufQuantType
Trait Implementations§
Auto Trait Implementations§
impl Freeze for QuantKind
impl RefUnwindSafe for QuantKind
impl Send for QuantKind
impl Sync for QuantKind
impl Unpin for QuantKind
impl UnsafeUnpin for QuantKind
impl UnwindSafe for QuantKind
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.