#[non_exhaustive]
#[repr(u16)]
pub enum GgufBlockFormat {
Q4_0 = 2,
Q4_1 = 3,
Q5_0 = 6,
Q5_1 = 7,
Q8_0 = 8,
Q2K = 10,
Q3K = 11,
Q4K = 12,
Q5K = 13,
Q6K = 14,
Q8K = 15,
}
GGUF block-format selector for QuantizeKind::GgufDequantize /
QuantizeKind::GgufMmvq plans. Mirrors the discriminants used by
llama.cpp / ggml so a descriptor can be round-tripped to a GGUF
file header without translation.
Block sizes:
- Type-0/1 variants (Q4_0, Q4_1, Q5_0, Q5_1, Q8_0) pack 32 quantized values per block plus a shared FP scale (+ min for the _1 variants).
- k-quants variants (Q2_K … Q8_K) pack 256 values per super-block with a multi-level scale hierarchy (quantized sub-block scales + FP super-block scale).
Discriminant values match the GGML_TYPE_* enum in upstream
ggml.h, ensuring binary compatibility with GGUF file headers.
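As an illustration of the round-trip property, the sketch below mirrors the enum locally and converts to and from the raw GGUF type id. The local copy of the enum and the helpers from_ggml_type and ggml_type are hypothetical names introduced for this example; they are not confirmed parts of the documented API.

```rust
// Local mirror of the documented enum, for illustration only.
// The real type lives in the documented crate.
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
#[repr(u16)]
pub enum GgufBlockFormat {
    Q4_0 = 2,
    Q4_1 = 3,
    Q5_0 = 6,
    Q5_1 = 7,
    Q8_0 = 8,
    Q2K = 10,
    Q3K = 11,
    Q4K = 12,
    Q5K = 13,
    Q6K = 14,
    Q8K = 15,
}

impl GgufBlockFormat {
    /// Hypothetical helper: the raw id as it would appear in a GGUF header.
    pub const fn ggml_type(self) -> u16 {
        self as u16
    }

    /// Hypothetical helper: parse a raw id, rejecting ids this enum
    /// does not cover (for example 4, 5, and 9).
    pub const fn from_ggml_type(raw: u16) -> Option<Self> {
        match raw {
            2 => Some(Self::Q4_0),
            3 => Some(Self::Q4_1),
            6 => Some(Self::Q5_0),
            7 => Some(Self::Q5_1),
            8 => Some(Self::Q8_0),
            10 => Some(Self::Q2K),
            11 => Some(Self::Q3K),
            12 => Some(Self::Q4K),
            13 => Some(Self::Q5K),
            14 => Some(Self::Q6K),
            15 => Some(Self::Q8K),
            _ => None,
        }
    }
}

fn main() {
    let fmt = GgufBlockFormat::Q4K;
    let raw = fmt.ggml_type();
    // The discriminant is written to the header unchanged...
    assert_eq!(raw, 12);
    // ...and parses back to the same variant.
    assert_eq!(GgufBlockFormat::from_ggml_type(raw), Some(fmt));
    // Ids outside the enum are rejected rather than transmuted.
    assert_eq!(GgufBlockFormat::from_ggml_type(4), None);
}
```

Parsing through an explicit match, rather than transmuting the u16, keeps unknown ids from producing an invalid enum value.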
Variants (Non-exhaustive)
This enum is marked as non-exhaustive
Q4_0 = 2
4-bit, 32-element block, single FP scale. block_q4_0.
Q4_1 = 3
4-bit, 32-element block, FP scale + FP min. block_q4_1.
Q5_0 = 6
5-bit, 32-element block, single FP scale. block_q5_0.
Q5_1 = 7
5-bit, 32-element block, FP scale + FP min. block_q5_1.
Q8_0 = 8
8-bit, 32-element block, single FP scale. block_q8_0.
Q2K = 10
2.6-bit (effective), 256-element super-block. block_q2_K.
Q3K = 11
3.4-bit (effective), 256-element super-block. block_q3_K.
Q4K = 12
4.5-bit (effective), 256-element super-block. block_q4_K.
Q5K = 13
5.5-bit (effective), 256-element super-block. block_q5_K.
Q6K = 14
6.6-bit (effective), 256-element super-block. block_q6_K.
Q8K = 15
8-bit, 256-element super-block (a CPU-side intermediate in upstream
llama.cpp). block_q8_K. Dequant supported. Upstream ships no MMVQ
specialization for this format; see has_mmvq for MMVQ support in this crate.
Implementations
impl GgufBlockFormat
pub const fn block_size(self) -> usize
Number of FP elements per packed block.
pub const fn type_size(self) -> usize
Number of bytes per packed block. Matches sizeof(block_q*)
from ggml.h. Used by the Rust plan layer to size the input
weight buffer.
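A minimal sketch of how block_size and type_size could be combined to size a weight buffer and to derive the effective bits per weight. The per-format byte counts below are assumptions taken from the ggml block struct layouts rather than from this crate, and weight_buffer_bytes and bits_per_weight are hypothetical helpers, not documented API.

```rust
/// (ggml struct name, elements per block, bytes per block).
/// Byte counts are assumptions based on the ggml struct layouts.
const LAYOUTS: [(&str, usize, usize); 11] = [
    ("block_q4_0", 32, 18),
    ("block_q4_1", 32, 20),
    ("block_q5_0", 32, 22),
    ("block_q5_1", 32, 24),
    ("block_q8_0", 32, 34),
    ("block_q2_K", 256, 84),
    ("block_q3_K", 256, 110),
    ("block_q4_K", 256, 144),
    ("block_q5_K", 256, 176),
    ("block_q6_K", 256, 210),
    ("block_q8_K", 256, 292),
];

/// Bytes needed to store `n_elements` quantized weights.
/// Returns None if the element count is not a whole number of blocks
/// or the byte count overflows usize.
fn weight_buffer_bytes(n_elements: usize, block_size: usize, type_size: usize) -> Option<usize> {
    if block_size == 0 || n_elements % block_size != 0 {
        return None;
    }
    (n_elements / block_size).checked_mul(type_size)
}

/// Storage cost per weight, including scales and mins.
fn bits_per_weight(block_size: usize, type_size: usize) -> f64 {
    (type_size * 8) as f64 / block_size as f64
}

fn main() {
    for (name, block_size, type_size) in LAYOUTS {
        println!("{name}: {:.4} bits/weight", bits_per_weight(block_size, type_size));
    }

    // A 4096 x 4096 weight matrix stored as Q4_K:
    // 16_777_216 elements / 256 per super-block * 144 bytes.
    let bytes = weight_buffer_bytes(4096 * 4096, 256, 144).unwrap();
    assert_eq!(bytes, 9_437_184);
}
```

Rejecting element counts that are not a multiple of the block size mirrors the constraint that a packed buffer contains only whole blocks.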
pub const fn is_type_01(self) -> bool
true for the type-0/1 family (32-element blocks); false
for the k-quants family (256-element super-blocks).
pub const fn has_mmvq(self) -> bool
true if MMVQ (fused dequant + matvec) is supported for this
block format. As of Phase 11.4, all 11 GGUF block formats ship an
MMVQ kernel. Q8_K MMVQ is a bespoke baracuda addition: upstream
llama.cpp / Fuel reserve Q8_K as a CPU-side intermediate and ship
dequant only. We close that gap to avoid 2× memory traffic on
the inference decode step.
Trait Implementations
impl Clone for GgufBlockFormat
fn clone(&self) -> GgufBlockFormat
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Copy for GgufBlockFormat
impl Debug for GgufBlockFormat
impl Eq for GgufBlockFormat
impl Hash for GgufBlockFormat
impl PartialEq for GgufBlockFormat
fn eq(&self, other: &GgufBlockFormat) -> bool
Tests for self and other values to be equal, and is used by ==.