Standard GGUF quantization block types: Q4_0 (4-bit) and Q8_0 (8-bit).
These are the most common quantization formats in distributed GGUF model files, accounting for roughly 80% of publicly released models.
- Q4_0 (GGML type 2): 32 weights per block, 18 bytes total. Block scale d: f16 + 16 bytes of packed 4-bit nibbles (2 per byte). Dequant: w[j] = d × (nibble[j] − 8).
- Q8_0 (GGML type 8): 32 weights per block, 34 bytes total. Block scale d: f16 + 32 bytes of i8 weights. Dequant: w[j] = d × qs[j].
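The two dequantization formulas can be sketched over raw block bytes as below. This is a minimal illustration, not this crate's implementation: the function names and the `f16_bits_to_f32` helper are hypothetical, the scale is assumed to be stored little-endian at the start of the block, and the nibble order (low nibbles give weights 0..16, high nibbles give weights 16..32) is assumed to follow the ggml reference layout.

```rust
/// Hypothetical helper: decode IEEE 754 half-precision bits into an f32.
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = (h >> 15) & 1;
    let exp = ((h >> 10) & 0x1F) as u32;
    let frac = (h & 0x3FF) as f32;
    let magnitude = match exp {
        // Zero and subnormals: frac * 2^-24.
        0 => frac * 2f32.powi(-24),
        // All-ones exponent: infinity or NaN.
        0x1F => if frac == 0.0 { f32::INFINITY } else { f32::NAN },
        // Normal numbers: (1 + frac / 1024) * 2^(exp - 15).
        _ => (1.0 + frac / 1024.0) * 2f32.powi(exp as i32 - 15),
    };
    if sign == 1 { -magnitude } else { magnitude }
}

/// Dequantize one 18-byte Q4_0 block: w[j] = d * (nibble[j] - 8).
/// Assumes the ggml nibble order (low nibbles first, then high nibbles).
fn dequantize_q4_0(block: &[u8; 18]) -> [f32; 32] {
    let d = f16_bits_to_f32(u16::from_le_bytes([block[0], block[1]]));
    let mut out = [0.0f32; 32];
    for j in 0..16 {
        let byte = block[2 + j];
        let lo = (byte & 0x0F) as i32 - 8; // weight j
        let hi = (byte >> 4) as i32 - 8; // weight j + 16
        out[j] = d * lo as f32;
        out[j + 16] = d * hi as f32;
    }
    out
}

/// Dequantize one 34-byte Q8_0 block: w[j] = d * qs[j].
fn dequantize_q8_0(block: &[u8; 34]) -> [f32; 32] {
    let d = f16_bits_to_f32(u16::from_le_bytes([block[0], block[1]]));
    let mut out = [0.0f32; 32];
    for j in 0..32 {
        // Reinterpret the stored byte as a signed 8-bit weight.
        out[j] = d * (block[2 + j] as i8) as f32;
    }
    out
}
```

The offset of 8 in Q4_0 maps the unsigned nibble range 0..=15 onto the signed range −8..=7, which is why Q4_0 needs no separate zero point per block.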
Structs§
- BlockQ4_0 - Q4_0 block: 32 weights quantized to 4 bits each with a shared FP16 scale.
- BlockQ8_0 - Q8_0 block: 32 weights quantized to 8-bit signed integers with a shared FP16 scale.
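As a rough picture of what such blocks look like in memory, the sketch below declares two stand-in types with a C-compatible layout. The type names `Q4_0Layout` and `Q8_0Layout` and the field names `d` and `qs` are assumptions for illustration (borrowed from the ggml naming convention); the actual fields of the structs listed above may differ. The scale is kept as raw `u16` bits because stable Rust has no built-in `f16` type.

```rust
use std::mem::size_of;

/// Stand-in for a Q4_0 block: f16 scale bits plus 16 bytes of packed nibbles.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Clone, Copy)]
struct Q4_0Layout {
    d: u16,       // raw bits of the shared f16 scale
    qs: [u8; 16], // 32 weights, two 4-bit values per byte
}

/// Stand-in for a Q8_0 block: f16 scale bits plus 32 signed 8-bit weights.
#[allow(non_camel_case_types)]
#[repr(C)]
#[derive(Clone, Copy)]
struct Q8_0Layout {
    d: u16,       // raw bits of the shared f16 scale
    qs: [i8; 32], // 32 weights, one per byte
}

/// Returns the in-memory sizes of both stand-in layouts, in bytes.
fn layout_sizes() -> (usize, usize) {
    (size_of::<Q4_0Layout>(), size_of::<Q8_0Layout>())
}
```

With `#[repr(C)]` and a 2-byte-aligned first field, neither layout needs padding, so the in-memory sizes match the documented 18 and 34 bytes.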
Constants§
- BLOCK_Q4_0_BYTES - Number of bytes per Q4_0 block (2-byte f16 scale + 16 bytes of 4-bit pairs).
- BLOCK_Q8_0_BYTES - Number of bytes per Q8_0 block (2-byte f16 scale + 32 bytes of i8 weights).
- QK_Q4_0 - Number of weights per Q4_0 block.
- QK_Q8_0 - Number of weights per Q8_0 block.
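The relationship between these constants can be sketched as follows. The constants are re-declared locally with the documented values so the snippet stands alone; they are not copied from the crate's source. `quantized_size_bytes` is a hypothetical helper that assumes the weight count must be a whole multiple of the block size.

```rust
// Local re-declarations mirroring the documented values (not the crate's source).
const QK_Q4_0: usize = 32;
const QK_Q8_0: usize = 32;
const BLOCK_Q4_0_BYTES: usize = 2 + QK_Q4_0 / 2; // f16 scale + two weights per byte = 18
const BLOCK_Q8_0_BYTES: usize = 2 + QK_Q8_0; // f16 scale + one weight per byte = 34

/// Hypothetical helper: bytes needed to store `n_weights` quantized weights.
/// Returns None when `n_weights` is not a whole number of blocks.
fn quantized_size_bytes(n_weights: usize, qk: usize, block_bytes: usize) -> Option<usize> {
    if qk == 0 || n_weights % qk != 0 {
        return None;
    }
    Some(n_weights / qk * block_bytes)
}

/// Effective storage cost per weight in bits, including the shared scale.
fn bits_per_weight(qk: usize, block_bytes: usize) -> f64 {
    (block_bytes * 8) as f64 / qk as f64
}
```

For example, a row of 4096 weights takes 128 blocks in either format: 128 × 18 = 2304 bytes in Q4_0 and 128 × 34 = 4352 bytes in Q8_0. Once the per-block scale is counted, the effective costs are 4.5 and 8.5 bits per weight.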