#[repr(C)]
pub struct BlockQ4_0 {
    pub d: f16,
    pub qs: [u8; 16],
}
Q4_0 block: 32 weights quantized to 4 bits each with a shared FP16 scale.
Layout (18 bytes):
d: FP16 block scale.
qs: 16 bytes holding 32 × 4-bit quantized weights, 2 per byte. Even index j → low nibble, qs[j/2] & 0x0F; odd index j → high nibble, qs[j/2] >> 4.
Dequant: w[j] = d × (nibble[j] as f32 − 8.0). Nibble 8 maps to zero, so the representable values run from −8d to 7d.
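The nibble layout and dequant formula above can be sketched as a pair of free functions. This is only an illustration, not the crate's implementation: `nibble` and `dequant_block` are hypothetical names, and the scale is passed as an `f32` on the assumption that the caller has already widened the stored `f16`.

```rust
/// Number of weights per Q4_0 block.
const QK_Q4_0: usize = 32;

/// Unpack the 4-bit code for weight `j` from the packed bytes.
/// Even `j` reads the low nibble, odd `j` reads the high nibble.
fn nibble(qs: &[u8; 16], j: usize) -> u8 {
    let byte = qs[j / 2];
    if j % 2 == 0 {
        byte & 0x0F
    } else {
        byte >> 4
    }
}

/// Dequantize one block: w[j] = d * (nibble[j] - 8).
/// Assumption: `d` is the block scale already converted from f16 to f32.
fn dequant_block(d: f32, qs: &[u8; 16]) -> [f32; QK_Q4_0] {
    let mut out = [0.0f32; QK_Q4_0];
    for (j, w) in out.iter_mut().enumerate() {
        *w = d * (nibble(qs, j) as f32 - 8.0);
    }
    out
}
```

With `d = 0.5`, a byte of `0xF0` in `qs[0]` decodes weight 0 from nibble 0 and weight 1 from nibble 15, giving -4.0 and 3.5; a byte of `0x88` decodes to two zeros.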
Fields
d: f16
Block scale (FP16).
qs: [u8; 16]
32 × 4-bit quantized weights, 2 per byte (low nibble = even index).
Implementations
impl BlockQ4_0
pub fn dequant(blocks: &[Self], output: &mut [f32]) -> BonsaiResult<()>
Dequantize a slice of Q4_0 blocks into f32 output.
output must have length >= blocks.len() * QK_Q4_0.
pub fn quantize(input: &[f32]) -> BonsaiResult<Vec<Self>>
Quantize f32 input into Q4_0 blocks.
Input length must be a multiple of QK_Q4_0 (32).
Scale = max(|input|) / 7.0; nibble = clamp(round(x / scale + 8), 0, 15).
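The scale and nibble formulas can be sketched for a single block as follows. This is a minimal illustration rather than the crate's code: `quantize_block` is a hypothetical name, the scale is kept as `f32` instead of `f16`, and the handling of an all-zero block (scale 0, every nibble 8) is an assumption, since the documentation above does not say what happens in that case.

```rust
/// Quantize exactly 32 values into (scale, packed nibbles).
/// Assumption: the scale stays f32 here; the real block stores it as f16.
/// Assumption: an all-zero block yields scale 0 and every nibble equal to 8.
fn quantize_block(x: &[f32; 32]) -> (f32, [u8; 16]) {
    // Scale = max(|x|) / 7.0 over the block.
    let amax = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = amax / 7.0;

    let mut qs = [0u8; 16];
    for (j, &v) in x.iter().enumerate() {
        // Guard against dividing by a zero scale.
        let scaled = if scale > 0.0 { v / scale } else { 0.0 };
        // nibble = clamp(round(x / scale + 8), 0, 15)
        let q = (scaled + 8.0).round().clamp(0.0, 15.0) as u8;
        if j % 2 == 0 {
            qs[j / 2] |= q; // even index: low nibble
        } else {
            qs[j / 2] |= q << 4; // odd index: high nibble
        }
    }
    (scale, qs)
}
```

Because the scale divides by 7, the largest-magnitude input maps to nibble 15 (if positive) or nibble 1 (if negative); nibble 0 is only reachable through rounding and clamping.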
pub fn slice_from_bytes(data: &[u8]) -> BonsaiResult<&[Self]>
Zero-copy cast of a byte slice to a slice of BlockQ4_0.
Errors
Returns BonsaiError::KQuantError if data.len() is not a multiple
of BLOCK_Q4_0_BYTES or the pointer is not 2-byte aligned (required
for the embedded f16 field).
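The two preconditions can be checked without any unsafe code, as in the sketch below. `check_q4_0_bytes` is a hypothetical helper, not part of the crate, and it returns a plain `String` error rather than `BonsaiError::KQuantError`. The constant value 18 is taken from the layout size stated above.

```rust
/// Size of one Q4_0 block in bytes: 2 (f16 scale) + 16 (packed nibbles).
/// Assumption: this matches the crate's BLOCK_Q4_0_BYTES constant.
const BLOCK_Q4_0_BYTES: usize = 18;

/// Hypothetical pre-check: returns the number of whole blocks in `data`
/// if the slice could be reinterpreted as Q4_0 blocks, or an error message.
fn check_q4_0_bytes(data: &[u8]) -> Result<usize, String> {
    // The length must cover a whole number of 18-byte blocks.
    if data.len() % BLOCK_Q4_0_BYTES != 0 {
        return Err(format!(
            "length {} is not a multiple of {}",
            data.len(),
            BLOCK_Q4_0_BYTES
        ));
    }
    // The f16 field requires the first byte to sit on a 2-byte boundary.
    if (data.as_ptr() as usize) % 2 != 0 {
        return Err("pointer is not 2-byte aligned".to_string());
    }
    Ok(data.len() / BLOCK_Q4_0_BYTES)
}
```

A `Vec<u8>` or a sub-slice of a memory-mapped file carries no alignment guarantee beyond 1 byte, so a slice that starts at an odd offset fails the second check even when its length is valid.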
pub fn dequant_to_buf(&self, buf: &mut [f32; 32])
Dequantize this single block into a 32-element f32 buffer.
Used by the GEMV kernel to avoid heap allocation on the hot path.
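To illustrate why a fixed-size buffer helps on the hot path, the sketch below computes one row of a matrix-vector product while reusing a single stack-allocated `[f32; 32]`. It is not the crate's GEMV kernel: `SketchBlock` is a stand-in type with an `f32` scale, and `dot_row` is a hypothetical function written only for this example.

```rust
/// Stand-in for BlockQ4_0 with an f32 scale (assumption: the real type
/// stores the scale as f16 and converts it during dequantization).
struct SketchBlock {
    d: f32,
    qs: [u8; 16],
}

impl SketchBlock {
    /// Dequantize this block into a caller-provided 32-element buffer.
    fn dequant_to_buf(&self, buf: &mut [f32; 32]) {
        for (i, &byte) in self.qs.iter().enumerate() {
            buf[2 * i] = self.d * ((byte & 0x0F) as f32 - 8.0); // even index
            buf[2 * i + 1] = self.d * ((byte >> 4) as f32 - 8.0); // odd index
        }
    }
}

/// Dot product of one quantized row with a dense vector `x`.
/// The scratch buffer lives on the stack and is reused for every block,
/// so the loop performs no heap allocation.
fn dot_row(row: &[SketchBlock], x: &[f32]) -> f32 {
    assert_eq!(x.len(), row.len() * 32);
    let mut buf = [0.0f32; 32];
    let mut acc = 0.0f32;
    for (block, xs) in row.iter().zip(x.chunks_exact(32)) {
        block.dequant_to_buf(&mut buf);
        acc += buf.iter().zip(xs).map(|(w, v)| w * v).sum::<f32>();
    }
    acc
}
```

Calling `dot_row` once per matrix row gives a full GEMV; a slice-based dequantizer would instead need an output buffer sized for the whole row.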