#[repr(C)]
pub struct BlockQ8_0 {
    pub d: f16,
    pub qs: [i8; 32],
}
Q8_0 block: 32 weights quantized to 8-bit signed integers with a shared FP16 scale.
Layout (34 bytes):
d: FP16 block scale.
qs: 32 bytes of i8 quantized weights.
Dequant: w[j] = d × qs[j] as f32.
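The layout and dequantization rule above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `BlockSketch`, `f16_bits_to_f32`, and `dequant_block` are hypothetical names, and the scale is held as raw FP16 bits in a `u16` so the sketch needs no f16 type.

```rust
/// Hypothetical stand-in for the documented block. `d` holds raw FP16 bits
/// as a `u16` (an assumption of this sketch, not the crate's field type).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct BlockSketch {
    pub d: u16,
    pub qs: [i8; 32],
}

// 2 bytes of scale + 32 bytes of weights, no padding under repr(C).
const _: () = assert!(std::mem::size_of::<BlockSketch>() == 34);

/// Decode IEEE 754 binary16 bits into an f32.
pub fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h >> 15) & 1) as u32;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = (h & 0x3ff) as u32;
    let bits = match (exp, frac) {
        // Signed zero.
        (0, 0) => sign << 31,
        // Subnormal: value is frac * 2^-24.
        (0, _) => {
            let v = frac as f32 * (1.0 / 16_777_216.0);
            return if sign == 1 { -v } else { v };
        }
        // Infinity or NaN.
        (0x1f, _) => (sign << 31) | 0x7f80_0000 | (frac << 13),
        // Normal: rebias the exponent from 15 to 127.
        _ => (sign << 31) | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(bits)
}

/// w[j] = d * qs[j], computed in f32.
pub fn dequant_block(b: &BlockSketch) -> [f32; 32] {
    let d = f16_bits_to_f32(b.d);
    let mut out = [0.0f32; 32];
    for (o, &q) in out.iter_mut().zip(b.qs.iter()) {
        *o = d * q as f32;
    }
    out
}
```

With a scale of 0.5 (FP16 bits `0x3800`), a stored weight of 127 decodes to 63.5 and a stored weight of -64 decodes to -32.0.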
Fields
d: f16
Block scale (FP16).
qs: [i8; 32]
32 × int8 quantized weights.
Implementations
impl BlockQ8_0
pub fn dequant(blocks: &[Self], output: &mut [f32]) -> BonsaiResult<()>
Dequantize a slice of Q8_0 blocks into f32 output.
output must have length >= blocks.len() * QK_Q8_0.
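A slice-level dequantization with the length check described above might look like the sketch below. It makes several assumptions: `Block` keeps its scale as an `f32` rather than FP16, `QK` stands in for `QK_Q8_0`, and a `String` error stands in for `BonsaiResult`. Elements of `output` beyond `blocks.len() * QK` are left untouched in this sketch; the source does not say what the real function does with them.

```rust
const QK: usize = 32; // stands in for QK_Q8_0

/// Simplified block for illustration: the scale is an f32 here,
/// whereas the documented struct stores it as FP16.
pub struct Block {
    pub d: f32,
    pub qs: [i8; QK],
}

/// Hypothetical counterpart of `dequant`; the String error is an
/// assumption standing in for the crate's own result type.
pub fn dequant_sketch(blocks: &[Block], output: &mut [f32]) -> Result<(), String> {
    let needed = blocks.len() * QK;
    if output.len() < needed {
        return Err(format!(
            "output too short: need {}, got {}",
            needed,
            output.len()
        ));
    }
    // Each block fills one QK-sized chunk of the output.
    for (block, out) in blocks.iter().zip(output.chunks_exact_mut(QK)) {
        for (o, &q) in out.iter_mut().zip(block.qs.iter()) {
            *o = block.d * q as f32;
        }
    }
    Ok(())
}
```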
pub fn quantize(input: &[f32]) -> BonsaiResult<Vec<Self>>
Quantize f32 input into Q8_0 blocks.
Input length must be a multiple of QK_Q8_0 (32).
Per block: scale = max(|x[j]|) / 127; qs[j] = clamp(round(x[j] / scale), -127, 127).
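The quantization rule above can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: the scale is returned as an `f32` (rounding it to FP16 is omitted), a `String` error stands in for `BonsaiResult`, and an all-zero block is mapped to all-zero weights to avoid dividing by zero, a case the source does not describe.

```rust
const QK: usize = 32; // stands in for QK_Q8_0

/// Quantize one block of QK values; returns (scale, quantized weights).
/// `x` is expected to hold exactly QK elements.
pub fn quantize_block_sketch(x: &[f32]) -> (f32, [i8; QK]) {
    // scale = max(|x[j]|) / 127
    let amax = x.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = amax / 127.0;
    let mut qs = [0i8; QK];
    // Assumption: a zero scale leaves every weight at 0.
    if scale > 0.0 {
        for (q, &v) in qs.iter_mut().zip(x.iter()) {
            // qs[j] = clamp(round(x[j] / scale), -127, 127)
            *q = (v / scale).round().clamp(-127.0, 127.0) as i8;
        }
    }
    (scale, qs)
}

/// Hypothetical counterpart of `quantize`: rejects inputs whose length
/// is not a multiple of QK, then quantizes block by block.
pub fn quantize_sketch(input: &[f32]) -> Result<Vec<(f32, [i8; QK])>, String> {
    if input.len() % QK != 0 {
        return Err(format!(
            "input length {} is not a multiple of {}",
            input.len(),
            QK
        ));
    }
    Ok(input.chunks_exact(QK).map(quantize_block_sketch).collect())
}
```

Because the scale is derived from the largest magnitude in the block, that element always maps to ±127, and the clamp only guards against rounding at the boundary.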
pub fn slice_from_bytes(data: &[u8]) -> BonsaiResult<&[Self]>
Zero-copy cast of a byte slice to a slice of BlockQ8_0.
Errors
Returns BonsaiError::KQuantError if data.len() is not a multiple
of BLOCK_Q8_0_BYTES or the pointer is not 2-byte aligned (required
for the embedded f16 field).
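The two checks described above (length and alignment) can be sketched like this. The names are hypothetical: `BlockSketch` stores the scale as raw `u16` bits so that every bit pattern is valid, `BLOCK_BYTES` stands in for `BLOCK_Q8_0_BYTES`, and a `String` error stands in for `BonsaiError::KQuantError`.

```rust
/// Hypothetical stand-in for the documented block; `d` is raw FP16 bits.
#[repr(C)]
pub struct BlockSketch {
    pub d: u16,
    pub qs: [i8; 32],
}

// Stands in for BLOCK_Q8_0_BYTES: 2 + 32 = 34, with no padding.
const BLOCK_BYTES: usize = std::mem::size_of::<BlockSketch>();

/// Reinterpret `data` as blocks without copying, after validating it.
pub fn slice_from_bytes_sketch(data: &[u8]) -> Result<&[BlockSketch], String> {
    if data.len() % BLOCK_BYTES != 0 {
        return Err(format!(
            "length {} is not a multiple of {}",
            data.len(),
            BLOCK_BYTES
        ));
    }
    let align = std::mem::align_of::<BlockSketch>(); // 2, from the u16 field
    if (data.as_ptr() as usize) % align != 0 {
        return Err(format!("pointer is not {}-byte aligned", align));
    }
    // SAFETY: the pointer is aligned for BlockSketch, the length covers a
    // whole number of blocks, the struct has no padding, every bit pattern
    // is valid for u16 and i8, and the returned slice borrows from `data`.
    Ok(unsafe {
        std::slice::from_raw_parts(
            data.as_ptr() as *const BlockSketch,
            data.len() / BLOCK_BYTES,
        )
    })
}
```

A `Vec<u8>` carries no alignment guarantee beyond 1 byte, so callers that need the cast to succeed reliably usually obtain the bytes from a source with stronger alignment, such as a page-aligned memory map.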
pub fn dequant_to_buf(&self, buf: &mut [f32; 32])
Dequantize this single block into a 32-element f32 buffer.
Used by the GEMV kernel to avoid heap allocation on the hot path.
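To illustrate why a fixed-size buffer helps on the hot path, here is a sketch of a matrix-vector product that reuses one stack-allocated `[f32; 32]` for every block. It is not the crate's GEMV kernel: `Block` keeps an `f32` scale for brevity, `gemv_sketch` is a hypothetical name, and the weights are assumed to be stored row-major with a whole number of blocks per row.

```rust
const QK: usize = 32; // stands in for QK_Q8_0

/// Simplified block for illustration (f32 scale instead of FP16).
pub struct Block {
    pub d: f32,
    pub qs: [i8; QK],
}

impl Block {
    /// Dequantize this block into a caller-provided stack buffer.
    pub fn dequant_to_buf(&self, buf: &mut [f32; QK]) {
        for (o, &q) in buf.iter_mut().zip(self.qs.iter()) {
            *o = self.d * q as f32;
        }
    }
}

/// y = W x, where W is row-major and each row is x.len() / QK blocks.
pub fn gemv_sketch(w: &[Block], x: &[f32], y: &mut [f32]) {
    assert_eq!(x.len() % QK, 0);
    let blocks_per_row = x.len() / QK;
    assert_eq!(w.len(), blocks_per_row * y.len());

    // One buffer on the stack, reused for every block: no heap allocation.
    let mut buf = [0.0f32; QK];
    for (row, out) in y.iter_mut().enumerate() {
        let mut acc = 0.0f32;
        for b in 0..blocks_per_row {
            w[row * blocks_per_row + b].dequant_to_buf(&mut buf);
            let xs = &x[b * QK..(b + 1) * QK];
            acc += buf.iter().zip(xs).map(|(wv, xv)| wv * xv).sum::<f32>();
        }
        *out = acc;
    }
}
```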