pub struct PackedBlock {
    pub bits: u8,
    pub scale: f16,
    pub packed_indices: Vec<u8>,
}
A packed quantized block that stores a scale factor and bit-packed indices.
Replaces the former BlockTQ2, BlockTQ3, and BlockTQ4 structs with a
single type that tracks its own bit width.
Fields
bits: u8
Bit width used for packing (2, 3, or 4).
scale: f16
Scaling factor (the L2 norm of the original vector).
packed_indices: Vec<u8>
Packed indices (the layout depends on bits).
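The scale field is documented as the L2 norm of the original vector. A minimal sketch of that computation follows. `l2_scale` is a hypothetical helper, not part of the documented API, and it works in f32 because the f16 primitive is not available on stable Rust; the conversion to the stored f16 is left out.

```rust
/// Hypothetical helper: the value documented for `scale`, i.e. the L2 norm
/// sqrt(x0^2 + x1^2 + ... + xn^2) of the original vector.
/// Computed in f32; the real field is f16, and the f16 conversion is omitted
/// because the f16 primitive type is not available on stable Rust.
pub fn l2_scale(values: &[f32]) -> f32 {
    // Sum of squares, then square root.
    values.iter().map(|x| x * x).sum::<f32>().sqrt()
}
```

For example, `l2_scale(&[3.0, 4.0])` returns `5.0`. How the vector is turned into indices is not described on this page, so the sketch stops at the scale.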
Implementations
impl PackedBlock
pub fn new(bits: u8, scale: f16, indices: &[u8]) -> Self
Creates a new packed block from a scale and a slice of unpacked index values.
The indices are bit-packed internally according to the bits width.
Pure Integration: delegates packing to the bit-width-specific helper
selected by the pack closure (IOSP lenient-mode closure pattern).
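The Pure Integration note means new only wires things together: a closure selects the width-specific packer, and new never manipulates bits itself. The sketch below illustrates that shape under stated assumptions. The helper names `pack_2bit`, `pack_3bit` and `pack_4bit` are hypothetical, the layouts (first index in the lowest bits, 3-bit indices allowed to straddle a byte boundary) are assumed rather than documented, and `scale` is an f32 stand-in for f16.

```rust
/// Stand-in for PackedBlock; `scale` is f32 because the f16 primitive is not
/// available on stable Rust (the real field is f16).
#[derive(Debug, PartialEq)]
pub struct PackedBlock {
    bits: u8,
    scale: f32,
    packed_indices: Vec<u8>,
}

// Hypothetical 2-bit helper: four indices per byte, first index in the lowest bits.
fn pack_2bit(indices: &[u8]) -> Vec<u8> {
    indices
        .chunks(4)
        .map(|c| c.iter().enumerate().fold(0u8, |acc, (j, &v)| acc | ((v & 0b11) << (2 * j))))
        .collect()
}

// Hypothetical 3-bit helper: a dense little-endian bitstream, so an index may span two bytes.
fn pack_3bit(indices: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; (indices.len() * 3 + 7) / 8];
    for (i, &v) in indices.iter().enumerate() {
        let pos = i * 3; // bit offset of this index
        let wide = ((v & 0b111) as u16) << (pos % 8);
        out[pos / 8] |= wide as u8; // bits that land in the current byte
        if pos % 8 > 5 {
            out[pos / 8 + 1] |= (wide >> 8) as u8; // bits that spill into the next byte
        }
    }
    out
}

// Hypothetical 4-bit helper: two indices per byte, first index in the low nibble.
fn pack_4bit(indices: &[u8]) -> Vec<u8> {
    indices
        .chunks(2)
        .map(|c| c.iter().enumerate().fold(0u8, |acc, (j, &v)| acc | ((v & 0x0F) << (4 * j))))
        .collect()
}

impl PackedBlock {
    pub fn new(bits: u8, scale: f32, indices: &[u8]) -> Self {
        // The closure only selects and calls a width-specific helper;
        // `new` itself performs no bit manipulation.
        let pack = |idx: &[u8]| -> Vec<u8> {
            match bits {
                2 => pack_2bit(idx),
                3 => pack_3bit(idx),
                4 => pack_4bit(idx),
                _ => panic!("unsupported bit width: {bits}"),
            }
        };
        Self { bits, scale, packed_indices: pack(indices) }
    }
}
```

With these assumed layouts, `PackedBlock::new(2, 1.0, &[0, 1, 2, 3])` stores the single byte `0b1110_0100`, while three 3-bit indices of value 7 need two bytes (`[0xFF, 0x01]`) because the third index crosses the byte boundary.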
pub fn size_bytes(&self) -> usize
Returns the total size of the block in bytes: 2 bytes for the f16 scale plus the length of the packed data.
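The documented formula can be worked through for each bit width. This sketch assumes a dense bitstream, so n indices need ceil(n * bits / 8) packed bytes; the free functions `packed_len` and `size_bytes` are hypothetical stand-ins, and the real method presumably just reads the length of packed_indices instead of recomputing it.

```rust
/// Packed-data length for `count` indices at `bits` bits each, assuming a dense
/// bitstream with no per-index padding (assumption: the documented layout only
/// says it depends on bits).
pub fn packed_len(bits: u8, count: usize) -> usize {
    // Round the total bit count up to whole bytes.
    (count * bits as usize + 7) / 8
}

/// The documented formula: 2 bytes for the f16 scale plus the packed data.
/// Hypothetical free function; the real method takes `&self`.
pub fn size_bytes(bits: u8, count: usize) -> usize {
    const SCALE_BYTES: usize = 2; // size of an f16
    SCALE_BYTES + packed_len(bits, count)
}
```

For an illustrative block of 32 indices this gives 10, 14 and 18 bytes at 2, 3 and 4 bits, compared with 32 bytes for the same indices stored one per u8 (before counting the scale).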
pub fn from_raw(bits: u8, scale: f16, packed_indices: Vec<u8>) -> Self
Creates a PackedBlock from pre-packed data without re-packing.
Use this to reconstruct blocks from GPU-quantized data that is already in the correct packed layout.
Pure Operation: field assignment only.
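Because from_raw only assigns fields, a block rebuilt from bytes that are already packed should compare equal to one produced by new from the unpacked indices, provided both sides use the same layout. A sketch of that round trip, with a hypothetical generic `pack_bits` helper, an assumed dense little-endian bit layout, and an f32 stand-in for the f16 scale:

```rust
/// Stand-in for PackedBlock; `scale` is f32 because the f16 primitive is not
/// available on stable Rust (the real field is f16).
#[derive(Debug, PartialEq)]
pub struct PackedBlock {
    bits: u8,
    scale: f32,
    packed_indices: Vec<u8>,
}

/// Hypothetical generic packer: a dense little-endian bitstream (assumed layout).
fn pack_bits(bits: u8, indices: &[u8]) -> Vec<u8> {
    let width = bits as usize;
    let mask = (1u16 << bits) - 1;
    let mut out = vec![0u8; (indices.len() * width + 7) / 8];
    for (i, &v) in indices.iter().enumerate() {
        let pos = i * width; // bit offset of this index
        let wide = (v as u16 & mask) << (pos % 8);
        out[pos / 8] |= wide as u8;
        if pos % 8 + width > 8 {
            out[pos / 8 + 1] |= (wide >> 8) as u8; // spill into the next byte
        }
    }
    out
}

impl PackedBlock {
    pub fn new(bits: u8, scale: f32, indices: &[u8]) -> Self {
        Self { bits, scale, packed_indices: pack_bits(bits, indices) }
    }

    /// Field assignment only: the bytes are trusted to already be in the packed layout.
    pub fn from_raw(bits: u8, scale: f32, packed_indices: Vec<u8>) -> Self {
        Self { bits, scale, packed_indices }
    }
}
```

For example, `PackedBlock::new(4, 0.5, &[1, 2, 3, 4])` holds the bytes `[33, 67]` (0x21, 0x43), and passing a copy of those bytes to `from_raw(4, 0.5, ...)` yields an equal block. In this sketch from_raw performs no validation, so a byte vector of the wrong length or an unsupported bits value is accepted silently; the caller is responsible for the layout.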
pub fn unpack_into(&self, count: usize, buf: &mut Vec<u8>)
Unpacks stored indices into a caller-provided buffer, avoiding allocation.
This is the hot-path variant: it reuses the buffer across repeated calls (e.g. inside attention-score loops) to eliminate per-key allocations.
Pure Integration: delegates unpacking to the bit-width-specific helper
selected by the do_unpack closure (IOSP lenient-mode closure pattern).
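The buffer-reuse pattern can be sketched as follows: unpack_into clears the caller's Vec and pushes count indices, so a loop over many keys allocates the buffer only once. The bit layout (dense, little-endian, indices may span two bytes) is an assumption, the width-specific helpers are folded into one generic loop for brevity, and `scale` is an f32 stand-in for f16.

```rust
/// Stand-in for PackedBlock; `scale` is f32 because the f16 primitive is not
/// available on stable Rust (the real field is f16).
pub struct PackedBlock {
    bits: u8,
    scale: f32,
    packed_indices: Vec<u8>,
}

impl PackedBlock {
    pub fn from_raw(bits: u8, scale: f32, packed_indices: Vec<u8>) -> Self {
        Self { bits, scale, packed_indices }
    }

    /// Sketch of the hot-path unpacker, assuming a dense little-endian bitstream.
    /// `buf` is cleared, not reallocated, so its capacity carries over between calls.
    pub fn unpack_into(&self, count: usize, buf: &mut Vec<u8>) {
        buf.clear();
        let width = self.bits as usize;
        let mask = (1u16 << self.bits) - 1;
        for i in 0..count {
            let pos = i * width; // bit offset of index i
            let mut wide = self.packed_indices[pos / 8] as u16;
            if pos % 8 + width > 8 {
                // The index spans two bytes: pull in the next byte as the high part.
                wide |= (self.packed_indices[pos / 8 + 1] as u16) << 8;
            }
            buf.push(((wide >> (pos % 8)) & mask) as u8);
        }
    }

    pub fn scale(&self) -> f32 {
        self.scale
    }
}
```

In a key loop, one `let mut buf = Vec::with_capacity(count);` declared outside the loop is enough: each `key.unpack_into(count, &mut buf)` overwrites the previous contents, and because `Vec::clear` keeps the allocated capacity, later calls push into existing storage instead of allocating.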