pub fn round_bit_width(bits: u8) -> u8
Round a bit width to the nearest SIMD-friendly width: 0, 8, 16, or 32 bits.
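
The summary does not say how "nearest" is resolved, so the following is only a sketch of one plausible reading, not the crate's actual implementation. It assumes the function rounds up (rounding down would discard bits a value needs) and that any input above 32 is clamped to 32. The name `round_bit_width_sketch` is hypothetical and was chosen to avoid implying this is the real function body.

```rust
/// Hypothetical sketch of `round_bit_width`.
///
/// Assumption: "nearest" means "smallest supported width that is >= `bits`".
/// Assumption: inputs larger than 32 are clamped to 32.
pub fn round_bit_width_sketch(bits: u8) -> u8 {
    match bits {
        // Zero bits stays zero: nothing needs to be stored.
        0 => 0,
        // 1 to 8 bits fit in a byte lane.
        1..=8 => 8,
        // 9 to 16 bits fit in a 16-bit lane.
        9..=16 => 16,
        // Everything else falls into the 32-bit lane (assumed clamp).
        _ => 32,
    }
}
```

Under these assumptions, `round_bit_width_sketch(5)` returns 8 and `round_bit_width_sketch(17)` returns 32. If the real function rounds to the numerically closest width instead, results for inputs such as 9 or 17 would differ.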