Shared SIMD-accelerated functions for posting list compression
This module provides platform-optimized implementations for common operations:
- Unpacking: Convert packed 8/16/32-bit values to u32 arrays
- Delta decoding: Prefix sum for converting deltas to absolute values
- Add one: Increment all values in an array (for TF decoding)
Supports:
- NEON on aarch64 (Apple Silicon, ARM servers)
- SSE/SSE4.1 on x86_64 (Intel/AMD)
- Scalar fallback for other architectures
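To make the three operations concrete, here is a minimal scalar sketch of their semantics (the kind of logic the scalar fallback would perform). The function names, signatures, and the explicit `base` parameter for delta decoding are illustrative assumptions, not this module's actual API.

```rust
// Illustrative scalar equivalents of the module's core operations.
// Names and signatures are assumptions for demonstration only.

/// Widen packed bytes to u32 (scalar analogue of 8-bit unpacking).
fn unpack_8bit_scalar(packed: &[u8], out: &mut [u32]) {
    for (dst, &src) in out.iter_mut().zip(packed) {
        *dst = src as u32;
    }
}

/// Prefix sum turning deltas into absolute values (scalar delta decode).
/// Whether the real API takes an initial `base` is an assumption.
fn delta_decode_scalar(values: &mut [u32], base: u32) {
    let mut acc = base;
    for v in values.iter_mut() {
        acc = acc.wrapping_add(*v);
        *v = acc;
    }
}

/// Increment every element (scalar "add one", as used for TF decoding).
fn add_one_scalar(values: &mut [u32]) {
    for v in values.iter_mut() {
        *v = v.wrapping_add(1);
    }
}

fn main() {
    let packed = [3u8, 1, 4, 1];
    let mut out = [0u32; 4];
    unpack_8bit_scalar(&packed, &mut out);
    delta_decode_scalar(&mut out, 10); // deltas relative to base 10
    assert_eq!(out, [13, 14, 18, 19]);
    add_one_scalar(&mut out);
    assert_eq!(out, [14, 15, 19, 20]);
    println!("{:?}", out);
}
```

The SIMD paths (NEON, SSE/SSE4.1) compute the same results, just over multiple lanes per instruction.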
Enums

- RoundedBitWidth - Rounded bit width type for SIMD-friendly encoding

Functions

- add_one - Add 1 to all values with SIMD acceleration
- batch_cosine_scores - Batch cosine similarity: query vs N contiguous vectors.
- batch_cosine_scores_f16 - Batch cosine similarity: f32 query vs N contiguous f16 vectors.
- batch_cosine_scores_u8 - Batch cosine similarity: f32 query vs N contiguous u8 vectors.
- batch_f32_to_f16 - Batch convert f32 slice to f16 (stored as u16)
- batch_f32_to_u8 - Batch convert f32 slice to u8 with [-1,1] → [0,255] mapping
- batch_squared_euclidean_distances - Batch compute squared Euclidean distances from one query to multiple vectors
- bits_needed - Compute the number of bits needed to represent a value
- cosine_similarity - Compute cosine similarity between two f32 vectors with SIMD acceleration
- delta_decode - Delta decode with SIMD acceleration
- dequantize_uint8 - Dequantize UInt8 weights to f32 with SIMD acceleration
- dot_product_f32 - Compute dot product of two f32 arrays with SIMD acceleration
- f16_to_f32 - Convert f16 (stored as u16) to f32
- f32_to_f16 - Convert f32 to f16 (IEEE 754 half-precision), stored as u16
- f32_to_u8_saturating - Quantize f32 in [-1, 1] to u8 [0, 255]
- max_f32 - Find maximum value in f32 array with SIMD acceleration
- pack_rounded - Pack values using rounded bit width (SIMD-friendly)
- round_bit_width - Round a bit width to the nearest SIMD-friendly width (0, 8, 16, or 32)
- squared_euclidean_distance - Compute squared Euclidean distance between two f32 vectors with SIMD acceleration
- u8_to_f32 - Dequantize u8 [0, 255] to f32 in [-1, 1]
- unpack_8bit - Unpack 8-bit packed values to u32 with SIMD acceleration
- unpack_8bit_delta_decode - Fused unpack 8-bit + delta decode in a single pass
- unpack_16bit - Unpack 16-bit packed values to u32 with SIMD acceleration
- unpack_16bit_delta_decode - Fused unpack 16-bit + delta decode in a single pass
- unpack_32bit - Unpack 32-bit packed values to u32 with SIMD acceleration
- unpack_delta_decode - Fused unpack + delta decode for arbitrary bit widths
- unpack_rounded - Unpack values using rounded bit width with SIMD acceleration
- unpack_rounded_delta_decode - Fused unpack + delta decode using rounded bit width
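The rounded-bit-width scheme described by bits_needed and round_bit_width can be sketched in scalar form as follows. This is a hypothetical reference implementation assuming rounding *up* to the next SIMD-friendly width; the crate's actual edge-case behavior (e.g. for all-zero blocks) may differ.

```rust
// Sketch of the rounded-bit-width scheme: find the minimum bits for a
// value, then widen to a lane size (0, 8, 16, or 32) that SIMD loads
// can handle without bit-level shuffling. Names are illustrative.

/// Bits required to represent `v` (0 for v == 0).
fn bits_needed_scalar(v: u32) -> u32 {
    32 - v.leading_zeros()
}

/// Round a bit width up to a SIMD-friendly width: 0, 8, 16, or 32.
fn round_bit_width_scalar(bits: u32) -> u32 {
    match bits {
        0 => 0,
        1..=8 => 8,
        9..=16 => 16,
        _ => 32,
    }
}

fn main() {
    assert_eq!(bits_needed_scalar(0), 0);
    assert_eq!(bits_needed_scalar(255), 8);   // fits in one byte
    assert_eq!(bits_needed_scalar(256), 9);   // needs a second byte
    assert_eq!(round_bit_width_scalar(9), 16);
    assert_eq!(round_bit_width_scalar(17), 32);
    println!("ok");
}
```

Trading a few wasted bits per value for byte-aligned lanes is what lets the fused unpack + delta-decode paths run as straight SIMD widening loads instead of arbitrary-width bit extraction.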