Module simd

Shared SIMD-accelerated functions for posting list compression

This module provides platform-optimized implementations for common operations:

  • Unpacking: Convert packed 8/16/32-bit values to u32 arrays
  • Delta decoding: Prefix sum for converting deltas to absolute values
  • Add one: Increment every value in an array (for term-frequency (TF) decoding)

Supports:

  • NEON on aarch64 (Apple Silicon, ARM servers)
  • SSE/SSE4.1 on x86_64 (Intel/AMD)
  • Scalar fallback for other architectures

Enums§

RoundedBitWidth
Rounded bit width type for SIMD-friendly encoding

Functions§

add_one
Add 1 to all values with SIMD acceleration
batch_cosine_scores
Batch cosine similarity: query vs N contiguous vectors
batch_cosine_scores_f16
Batch cosine similarity: f32 query vs N contiguous f16 vectors
batch_cosine_scores_u8
Batch cosine similarity: f32 query vs N contiguous u8 vectors
batch_f32_to_f16
Batch convert f32 slice to f16 (stored as u16)
batch_f32_to_u8
Batch convert f32 slice to u8 with [-1,1] → [0,255] mapping
batch_squared_euclidean_distances
Batch compute squared Euclidean distances from one query to multiple vectors
bits_needed
Compute the number of bits needed to represent a value
cosine_similarity
Compute cosine similarity between two f32 vectors with SIMD acceleration
delta_decode
Delta decode with SIMD acceleration
dequantize_uint8
Dequantize UInt8 weights to f32 with SIMD acceleration
dot_product_f32
Compute dot product of two f32 arrays with SIMD acceleration
f16_to_f32
Convert f16 (stored as u16) to f32
f32_to_f16
Convert f32 to f16 (IEEE 754 half-precision), stored as u16
f32_to_u8_saturating
Quantize f32 in [-1, 1] to u8 [0, 255]
max_f32
Find maximum value in f32 array with SIMD acceleration
pack_rounded
Pack values using rounded bit width (SIMD-friendly)
round_bit_width
Round a bit width up to the next SIMD-friendly width (0, 8, 16, or 32)
squared_euclidean_distance
Compute squared Euclidean distance between two f32 vectors with SIMD acceleration
u8_to_f32
Dequantize u8 [0, 255] to f32 in [-1, 1]
unpack_8bit
Unpack 8-bit packed values to u32 with SIMD acceleration
unpack_8bit_delta_decode
Fused unpack 8-bit + delta decode in a single pass
unpack_16bit
Unpack 16-bit packed values to u32 with SIMD acceleration
unpack_16bit_delta_decode
Fused unpack 16-bit + delta decode in a single pass
unpack_32bit
Unpack 32-bit packed values to u32 with SIMD acceleration
unpack_delta_decode
Fused unpack + delta decode for arbitrary bit widths
unpack_rounded
Unpack values using rounded bit width with SIMD acceleration
unpack_rounded_delta_decode
Fused unpack + delta decode using rounded bit width
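
The bit-width helpers above can be sketched in scalar form. This is an assumed reading of their semantics, not the crate's code; the `_scalar` names are hypothetical stand-ins for `bits_needed` and `round_bit_width`, and the rounding is assumed to go upward since rounding down would drop significant bits:

```rust
// Illustrative sketch of the bit-width helpers (assumed semantics).

/// Number of bits needed to represent `v` (0 for v == 0).
fn bits_needed_scalar(v: u32) -> u32 {
    32 - v.leading_zeros()
}

/// Round a bit width up to a SIMD-friendly lane width: 0, 8, 16, or 32.
/// Rounded widths trade some space for unpacking that maps directly onto
/// 8/16/32-bit SIMD lane loads.
fn round_bit_width_scalar(bits: u32) -> u32 {
    match bits {
        0 => 0,
        1..=8 => 8,
        9..=16 => 16,
        _ => 32,
    }
}

fn main() {
    assert_eq!(bits_needed_scalar(0), 0);
    assert_eq!(bits_needed_scalar(255), 8);
    assert_eq!(bits_needed_scalar(256), 9);
    assert_eq!(round_bit_width_scalar(9), 16);
    assert_eq!(round_bit_width_scalar(17), 32);
}
```

Under this reading, `pack_rounded` / `unpack_rounded` would store a block whose maximum needs, say, 9 bits in 16 bits per value, so `unpack_16bit` can decode it with plain widening loads.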