
Module simd_intrinsics

Custom SIMD intrinsics for performance-critical operations

This module provides hand-optimized SIMD implementations:

  • AVX2/AVX-512 for x86_64 processors
  • NEON for ARM64/Apple Silicon processors (M1/M2/M3/M4)

Distance calculations and other vectorized operations are automatically dispatched to the optimal implementation based on the target architecture.
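A minimal sketch of this runtime-dispatch pattern, assuming nothing about the module's internals (the function names below are illustrative): the CPU is queried for feature support, the widest available kernel is chosen, and anything else falls back to scalar code.

```rust
fn dot_product_dispatch(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            // Safety: AVX2 support was just verified at runtime.
            return unsafe { dot_product_avx2_impl(a, b) };
        }
    }
    dot_product_scalar(a, b)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2", enable = "fma")]
unsafe fn dot_product_avx2_impl(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;
    unsafe {
        let mut acc = _mm256_setzero_ps();
        let chunks = a.len() / 8;
        for i in 0..chunks {
            // Process 8 floats per iteration with a fused multiply-add.
            let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
            acc = _mm256_fmadd_ps(va, vb, acc);
        }
        // Horizontal sum of the 8 lanes, then the scalar tail.
        let mut lanes = [0.0f32; 8];
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
        let mut total: f32 = lanes.iter().sum();
        for i in chunks * 8..a.len() {
            total += a[i] * b[i];
        }
        total
    }
}

fn dot_product_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let xs: Vec<f32> = (0..17).map(|i| i as f32).collect();
    let ys = vec![2.0f32; 17];
    println!("{}", dot_product_dispatch(&xs, &ys));
}
```

The dispatch cost is a single feature check per call; callers see one portable function regardless of platform.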

§Features

  • AVX-512 Support: 512-bit operations processing 16 floats per iteration
  • INT8 Quantized Operations: SIMD-accelerated quantized vector operations
  • Batch Operations: Cache-optimized batch distance calculations
  • NEON Optimizations: Prefetch hints and loop unrolling for ARM64
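For the INT8 quantized operations, a scalar reference makes the semantics concrete (this is a sketch of what the SIMD kernels compute, not the crate's actual code): each `i8` pair is widened to `i32` before multiplying, so the product cannot overflow, and the sum is accumulated in `i32`.

```rust
// Scalar reference for an INT8 quantized dot product. The widening to i32
// mirrors what instructions like NEON's vdotq_s32 do in hardware.
fn dot_product_i8_ref(a: &[i8], b: &[i8]) -> i32 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum()
}

fn main() {
    let a = [127i8, -128, 3];
    let b = [127i8, 127, 2];
    // 127*127 + (-128)*127 + 3*2 = 16129 - 16256 + 6 = -121
    println!("{}", dot_product_i8_ref(&a, &b));
}
```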

§Performance Optimizations (v2)

  • Loop Unrolling: 4x unrolled loops for better instruction-level parallelism
  • Prefetch Hints: Software prefetching for large vectors (>256 elements)
  • FMA Instructions: Fused multiply-add for improved throughput and accuracy
  • Efficient Horizontal Sum: Optimized reduction operations
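The 4x unrolling and FMA points above can be illustrated in portable Rust (an illustrative sketch, not the module's SIMD kernels): four independent accumulators break the loop's dependency chain so the CPU can keep multiple multiply-adds in flight, and `f32::mul_add` compiles to an FMA instruction where the target supports it.

```rust
fn dot_product_unrolled(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    // Four independent accumulators for instruction-level parallelism.
    let (mut s0, mut s1, mut s2, mut s3) = (0.0f32, 0.0f32, 0.0f32, 0.0f32);
    let chunks = a.len() / 4;
    for i in 0..chunks {
        let j = i * 4;
        s0 = a[j].mul_add(b[j], s0);
        s1 = a[j + 1].mul_add(b[j + 1], s1);
        s2 = a[j + 2].mul_add(b[j + 2], s2);
        s3 = a[j + 3].mul_add(b[j + 3], s3);
    }
    // Pairwise reduction of the accumulators, then the scalar tail.
    let mut sum = (s0 + s1) + (s2 + s3);
    for j in chunks * 4..a.len() {
        sum = a[j].mul_add(b[j], sum);
    }
    sum
}

fn main() {
    let xs: Vec<f32> = (0..10).map(|i| i as f32).collect();
    let ones = vec![1.0f32; 10];
    println!("{}", dot_product_unrolled(&xs, &ones)); // 0 + 1 + ... + 9 = 45
}
```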

Functions§

batch_cosine_similarity
Batch cosine similarity - compute similarities from one query to multiple vectors
batch_dot_product
Batch dot product - compute dot products of one query vector against multiple vectors. Results are written to the provided output slice. Optimized for cache locality by processing vectors in tiles.
batch_dot_product_owned
Batch dot product with owned vectors (for convenience)
batch_euclidean
Batch Euclidean distance - compute distances from one query to multiple vectors. Results are written to the provided output slice. Optimized for cache locality.
batch_euclidean_owned
Batch Euclidean distance with owned vectors (for convenience)
cosine_similarity_avx2
Legacy alias for backward compatibility
cosine_similarity_simd
SIMD-optimized cosine similarity. Uses AVX-512 where available, otherwise AVX2, on x86_64; NEON on ARM64/Apple Silicon.
dot_product_avx2
Legacy alias for backward compatibility
dot_product_i8
SIMD-accelerated dot product for INT8 quantized vectors. Uses NEON vdotq_s32 on ARM64 and AVX2 _mm256_maddubs_epi16 on x86_64.
dot_product_simd
SIMD-optimized dot product. Uses AVX-512 where available, otherwise AVX2, on x86_64; NEON on ARM64/Apple Silicon.
euclidean_distance_avx2
Legacy alias for backward compatibility
euclidean_distance_simd
SIMD-optimized Euclidean distance. Uses AVX-512 where available, otherwise AVX2, on x86_64; NEON on ARM64/Apple Silicon; falls back to scalar code otherwise.
euclidean_distance_squared_i8
SIMD-accelerated squared Euclidean distance for INT8 quantized vectors. Returns the squared distance (the caller should take the square root if needed).
manhattan_distance_simd
SIMD-optimized Manhattan distance. Uses AVX-512 on x86_64, NEON on ARM64/Apple Silicon, and scalar code on other platforms.
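The batch functions above share a query-against-many shape. A scalar sketch of that pattern (the signature and tile size here are assumptions for illustration, not the crate's actual API) shows why tiling helps: the query stays hot in cache while each tile of candidate vectors streams through once.

```rust
// Illustrative tiled batch dot product: one query vs. many vectors,
// one result per vector written into `out`.
fn batch_dot_product_ref(query: &[f32], vectors: &[Vec<f32>], out: &mut [f32]) {
    assert_eq!(vectors.len(), out.len());
    const TILE: usize = 8; // tile size is an illustrative choice
    for (tile_vecs, tile_out) in vectors.chunks(TILE).zip(out.chunks_mut(TILE)) {
        for (v, o) in tile_vecs.iter().zip(tile_out.iter_mut()) {
            *o = query.iter().zip(v).map(|(q, x)| q * x).sum();
        }
    }
}

fn main() {
    let query = vec![1.0f32, 2.0];
    let vectors = vec![vec![3.0f32, 4.0], vec![0.5, 0.5]];
    let mut out = vec![0.0f32; 2];
    batch_dot_product_ref(&query, &vectors, &mut out);
    println!("{:?}", out); // [11.0, 1.5]
}
```

Writing into a caller-provided output slice, as the `batch_*` functions document, avoids allocating per call; the `_owned` variants trade that for convenience.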