Custom SIMD intrinsics for performance-critical operations
This module provides hand-optimized SIMD implementations:
- AVX2/AVX-512 for x86_64 processors
- NEON for ARM64/Apple Silicon processors (M1/M2/M3/M4)
Distance calculations and other vectorized operations are automatically dispatched to the optimal implementation based on the target architecture.
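A minimal sketch of this dispatch pattern, using a hypothetical `dot_product_simd` entry point that combines a compile-time `cfg` gate with a runtime CPU-feature check; the module's actual dispatch logic may differ, and the AVX2 body here is a placeholder rather than real intrinsics.

```rust
// Sketch of architecture dispatch (illustrative names, not the module's code).
// On x86_64 we check for AVX2 at runtime; everywhere else we fall through
// to the portable scalar path.
pub fn dot_product_simd(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safety: guarded by the runtime feature check above.
            return unsafe { dot_product_avx2_impl(a, b) };
        }
    }
    dot_product_scalar(a, b)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn dot_product_avx2_impl(a: &[f32], b: &[f32]) -> f32 {
    // Placeholder body: a real implementation would use _mm256_* intrinsics.
    dot_product_scalar(a, b)
}

fn dot_product_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

The `#[target_feature]` attribute lets the compiler emit AVX2 instructions only inside the guarded function, so the binary still runs on CPUs without AVX2.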
§Features
- AVX-512 Support: 512-bit operations processing 16 floats per iteration
- INT8 Quantized Operations: SIMD-accelerated quantized vector operations
- Batch Operations: Cache-optimized batch distance calculations
- NEON Optimizations: Prefetch hints and loop unrolling for ARM64
§Performance Optimizations (v2)
- Loop Unrolling: 4x unrolled loops for better instruction-level parallelism
- Prefetch Hints: Software prefetching for large vectors (>256 elements)
- FMA Instructions: Fused multiply-add for improved throughput and accuracy
- Efficient Horizontal Sum: Optimized reduction operations
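The unrolling and FMA ideas can be sketched in portable Rust, with scalar `f32::mul_add` standing in for vector FMA; the function name and structure are illustrative, not the module's implementation.

```rust
// Illustrative 4x-unrolled dot product with fused multiply-add.
// Four independent accumulators break the serial dependency chain, letting
// the CPU keep multiple FMA operations in flight at once.
fn dot_product_unrolled(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0f32; 4];
    let chunks = a.len() / 4;
    for i in 0..chunks {
        let base = i * 4;
        for lane in 0..4 {
            // mul_add computes a*b + c with a single rounding step.
            acc[lane] = a[base + lane].mul_add(b[base + lane], acc[lane]);
        }
    }
    // Horizontal sum of the partial accumulators, then the scalar tail.
    let mut sum = acc.iter().sum::<f32>();
    for i in chunks * 4..a.len() {
        sum = a[i].mul_add(b[i], sum);
    }
    sum
}
```

In the real SIMD paths each "lane" above is a full vector register, and the horizontal sum becomes a vector reduction.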
Functions§
- batch_cosine_similarity - Batch cosine similarity: computes similarities from one query to multiple vectors.
- batch_dot_product - Batch dot product: computes dot products of one query vector against multiple vectors. Returns results in the provided output slice. Optimized for cache locality by processing vectors in tiles.
- batch_dot_product_owned - Batch dot product with owned vectors (for convenience).
- batch_euclidean - Batch Euclidean distance: computes distances from one query to multiple vectors. Returns results in the provided output slice. Optimized for cache locality.
- batch_euclidean_owned - Batch Euclidean distance with owned vectors (for convenience).
- cosine_similarity_avx2 - Legacy alias for backward compatibility.
- cosine_similarity_simd - SIMD-optimized cosine similarity. Uses AVX-512 > AVX2 on x86_64, NEON on ARM64/Apple Silicon.
- dot_product_avx2 - Legacy alias for backward compatibility.
- dot_product_i8 - SIMD-accelerated dot product for INT8 quantized vectors. Uses NEON vdotq_s32 on ARM64, AVX2 _mm256_maddubs_epi16 on x86_64.
- dot_product_simd - SIMD-optimized dot product. Uses AVX-512 > AVX2 on x86_64, NEON on ARM64/Apple Silicon.
- euclidean_distance_avx2 - Legacy alias for backward compatibility.
- euclidean_distance_simd - SIMD-optimized Euclidean distance. Uses AVX-512 > AVX2 on x86_64, NEON on ARM64/Apple Silicon; falls back to scalar otherwise.
- euclidean_distance_squared_i8 - SIMD-accelerated squared Euclidean distance for INT8 quantized vectors. Returns the squared distance (caller should take the square root if needed).
- manhattan_distance_simd - SIMD-optimized Manhattan distance. Uses AVX-512 on x86_64, NEON on ARM64/Apple Silicon, scalar on other platforms.
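As a point of reference for the INT8 functions above, here are scalar versions that widen each i8 to i32 before accumulating, which is the same widening the SIMD paths perform in-register to avoid overflow. These are illustrative sketches (with hypothetical `_ref` names), not the module's accelerated implementations, and they ignore the unsigned-operand quirk of `_mm256_maddubs_epi16` that the real AVX2 path must handle.

```rust
// Scalar reference for the INT8 dot product: widen to i32 before the
// multiply-accumulate so products like 127 * 127 cannot overflow.
fn dot_product_i8_ref(a: &[i8], b: &[i8]) -> i32 {
    a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum()
}

// Scalar reference for squared Euclidean distance on INT8 vectors; the
// caller takes the square root only when a true distance is needed.
fn euclidean_distance_squared_i8_ref(a: &[i8], b: &[i8]) -> i32 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| {
            let d = x as i32 - y as i32;
            d * d
        })
        .sum()
}
```

Returning the squared distance is the common convention for nearest-neighbor search, since ordering by squared distance and by distance is identical and the sqrt can be skipped.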