Module simd

SIMD-Accelerated Distance Functions

This module provides SIMD-optimized implementations of vector distance calculations using CPU intrinsics (AVX2 on x86_64). Functions automatically dispatch to SIMD or scalar implementations based on runtime CPU feature detection.

§Architecture

  • Scalar fallback: Pure Rust implementation, always available
  • AVX2 path: x86_64 intrinsics with 256-bit registers (8 floats per iteration)
  • Runtime dispatch: One-time CPU feature detection with cached result
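The dispatch scheme listed above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the names `DotFn`, `select_impl`, `dot_avx2_checked`, and `dot` are hypothetical, and the AVX2 kernel is only a stand-in that reuses the scalar loop so the example stays short. It assumes the chosen implementation is cached as a function pointer in a `std::sync::OnceLock`, which is one of several ways to detect CPU features only once.

```rust
use std::sync::OnceLock;

/// Signature shared by every implementation (hypothetical alias).
type DotFn = fn(&[f32], &[f32]) -> f32;

/// Scalar fallback: plain Rust, available on every target.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Stand-in for the AVX2 kernel. It is compiled with AVX2 enabled but
/// reuses the scalar loop; a real kernel would use 256-bit intrinsics.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Safe entry point for the AVX2 path.
#[cfg(target_arch = "x86_64")]
fn dot_avx2_checked(a: &[f32], b: &[f32]) -> f32 {
    // SAFETY: `select_impl` only returns this function after AVX2 was detected.
    unsafe { dot_avx2(a, b) }
}

/// Runs the CPU feature check and picks an implementation.
fn select_impl() -> DotFn {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return dot_avx2_checked;
        }
    }
    dot_scalar
}

/// Public entry point: detection runs on the first call, then the cached
/// function pointer is reused.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    static IMPL: OnceLock<DotFn> = OnceLock::new();
    let f = *IMPL.get_or_init(select_impl);
    f(a, b)
}
```

Caching a function pointer keeps the per-call cost to one atomic load and an indirect call. Caching a plain `bool` and branching on it would work just as well.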

§Safety Guarantees

All unsafe blocks are contained within this module and only use:

  • Unaligned loads (_mm256_loadu_ps) - no alignment requirements
  • Standard SIMD intrinsics - well-defined behavior for any f32 input
  • Proper remainder handling - scalar loop processes trailing elements
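To make the three points above concrete, here is a hedged sketch of what an AVX2 dot-product kernel built this way might look like. It is an assumption-laden illustration rather than this module's source: the function names are hypothetical, it uses a separate multiply and add instead of FMA, and it reduces the accumulator by spilling it to an array, which is simple but not the fastest option.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar reference implementation.
pub fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// AVX2 kernel sketch: 8 floats per iteration, scalar loop for the tail.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len().min(b.len());
    let chunks = n / 8;
    let mut lanes = [0.0f32; 8];
    unsafe {
        let mut acc = _mm256_setzero_ps();
        for i in 0..chunks {
            // Unaligned loads: the slices need no particular alignment.
            // Offsets stay in bounds because i * 8 + 8 <= n.
            let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
            acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
        }
        // Spill the 8 partial sums so they can be added in scalar code.
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    }
    let mut sum: f32 = lanes.iter().sum();
    // Remainder handling: trailing elements that did not fill a register.
    for i in chunks * 8..n {
        sum += a[i] * b[i];
    }
    sum
}

/// Safe wrapper that checks for AVX2 before calling the kernel.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { dot_avx2(a, b) };
        }
    }
    dot_scalar(a, b)
}
```

With 11-element inputs, one full 8-float iteration runs and the last three elements go through the scalar tail, so both code paths are exercised.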

§Performance Characteristics

§AVX2 (256-bit)

  • Throughput: 8 floats per iteration
  • Speedup: ~4-6x for large vectors vs scalar (depends on FMA availability)
  • Small vectors (< 8 elements): performance similar to scalar, because no full 8-float iteration runs and all elements go through the remainder loop

§Scalar Fallback

  • Throughput: 1 float per iteration
  • Availability: All platforms, all CPUs
  • Performance: Baseline, optimized Rust code

§Correctness

SIMD and scalar implementations produce bit-identical results for the same inputs. All operations follow IEEE 754 floating-point semantics.

§Examples

use sqlitegraph::hnsw::simd::dot_product;

let a = vec![1.0, 2.0, 3.0];
let b = vec![4.0, 5.0, 6.0];
let product = dot_product(&a, &b);
// 1*4 + 2*5 + 3*6 = 32, which is exactly representable, so exact equality is safe here
assert_eq!(product, 32.0);

§Functions

  • compute_norm_squared: Runtime-dispatched squared norm computation with AVX2 acceleration
  • compute_norm_squared_scalar: Scalar fallback implementation of squared norm computation
  • cosine_similarity: Runtime-dispatched cosine similarity with AVX2 acceleration
  • cosine_similarity_scalar: Scalar fallback implementation of cosine similarity
  • dot_product: Runtime-dispatched dot product with AVX2 acceleration
  • dot_product_scalar: Scalar fallback implementation of dot product
  • euclidean_distance: Runtime-dispatched Euclidean (L2) distance computation
  • euclidean_distance_scalar: Scalar fallback implementation of Euclidean (L2) distance
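
For orientation, the four quantities computed by the functions listed above can be written in scalar Rust as below. This is a sketch of the standard mathematical definitions, not the crate's code: the function names and signatures are hypothetical, and returning 0.0 for a zero-length vector in `cosine` is an assumption about edge-case handling that the module may treat differently.

```rust
/// Dot product: sum of element-wise products.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Squared L2 norm: dot product of a vector with itself.
fn norm_squared(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum()
}

/// Euclidean (L2) distance: square root of the summed squared differences.
fn euclidean(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            let d = x - y;
            d * d
        })
        .sum::<f32>()
        .sqrt()
}

/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Assumption: a zero-norm input yields 0.0 instead of NaN.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let denom = (norm_squared(a) * norm_squared(b)).sqrt();
    if denom == 0.0 {
        0.0
    } else {
        dot(a, b) / denom
    }
}
```

Note that cosine similarity needs only one dot product and two squared norms, which is why a squared-norm primitive is useful on its own: norms of stored vectors can be computed once and reused.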