Function dot_simd 

pub fn dot_simd(a: &[f32], b: &[f32]) -> f32

SIMD-accelerated dot product.

At runtime, dispatches to the AVX2+FMA fast path on x86_64 CPUs that support both features (practically every deployment target since ~2014) and falls back to the portable wide::f32x8 implementation otherwise.
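The dispatch described above might look roughly like the following sketch. It is not this crate's actual source: `dot_dispatch`, `dot_avx2_fma`, and `dot_portable` are hypothetical names, the fallback is a plain scalar loop standing in for the wide::f32x8 path, and truncating to the shorter slice is an assumption about how mismatched lengths are handled.

```rust
/// Hypothetical portable fallback (scalar stand-in for the `wide::f32x8` path).
fn dot_portable(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical AVX2+FMA kernel. Callers must verify CPU support first.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,fma")]
unsafe fn dot_avx2_fma(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::*;

    // Assumption: mismatched lengths are truncated to the shorter slice.
    let n = a.len().min(b.len());
    let mut acc = _mm256_setzero_ps();
    let mut i = 0;
    // Process 8 lanes per iteration with a fused multiply-add.
    while i + 8 <= n {
        let va = _mm256_loadu_ps(a.as_ptr().add(i));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i));
        acc = _mm256_fmadd_ps(va, vb, acc);
        i += 8;
    }
    // Horizontal sum of the 8 lanes.
    let mut lanes = [0.0f32; 8];
    _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    let mut sum: f32 = lanes.iter().sum();
    // Scalar tail for the remaining 0..=7 elements.
    while i < n {
        sum += a[i] * b[i];
        i += 1;
    }
    sum
}

/// Hypothetical dispatcher: checks CPU features at runtime, then picks a path.
pub fn dot_dispatch(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: both required features were detected on this CPU.
            return unsafe { dot_avx2_fma(a, b) };
        }
    }
    dot_portable(a, b)
}
```

The `#[cfg]` block compiles the AVX2 branch out entirely on non-x86_64 targets, so the same call site works everywhere; `is_x86_feature_detected!` is the standard-library macro for the runtime check.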

The portable path uses four independent accumulators so that CPUs with multiple multiply/FMA ports can keep several multiply-adds in flight at once; with a single accumulator, every add would have to wait for the previous one, serialising the loop on one dependency chain.
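To illustrate the accumulator idea, here is a minimal scalar sketch. It assumes nothing about the real implementation beyond the description above: `dot_four_acc` is a hypothetical name, each accumulator is a single `f32` rather than a wide::f32x8 vector, and mismatched lengths are truncated to the shorter slice.

```rust
/// Hypothetical scalar illustration of the four-accumulator pattern.
pub fn dot_four_acc(a: &[f32], b: &[f32]) -> f32 {
    // Assumption: mismatched lengths are truncated to the shorter slice.
    let n = a.len().min(b.len());
    let (a, b) = (&a[..n], &b[..n]);

    // Four independent partial sums: no update depends on another
    // accumulator, so the CPU can overlap them.
    let mut acc = [0.0f32; 4];
    let mut ca = a.chunks_exact(4);
    let mut cb = b.chunks_exact(4);
    for (x, y) in ca.by_ref().zip(cb.by_ref()) {
        acc[0] += x[0] * y[0];
        acc[1] += x[1] * y[1];
        acc[2] += x[2] * y[2];
        acc[3] += x[3] * y[3];
    }

    // Leftover 0..=3 elements that did not fill a whole chunk.
    let tail: f32 = ca
        .remainder()
        .iter()
        .zip(cb.remainder())
        .map(|(x, y)| x * y)
        .sum();

    // Combine the partial sums only once, after the loop.
    (acc[0] + acc[1]) + (acc[2] + acc[3]) + tail
}
```

Because floating-point addition is not associative, splitting the sum across accumulators can change the last bits of the result compared with a strictly sequential loop.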