pub fn simd_axpy(c: &mut [f64], b: &[f64], scalar: f64, len: usize)
SIMD-accelerated AXPY: c[0..len] += scalar * b[0..len].
Used in the inner loop of tiled matrix multiplication, where scalar = A[i,p]
and b is a row segment of B. With AVX2, processes 4 f64 elements per iteration.
Deterministic: each c[j] accumulates the same scalar * b[j] contribution
using a separate multiply and add (no FMA), so the result matches the
non-SIMD (scalar) loop.
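
The description above can be illustrated with a minimal sketch. This is not the crate's actual implementation: the names `axpy_sketch`, `axpy_avx2`, and `axpy_scalar` are hypothetical, and the runtime feature check, the scalar tail loop, and the bounds assertion are assumptions added for illustration. It shows the separate multiply and add on 4-lane f64 vectors, with a plain loop as the reference.

```rust
/// Hypothetical reference path: plain loop, one multiply and one add per element.
fn axpy_scalar(c: &mut [f64], b: &[f64], scalar: f64, len: usize) {
    for j in 0..len {
        c[j] += scalar * b[j];
    }
}

/// Hypothetical AVX2 path: 4 f64 lanes per iteration, no fused multiply-add.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn axpy_avx2(c: &mut [f64], b: &[f64], scalar: f64, len: usize) {
    use std::arch::x86_64::*;
    unsafe {
        // Broadcast the scalar into all 4 lanes.
        let s = _mm256_set1_pd(scalar);
        let mut j = 0;
        while j + 4 <= len {
            let bv = _mm256_loadu_pd(b.as_ptr().add(j));
            let cv = _mm256_loadu_pd(c.as_ptr().add(j));
            // Separate multiply, then add: each step rounds once per lane,
            // exactly like `c[j] += scalar * b[j]` in the scalar loop.
            let prod = _mm256_mul_pd(s, bv);
            _mm256_storeu_pd(c.as_mut_ptr().add(j), _mm256_add_pd(cv, prod));
            j += 4;
        }
        // Tail: the remaining 0..=3 elements use the scalar formula.
        while j < len {
            *c.get_unchecked_mut(j) += scalar * *b.get_unchecked(j);
            j += 1;
        }
    }
}

/// Hypothetical dispatcher: uses AVX2 when available, else the scalar loop.
fn axpy_sketch(c: &mut [f64], b: &[f64], scalar: f64, len: usize) {
    // Assumption: `len` must not exceed either slice.
    assert!(len <= c.len() && len <= b.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Safe to call: AVX2 support was just verified and bounds were checked.
            unsafe { axpy_avx2(c, b, scalar, len) };
            return;
        }
    }
    axpy_scalar(c, b, scalar, len);
}
```

Because each lane performs the same two IEEE 754 operations as the scalar loop, both paths in this sketch should produce bit-identical results for the same inputs. An FMA would round only once and could differ in the last bit.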