pub fn simd_dot_aligned_f32(a: &[f32], b: &[f32]) -> Result<f32, &'static str>
High-performance SIMD dot product for aligned f32 vectors
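The listing gives only the signature, so here is a minimal sketch of what a body with this signature could look like. It is hypothetical, not this crate's implementation. It assumes the error case is a length mismatch, and the error message is invented, because the source does not say when `Err` is returned. On x86_64 it uses SSE intrinsics from `std::arch::x86_64`, which are part of that target's baseline, so no runtime feature detection is needed. It uses unaligned loads (`_mm_loadu_ps`) so it is sound for any slice. A variant that actually relies on alignment would use `_mm_load_ps` and would have to verify 16-byte alignment first. On other targets it falls back to a portable loop with four accumulators.

```rust
/// Hypothetical sketch: the error condition (length mismatch) is an assumption.
pub fn simd_dot_aligned_f32(a: &[f32], b: &[f32]) -> Result<f32, &'static str> {
    if a.len() != b.len() {
        return Err("input slices must have the same length");
    }
    Ok(dot_impl(a, b))
}

#[cfg(target_arch = "x86_64")]
fn dot_impl(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_mul_ps, _mm_setzero_ps, _mm_storeu_ps};

    let chunks = a.len() / 4; // number of full 4-lane blocks
    let mut lanes = [0.0f32; 4];
    // SAFETY: SSE is baseline on x86_64, and every load reads 4 floats
    // that lie inside both slices (i * 4 + 3 < chunks * 4 <= len).
    // Unaligned loads are used, so no alignment guarantee is required.
    unsafe {
        let mut acc = _mm_setzero_ps();
        for i in 0..chunks {
            let va = _mm_loadu_ps(a.as_ptr().add(i * 4));
            let vb = _mm_loadu_ps(b.as_ptr().add(i * 4));
            acc = _mm_add_ps(acc, _mm_mul_ps(va, vb)); // lane-wise multiply-add
        }
        _mm_storeu_ps(lanes.as_mut_ptr(), acc); // spill the 4 partial sums
    }
    // Horizontal sum of the lanes, then the scalar tail (fewer than 4 elements).
    let mut sum: f32 = lanes.iter().sum();
    for i in chunks * 4..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

#[cfg(not(target_arch = "x86_64"))]
fn dot_impl(a: &[f32], b: &[f32]) -> f32 {
    // Portable fallback: four independent accumulators, which compilers
    // can often auto-vectorize.
    let mut acc = [0.0f32; 4];
    let ca = a.chunks_exact(4);
    let cb = b.chunks_exact(4);
    let tail: f32 = ca
        .remainder()
        .iter()
        .zip(cb.remainder())
        .map(|(x, y)| x * y)
        .sum();
    for (xa, xb) in ca.zip(cb) {
        for k in 0..4 {
            acc[k] += xa[k] * xb[k];
        }
    }
    acc.iter().sum::<f32>() + tail
}

fn main() {
    let a: Vec<f32> = (1..=10).map(|x| x as f32).collect();
    let b = vec![2.0f32; 10];
    println!("{:?}", simd_dot_aligned_f32(&a, &b));
}
```

Because the lanes are summed separately and combined at the end, the result can differ in the last bits from a strictly sequential sum for general inputs. That is the usual trade-off of SIMD reductions.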