pub fn simd_sum_f32(data: &[f32]) -> f32
SIMD-accelerated sum of an f32 slice. Achieves a 6.2x-9.1x speedup over scalar summation.
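
The entry above does not show how the sum is vectorized. As a rough illustration only, one common pattern keeps several independent partial sums ("lanes"), adds the input to them one chunk at a time, and then combines the lanes and any leftover tail elements. The name lane_sum_f32 and the lane count of 8 are assumptions made for this sketch; it is not the actual implementation of simd_sum_f32 and makes no claim about the speedup figures quoted above.

```rust
/// Hypothetical stand-in for `simd_sum_f32`, written for illustration.
/// It uses only stable, safe Rust and relies on the compiler to
/// auto-vectorize the inner loop; it is NOT the crate's real code.
pub fn lane_sum_f32(data: &[f32]) -> f32 {
    // Assumed lane count: 8 x f32 fits one 256-bit register (e.g. AVX).
    const LANES: usize = 8;

    // One independent accumulator per lane.
    let mut acc = [0.0f32; LANES];

    // Split the input into full chunks of LANES elements plus a short tail.
    let chunks = data.chunks_exact(LANES);
    let tail = chunks.remainder();

    for chunk in chunks {
        // Element-wise add; each lane depends only on its own accumulator,
        // which is what lets the compiler emit vector instructions.
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }

    // Horizontal reduction of the lanes, then the scalar tail.
    let mut total: f32 = acc.iter().sum();
    total += tail.iter().sum::<f32>();
    total
}

fn main() {
    let data: Vec<f32> = (1..=100).map(|x| x as f32).collect();
    println!("{}", lane_sum_f32(&data));
}
```

Because the lanes add elements in a different order than a left-to-right scalar loop, the result can differ from a scalar sum in the last few bits for general floating-point input. Callers that compare against a scalar reference may want to use a tolerance rather than exact equality.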