pub fn add(a: &[f32], b: &[f32], output: &mut [f32]) -> Result<(), TruenoError>
Element-wise add: output_i = a_i + b_i
Uses the _mm256_add_ps intrinsic when AVX2 is available.
Returns Err if a, b, and output do not all have the same length.
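The contract above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: `add_sketch`, `SketchError`, `add_scalar`, and `add_avx2` are hypothetical names, and the runtime feature check with a scalar fallback is an assumption about how such dispatch is commonly written. The real `TruenoError` variants are not shown in this section, so a stand-in error type is used.

```rust
/// Hypothetical stand-in for the library's error type; the real
/// `TruenoError` variants are not described in this section.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    LengthMismatch,
}

/// Element-wise add with the same shape as the documented signature.
pub fn add_sketch(a: &[f32], b: &[f32], output: &mut [f32]) -> Result<(), SketchError> {
    // All three slices must have the same length.
    if a.len() != b.len() || a.len() != output.len() {
        return Err(SketchError::LengthMismatch);
    }

    #[cfg(target_arch = "x86_64")]
    {
        // Assumed dispatch strategy: check for AVX2 at runtime.
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified just above, and the
            // length check guarantees all slices are equally long.
            unsafe { add_avx2(a, b, output) };
            return Ok(());
        }
    }

    add_scalar(a, b, output);
    Ok(())
}

/// Portable fallback: output_i = a_i + b_i, one element at a time.
fn add_scalar(a: &[f32], b: &[f32], output: &mut [f32]) {
    for ((o, x), y) in output.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

/// Processes 8 floats per iteration with `_mm256_add_ps`, then handles
/// the remaining tail with the scalar loop. Enabling `avx2` also enables
/// `avx`, which is the feature that provides `_mm256_add_ps`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_avx2(a: &[f32], b: &[f32], output: &mut [f32]) {
    use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};

    let vectorized = (a.len() / 8) * 8;
    let mut off = 0;
    while off < vectorized {
        // SAFETY: off + 8 <= len for all three slices; unaligned
        // loads and stores are used, so no alignment is required.
        unsafe {
            let va = _mm256_loadu_ps(a.as_ptr().add(off));
            let vb = _mm256_loadu_ps(b.as_ptr().add(off));
            _mm256_storeu_ps(output.as_mut_ptr().add(off), _mm256_add_ps(va, vb));
        }
        off += 8;
    }
    add_scalar(&a[vectorized..], &b[vectorized..], &mut output[vectorized..]);
}
```

Checking lengths before dispatch keeps the SIMD path free of bounds checks while still failing early with Err on mismatched inputs.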