pub fn add_alloc(a: &[f32], b: &[f32]) -> Vec<f32>
Adds a and b element-wise and returns the result in a newly allocated Vec<f32>. The output buffer is not zero-filled before it is written, which avoids that overhead.
Panics if a and b have different lengths.
a: first input slice.
b: second input slice; must have the same length as a.
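
The behavior described above can be illustrated with a minimal sketch. The name add_alloc_sketch is hypothetical, and this is an assumption about one way to get the documented behavior in safe Rust, not necessarily how add_alloc itself is implemented: capacity is reserved without initializing it, and each output element is written exactly once.

```rust
/// Hypothetical sketch of an element-wise add that allocates its output.
/// Assumption: illustrates the documented contract only; the real
/// `add_alloc` may be implemented differently.
pub fn add_alloc_sketch(a: &[f32], b: &[f32]) -> Vec<f32> {
    // Mirror the documented panic on mismatched lengths.
    assert_eq!(a.len(), b.len(), "a and b must have the same length");

    // Reserve capacity without zero-filling the buffer.
    let mut out = Vec::with_capacity(a.len());

    // Each element is computed and written once as the vector grows.
    out.extend(a.iter().zip(b.iter()).map(|(x, y)| x + y));
    out
}
```

With this sketch, add_alloc_sketch(&[1.0, 2.0], &[3.0, 4.0]) returns [4.0, 6.0], and passing slices of different lengths panics.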