pub fn relu_to_slice_dispatch(input: &[f32], output: &mut [f32])
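Two-argument ReLU: writes output[i] = max(0, input[i]) for every i.
Instead of cloning the input and then applying ReLU in place, it reads from input and writes to
output in a single pass. This roughly halves memory traffic: one read and one write per element
instead of two of each.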
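The difference between the two patterns can be sketched in plain scalar Rust. The function names `relu_to_slice_scalar` and `relu_clone_then_in_place` are hypothetical and used only for illustration. The equal-length requirement is also an assumption. This is not the crate's dispatched (possibly SIMD) implementation.

```rust
/// Hypothetical scalar reference for a two-argument ReLU.
/// Assumes `input` and `output` have the same length.
pub fn relu_to_slice_scalar(input: &[f32], output: &mut [f32]) {
    // Assumption: equal lengths are required; panic otherwise.
    assert_eq!(input.len(), output.len(), "input and output lengths must match");
    // Single pass: each element is read once from `input` and written once to `output`.
    // Note: f32::max returns 0.0 for a NaN input; the real function may differ.
    for (o, &x) in output.iter_mut().zip(input) {
        *o = x.max(0.0);
    }
}

/// Hypothetical illustration of the clone + in-place pattern the two-argument form avoids.
pub fn relu_clone_then_in_place(input: &[f32]) -> Vec<f32> {
    // Pass 1: copy the input (one read and one write per element).
    let mut buf = input.to_vec();
    // Pass 2: read and write every element again to apply ReLU in place.
    for x in buf.iter_mut() {
        *x = (*x).max(0.0);
    }
    buf
}
```

Both functions produce the same values. The first touches each element's memory half as often, and it can write into a caller-provided buffer without allocating.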