pub fn tanh_slice_dispatch(input: &[f32], output: &mut [f32])
Fast tanh applied element-wise: output[i] = tanh(input[i]).
Computed via the identity tanh(x) = 2 * sigmoid(2x) - 1.
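The sigmoid-based formula can be illustrated with a plain scalar loop. This is a minimal sketch, not the dispatching implementation: `sigmoid` and `tanh_slice_reference` are hypothetical names introduced here, and the requirement that both slices have equal length is an assumption, since the entry does not say how mismatched lengths are handled.

```rust
/// Hypothetical scalar sigmoid helper; the crate's own helper may differ.
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Hypothetical scalar reference for the element-wise tanh, using
/// tanh(x) = 2 * sigmoid(2x) - 1.
fn tanh_slice_reference(input: &[f32], output: &mut [f32]) {
    // Assumption: input and output have the same length.
    assert_eq!(input.len(), output.len());
    for (out, &x) in output.iter_mut().zip(input) {
        *out = 2.0 * sigmoid(2.0 * x) - 1.0;
    }
}
```

Calling `tanh_slice_reference(&input, &mut output)` with an `output` buffer of the same length as `input` fills it with values that should agree with `f32::tanh` to within a small floating-point tolerance, which makes the sketch usable as a baseline when checking a faster variant.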