pub fn normalize(
input: &Tensor,
mean: &[f32],
std: &[f32],
) -> Result<Tensor, ImgProcError>
Normalizes image channels: (pixel - mean) / std. Input: an [H, W, C] tensor of f32.
The mean and std slices must each have length C.
Normalization is applied per channel in HWC layout: out = (x - mean[c]) / std[c].
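
The per-channel formula and the length requirement can be illustrated with a minimal reference sketch. It works on a flat HWC buffer rather than the crate's Tensor type, and the function name normalize_hwc_reference, the explicit channels parameter, and the String error are all hypothetical stand-ins, not part of this API.

```rust
/// Hypothetical reference version on a flat HWC buffer (not the crate's API).
/// Returns an error string where the real function would return ImgProcError.
fn normalize_hwc_reference(
    data: &[f32],
    channels: usize,
    mean: &[f32],
    std: &[f32],
) -> Result<Vec<f32>, String> {
    // mean and std must each provide one value per channel.
    if channels == 0 || mean.len() != channels || std.len() != channels {
        return Err(format!(
            "mean/std length must equal C = {} (got {} and {})",
            channels,
            mean.len(),
            std.len()
        ));
    }
    if data.len() % channels != 0 {
        return Err("buffer length is not a multiple of C".to_string());
    }
    // In HWC layout, the channel of flat index i is i % C.
    Ok(data
        .iter()
        .enumerate()
        .map(|(i, &x)| {
            let c = i % channels;
            (x - mean[c]) / std[c]
        })
        .collect())
}
```

This version computes a modulo and a division for every element, which is the cost the optimized path is described as avoiding.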
Optimized: precomputes inv_std[c] = 1 / std[c] once per channel so that each division
becomes a multiplication, and iterates pixel by pixel (row-major) to avoid a per-element modulo.
Uses SIMD (NEON/AVX/SSE) where available, for all channel counts.
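
The two scalar optimizations named above can be sketched as follows. This is an illustration under assumptions, not the crate's implementation: the name normalize_hwc_fast and the flat-slice signature are hypothetical, and the explicit NEON/AVX/SSE paths are not shown (the inner loop is simply left in a form the compiler may auto-vectorize).

```rust
/// Hypothetical sketch on a flat HWC buffer; C is taken from mean.len().
fn normalize_hwc_fast(data: &[f32], mean: &[f32], std: &[f32]) -> Vec<f32> {
    let c = mean.len();
    assert!(c > 0 && std.len() == c && data.len() % c == 0);

    // One division per channel, done once up front.
    let inv_std: Vec<f32> = std.iter().map(|s| 1.0 / s).collect();

    let mut out = vec![0.0f32; data.len()];
    // Each chunk of C values is one pixel, so the channel index is just the
    // position inside the chunk and no modulo is needed.
    for (src, dst) in data.chunks_exact(c).zip(out.chunks_exact_mut(c)) {
        for ch in 0..c {
            dst[ch] = (src[ch] - mean[ch]) * inv_std[ch];
        }
    }
    out
}
```

Multiplying by a precomputed reciprocal can differ from a true division in the last bit of the result, which is usually acceptable for image preprocessing.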