pub fn rbf_kernel(x: &[f32], y: &[f32], gamma: f32) -> f32
SIMD-optimized RBF (Gaussian) kernel function. Computes exp(-gamma * ||x - y||^2) for the input vectors x and y.
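
For reference, a minimal scalar sketch of the standard RBF formula, exp(-gamma * ||x - y||^2), is shown below. It is not the SIMD implementation documented here. The name `rbf_kernel_scalar` is hypothetical, and the equal-length check is an assumption about how mismatched inputs should be handled.

```rust
/// Scalar reference for the RBF kernel: exp(-gamma * ||x - y||^2).
/// `rbf_kernel_scalar` is a hypothetical name, not part of the documented API.
pub fn rbf_kernel_scalar(x: &[f32], y: &[f32], gamma: f32) -> f32 {
    // Assumption: both slices must have the same length; panic otherwise.
    assert_eq!(x.len(), y.len(), "inputs must have equal length");

    // Accumulate the squared Euclidean distance between x and y.
    let sq_dist: f32 = x
        .iter()
        .zip(y.iter())
        .map(|(a, b)| {
            let d = a - b;
            d * d
        })
        .sum();

    // Apply the Gaussian falloff controlled by gamma.
    (-gamma * sq_dist).exp()
}
```

A scalar version like this can serve as a baseline when checking a vectorized implementation: identical inputs give 1.0, and the value decays toward 0.0 as the distance between `x` and `y` grows (for positive `gamma`). Results from a SIMD path may differ slightly because of floating-point summation order, so comparisons should use a tolerance.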