pub fn sigmoid_kernel(x: &[f32], y: &[f32], gamma: f32, coef0: f32) -> f32
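SIMD-optimized sigmoid kernel function. By the usual definition of this kernel, the result is tanh(gamma * <x, y> + coef0), where <x, y> is the dot product of x and y.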
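
For checking results, a plain scalar version can serve as a reference. The following is a minimal sketch, assuming the conventional sigmoid (hyperbolic tangent) kernel with `gamma` as the scale on the dot product and `coef0` as the additive offset. The name `sigmoid_kernel_scalar` is hypothetical, the length check is an assumption about how mismatched inputs should be handled, and nothing here reflects how the SIMD version is actually implemented.

```rust
/// Scalar reference for the sigmoid (hyperbolic tangent) kernel:
/// K(x, y) = tanh(gamma * <x, y> + coef0).
/// Hypothetical helper for illustration; not the library's SIMD code path.
pub fn sigmoid_kernel_scalar(x: &[f32], y: &[f32], gamma: f32, coef0: f32) -> f32 {
    // Assumption: vectors of different lengths are a caller error.
    assert_eq!(x.len(), y.len(), "x and y must have the same length");

    // Dot product <x, y>, accumulated one element pair at a time.
    let dot: f32 = x.iter().zip(y.iter()).map(|(a, b)| a * b).sum();

    // Scale, shift, then squash into the open interval (-1, 1).
    (gamma * dot + coef0).tanh()
}
```

Because a SIMD implementation typically sums in a different order than this loop, comparisons against it are better made with a small tolerance than with exact equality.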