pub fn gpu_dot(a: &[f64], b: &[f64]) -> f64
Computes the dot product of two equal-length slices. This is a mock of a parallel implementation.
In debug builds, panics if the slices differ in length.
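
The description above does not show the function body, so the following is only a minimal sketch of one way such a mock could be written. It assumes the length check is done with `debug_assert_eq!` (which would explain why the panic is limited to debug builds) and that the "parallel" part is imitated by a sequential, chunked reduction. The chunk size `CHUNK` and the panic message are hypothetical and are not taken from the original.

```rust
/// Hypothetical sketch of `gpu_dot`: a CPU-only mock, not a real GPU kernel.
pub fn gpu_dot(a: &[f64], b: &[f64]) -> f64 {
    // Assumption: the length check uses `debug_assert_eq!`, so it is
    // compiled out when debug assertions are disabled (e.g. `--release`).
    debug_assert_eq!(a.len(), b.len(), "gpu_dot: slice lengths differ");

    // Assumed chunk size, standing in for the width of a parallel work-group.
    const CHUNK: usize = 256;

    // Compute one partial sum per chunk, then reduce the partial sums.
    // This mirrors the shape of a parallel reduction but runs sequentially.
    a.chunks(CHUNK)
        .zip(b.chunks(CHUNK))
        .map(|(ca, cb)| ca.iter().zip(cb).map(|(x, y)| x * y).sum::<f64>())
        .sum()
}
```

With this sketch, `gpu_dot(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0])` returns `32.0`. When debug assertions are disabled, the sketch silently multiplies only up to the length of the shorter slice; whether the original function behaves the same way is not stated.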