pub fn gpu_dot_product(a: &[f64], b: &[f64]) -> f64
Computes the dot product a · b using a parallel reduction.
Both slices must have the same length.
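
The reduction pattern can be illustrated with a minimal CPU-side sketch. This is not the implementation behind `gpu_dot_product`; the function name `dot_product_reduction_sketch` is hypothetical, and panicking on a length mismatch is an assumption, since the behavior for mismatched slices is not specified here. Each pass of the loop stands in for one synchronized step in which GPU threads would add pairs of partial sums in parallel.

```rust
/// Hypothetical CPU-side sketch of a pairwise (tree) reduction dot product.
/// It mirrors the access pattern of a GPU reduction but runs sequentially.
fn dot_product_reduction_sketch(a: &[f64], b: &[f64]) -> f64 {
    // Assumption: a length mismatch is treated as a programming error.
    assert_eq!(a.len(), b.len(), "slices must have the same length");

    // Step 1: elementwise products (on a GPU, one product per thread).
    let mut partial: Vec<f64> = a.iter().zip(b).map(|(x, y)| x * y).collect();

    // Step 2: pairwise reduction. Each pass roughly halves the active range.
    let mut len = partial.len();
    while len > 1 {
        // Round up so that the middle element of an odd-length range survives.
        let half = (len + 1) / 2;
        for i in 0..len / 2 {
            let rhs = partial[i + half];
            partial[i] += rhs;
        }
        len = half;
    }

    // An empty input yields 0.0, the additive identity.
    partial.first().copied().unwrap_or(0.0)
}
```

For example, `dot_product_reduction_sketch(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0])` evaluates to `32.0`. Because a pairwise reduction adds terms in a different order than a left-to-right loop, results for general floating-point inputs may differ from a sequential sum in the last few bits.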