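/// Computes the sum of all elements in a slice (mock of a parallel GPU reduction).
///
/// Returns `0.0` for an empty slice.
pub fn gpu_sum(data: &[f64]) -> f64 {
    // Sequential CPU stand-in for the GPU reduction; an empty iterator sums to 0.0.
    data.iter().sum()
}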
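The "parallel reduction" that `gpu_sum` mocks usually works as a tree: each step adds element `i` to element `i + half`, which halves the number of active values until one remains. The sketch below runs that pattern sequentially on the CPU. `tree_reduce_sum` is a hypothetical helper, not part of the original code, and it only illustrates the reduction order; it makes no GPU calls.

```rust
/// Hypothetical helper (not from the original code): sums `data` with a
/// pairwise (tree) reduction. Each pass folds the upper half of the active
/// range onto the lower half, so n values take about log2(n) passes. A GPU
/// block reduction follows the same pattern, with one thread per addition.
pub fn tree_reduce_sum(data: &[f64]) -> f64 {
    // Match the documented contract: an empty slice sums to 0.0.
    if data.is_empty() {
        return 0.0;
    }
    // Work on a copy so the caller's slice is left untouched.
    let mut buf = data.to_vec();
    let mut len = buf.len();
    while len > 1 {
        // Ceiling division: with an odd length, the middle element has no
        // partner and simply carries over to the next pass.
        let half = (len + 1) / 2;
        for i in 0..(len - half) {
            buf[i] += buf[i + half];
        }
        len = half;
    }
    buf[0]
}
```

Because this adds values in a different order from `data.iter().sum()`, the two results can differ in the last bits for general floating-point input. Pairwise summation also tends to accumulate less rounding error than a left-to-right loop.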