pub fn atomic_scatter_add(dst: &mut [f64], src: &[f64], indices: &[usize])
Atomic-add scatter (simulated serially): for each pair (idx, value) taken from indices and src, performs dst[idx] += value.
In a real GPU kernel, each update would use atomicAdd so that concurrent writes to the same index do not race.
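The body of this function is not shown here, so the following is only a minimal sketch of what a serial simulation might look like. It assumes that src and indices are parallel slices of equal length and that an out-of-range index should panic; neither assumption is confirmed by the documentation above.

```rust
/// Serial stand-in for a GPU atomic-add scatter (assumed semantics, not the
/// original implementation).
pub fn atomic_scatter_add(dst: &mut [f64], src: &[f64], indices: &[usize]) {
    // Assumption: src[i] is the value scattered to dst[indices[i]].
    assert_eq!(
        src.len(),
        indices.len(),
        "src and indices must have the same length"
    );

    for (&idx, &value) in indices.iter().zip(src.iter()) {
        // Serial execution makes each read-modify-write trivially atomic.
        // Indexing panics if idx >= dst.len().
        dst[idx] += value;
    }
}
```

With dst initialised to three zeros, src = [1.0, 2.0, 3.0], and indices = [0, 2, 0], the two updates to index 0 accumulate and dst ends up as [4.0, 0.0, 2.0]. Repeated indices are exactly the case where a parallel kernel needs atomicAdd.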