pub fn gpu_prefix_sum(counts: &[usize]) -> Vec<usize>
Exclusive prefix sum (scan) of counts.
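For input [a, b, c, d] the output is [0, a, a+b, a+b+c]: element i is the sum of all inputs before index i, so the output has the same length as the input and the last input, d, does not contribute to any element. This mirrors what a parallel GPU prefix-sum kernel would produce.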
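As an illustration of the exclusive-scan semantics described above, here is a minimal sequential sketch. The name exclusive_scan_sketch and its body are assumptions made for this example; they are not the actual implementation of gpu_prefix_sum, which is not shown in this documentation.

```rust
/// Hypothetical CPU reference for an exclusive prefix sum.
/// This is an illustrative assumption, not the body of `gpu_prefix_sum`.
pub fn exclusive_scan_sketch(counts: &[usize]) -> Vec<usize> {
    let mut out = Vec::with_capacity(counts.len());
    // Running total of every element seen so far.
    let mut running = 0usize;
    for &c in counts {
        // Emit the sum of all elements *before* this one, then include it.
        out.push(running);
        running += c;
    }
    out
}
```

Under these assumptions, exclusive_scan_sketch(&[3, 1, 4, 2]) yields [0, 3, 4, 8], and an empty input yields an empty output. A common use of such a result is as write offsets: each output element gives the starting position of the corresponding bucket in a packed buffer.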