pub fn tree_reduce_sum(data: &[f64]) -> f64
Work-efficient tree reduction: sums data using a binary tree pattern.
This simulates the GPU tree-reduction kernel where each thread handles one element and the active thread count halves each step.
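
The halving pattern described above can be sketched on the CPU as follows. This is a minimal illustration, not the crate's actual implementation. The name `tree_reduce_sum_sketch`, the working buffer `buf`, and the choice to return 0.0 for empty input are all assumptions made for this example.

```rust
/// Hypothetical sketch of a binary-tree sum; not the crate's real code.
pub fn tree_reduce_sum_sketch(data: &[f64]) -> f64 {
    // Assumption: an empty slice sums to 0.0.
    if data.is_empty() {
        return 0.0;
    }

    // Working copy that stands in for the GPU's shared-memory buffer.
    let mut buf = data.to_vec();
    // Number of elements (simulated threads) still holding a partial sum.
    let mut active = buf.len();

    while active > 1 {
        // Ceiling of active / 2, so an odd leftover element is carried over.
        let half = (active + 1) / 2;
        // Each simulated thread i adds in its partner at i + half.
        // On a GPU these additions would run in parallel within one step.
        for i in 0..active / 2 {
            buf[i] += buf[i + half];
        }
        // Only the first `half` slots hold live partial sums now.
        active = half;
    }

    buf[0]
}
```

For an input of n elements the loop runs about log2(n) steps and performs n - 1 additions in total, the same number as a sequential sum. Because the additions are grouped differently from a left-to-right loop, floating-point results may differ in the last bits for inputs that are not exactly representable.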