pub fn reduce_tree<P: ReducePrecision, Inst: ReduceInstruction<P>>(
    inst: &Inst,
    accumulator: &mut Inst::SharedAccumulator,
    size: u32,
) -> Inst::AccumulatorItem
Use all units within a cube to reduce the first size elements of accumulator in place, following the tree pattern pictured below. If size is not a power of 2, the missing elements are treated as padding.
    0   1   2   3   4   5   6   7
    |   |   |   |   |   |   |   |
    +---+   +---+   +---+   +---+
    |       |       |       |
    +-------+       +-------+
    |               |
    +---------------+
    |
    *
The outcome is stored in the first element of the accumulator and also returned by this function for convenience.
Since each individual cube performs a reduction, this function is meant to be called
with a different accumulator for each cube based on CUBE_POS.
There is no out-of-bounds check, so the caller is responsible for ensuring that size is at most the length
of the shared memory and that there are at least size units within each cube.
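
The tree pattern described above can be modelled on the CPU with a short sequential sketch. This is not the library's implementation: tree_reduce_in_place and its combine closure are hypothetical names introduced only for illustration, and the inner loop stands in for what the units of a cube would do in parallel, with a synchronization between levels. Padding is modelled by skipping any partner index at or beyond size.

```rust
/// Sequential CPU model of an in-place tree reduction over the first `size`
/// elements of `acc`. Hypothetical helper for illustration only; it is not
/// part of the library and does not use its types.
fn tree_reduce_in_place<T: Copy>(
    acc: &mut [T],
    size: usize,
    combine: impl Fn(T, T) -> T,
) -> T {
    // Unlike the documented function, this sketch does check its bounds.
    assert!(size >= 1 && size <= acc.len(), "size must be within 1..=acc.len()");

    // Round up to a power of two so the tree has a whole number of levels.
    // Slots at or beyond `size` act as padding and are never read.
    let padded = size.next_power_of_two();

    let mut stride = 1;
    while stride < padded {
        // One level of the tree: element i absorbs element i + stride.
        // On a GPU, each step of this inner loop would be a separate unit.
        let mut i = 0;
        while i + stride < size {
            acc[i] = combine(acc[i], acc[i + stride]);
            i += 2 * stride;
        }
        stride *= 2;
    }

    // The outcome lives in the first element and is also returned.
    acc[0]
}
```

With the eight inputs 0 to 7 from the diagram and addition as the combining operation, the sketch leaves 28 in the first element and returns it. With size set to 5, only the first five elements are combined and the remaining slots are left untouched.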