pub fn reduce_tree<P: ReducePrecision, Inst: ReduceInstruction<P>>(
    inst: &Inst,
    accumulator: &mut Inst::SharedAccumulator,
    size: u32,
) -> Inst::AccumulatorItem
Use all units within a cube to fuse the first size elements of accumulator in place, as shown below, with some padding if size is not a power of 2.
    0   1   2   3   4   5   6   7
    |   |   |   |   |   |   |   |
    +---+   +---+   +---+   +---+
    |       |       |       |
    +-------+       +-------+
    |               |
    +---------------+
    |
    *
The outcome is stored in the first element of the accumulator and also returned by this function for convenience.
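The pairwise pattern in the diagram can be sketched on the CPU. This is a minimal sequential illustration and not the library's implementation: reduce_tree_sketch and its fuse closure are hypothetical names, the inner loop stands in for the units of one cube running in parallel, and an element without a partner is simply carried to the next level, which may differ from the padding the real function applies.

```rust
/// CPU-only sketch (hypothetical helper, not the library's code): fuse the
/// first `size` elements of `acc` in place, level by level, as in the diagram.
/// `fuse` is an assumed stand-in for the reduce instruction.
fn reduce_tree_sketch<T: Copy>(acc: &mut [T], size: usize, fuse: impl Fn(T, T) -> T) -> T {
    assert!(size >= 1 && size <= acc.len());
    let mut remaining = size; // number of partial results still alive
    let mut jump = 1; // distance between two neighbouring partial results
    while remaining > 1 {
        // Each iteration stands in for one unit; on a GPU these run in
        // parallel and a synchronization would separate the levels.
        for unit in 0..remaining / 2 {
            let destination = jump * 2 * unit;
            let origin = jump * (2 * unit + 1);
            acc[destination] = fuse(acc[destination], acc[origin]);
        }
        // With an odd count, the last element has no partner and is carried over.
        remaining = (remaining + 1) / 2;
        jump *= 2;
    }
    // The outcome ends up in the first element and is also returned.
    acc[0]
}
```

Calling it with the eight values 0 to 7 from the diagram and addition as fuse leaves their sum in the first element and returns it.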
Since each individual cube performs a reduction, this function is meant to be called with a different accumulator for each cube based on CUBE_POS.
There is no out-of-bounds check, so it is the responsibility of the caller to ensure that size is at most the length of the shared memory and that there are at least size units within each cube.
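Because the function does not check these conditions itself, a caller may want to validate them on the host before launching. The guard below is only a sketch: check_reduce_tree_args and its parameters shared_len and units_per_cube are hypothetical names, not part of the library.

```rust
/// Hypothetical host-side guard (not part of the library): checks the two
/// preconditions stated above before a kernel that reduces `size` elements
/// is launched.
fn check_reduce_tree_args(size: u32, shared_len: u32, units_per_cube: u32) -> Result<(), String> {
    // `size` must be at most the length of the shared memory.
    if size > shared_len {
        return Err(format!(
            "size ({size}) exceeds the shared memory length ({shared_len})"
        ));
    }
    // There must be at least `size` units within each cube.
    if units_per_cube < size {
        return Err(format!(
            "each cube has {units_per_cube} units but size is {size}"
        ));
    }
    Ok(())
}
```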