Function reduce_tree

pub fn reduce_tree<P: ReducePrecision, Inst: ReduceInstruction<P>>(
    inst: &Inst,
    accumulator: &mut Inst::SharedAccumulator,
    size: u32,
) -> Inst::AccumulatorItem

Uses all units within a cube to fuse the first size elements of accumulator in place, following the tree pattern below. If size is not a power of 2, some padding is applied.


    0   1   2   3   4   5   6   7
    |   |   |   |   |   |   |   |
    +---+   +---+   +---+   +---+
    |       |       |       |
    +-------+       +-------+
    |               |
    +---------------+
    |
    *

The outcome is stored in the first element of the accumulator and also returned by this function for convenience.
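
The pairing pattern in the diagram can be modelled on the CPU. The sketch below is only an illustration, not the library's implementation: `reduce_tree_model` and its `fuse` closure are hypothetical names, the loop runs sequentially where a cube would assign one pair per unit and synchronize between rounds, and a missing partner is simply skipped, which is one assumed way to get the effect of padding when size is not a power of 2.

```rust
/// Sequential CPU model of the tree pattern in the diagram (hypothetical helper,
/// not part of the library). Fuses the first `size` elements of `acc` in place
/// and returns the result, which also ends up in `acc[0]`.
fn reduce_tree_model<T: Copy>(acc: &mut [T], size: usize, fuse: impl Fn(T, T) -> T) -> T {
    // Unlike the real function, this model checks its bounds.
    assert!(size >= 1 && size <= acc.len());

    // Round 1 fuses neighbours (stride 1), round 2 fuses at stride 2, and so on.
    let mut stride = 1;
    while stride < size {
        // On a GPU, each value of `i` would be handled by a different unit,
        // with a synchronization barrier between rounds.
        let mut i = 0;
        while i + stride < size {
            acc[i] = fuse(acc[i], acc[i + stride]);
            i += 2 * stride;
        }
        // Elements whose partner would fall at or beyond `size` are left
        // unchanged, as if the partner were a neutral padding value.
        stride *= 2;
    }
    acc[0]
}
```

For instance, calling `reduce_tree_model(&mut acc, 5, |a, b| a + b)` on a slice of eight elements sums only the first five and leaves the remaining three untouched.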

Each cube performs its own reduction, so this function is meant to be called with a different accumulator for each cube, based on CUBE_POS.

There is no out-of-bounds check. It is the caller's responsibility to ensure that size is at most the length of the shared memory and that there are at least size units within each cube.
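
Because the function performs no checks itself, a caller may want to validate the two conditions on the host before launching. The helper below is a minimal sketch with hypothetical names (`reduce_tree_preconditions_hold`, `accumulator_len`, `units_per_cube`); it is not part of the library and only restates the two requirements above as a predicate.

```rust
/// Hypothetical host-side guard (not a library function): returns true when
/// both documented requirements of `reduce_tree` are satisfied.
fn reduce_tree_preconditions_hold(size: u32, accumulator_len: u32, units_per_cube: u32) -> bool {
    // `size` must not exceed the length of the shared accumulator,
    // and each cube must contain at least `size` units.
    size <= accumulator_len && size <= units_per_cube
}
```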