Module reduce

High-performance CPU memory reduction kernels.

Reductions are bandwidth-bound: each one reads N elements and writes a single scalar. A single-threaded loop with LLVM auto-vectorization already saturates the memory bus, so these kernels do not use Rayon.
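To make the bandwidth-bound claim concrete, here is a minimal back-of-the-envelope sketch. The helper `min_stream_time_secs` and its parameters are hypothetical and not part of this module; the bandwidth figure is whatever the caller measures on their own machine.

```rust
/// Lower bound on the time needed to stream `n_elems` elements of
/// `elem_bytes` bytes each from memory, given a sustained bandwidth in
/// bytes per second.
///
/// Hypothetical helper for illustration only; it is not part of `reduce`.
pub fn min_stream_time_secs(
    n_elems: usize,
    elem_bytes: usize,
    bandwidth_bytes_per_sec: f64,
) -> f64 {
    // A reduction must read every input element once; the single scalar
    // it writes is negligible by comparison.
    let bytes_read = (n_elems as f64) * (elem_bytes as f64);
    bytes_read / bandwidth_bytes_per_sec
}
```

If a plain loop already runs close to this bound, extra threads have little left to gain, which is the reasoning given above for not using Rayon.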

Functions

max_all
Executes a global max reduction, finding the largest single value in the tensor.
mean_all
Executes a global mean reduction, calculating the average of all elements.
sum_all
Executes a global sum reduction, collapsing the entire tensor into a single scalar value.
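
The page does not show the signatures of these functions, so the following is only a sketch of what each reduction computes. It assumes a flat `&[f32]` buffer instead of the crate's tensor type, and the `_sketch` names are hypothetical stand-ins, not the module's API.

```rust
/// Sum of all elements in a single pass; returns 0.0 for an empty slice.
/// Assumed signature, for illustration only.
pub fn sum_all_sketch(data: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for &x in data {
        acc += x;
    }
    acc
}

/// Arithmetic mean of all elements, or `None` for an empty slice.
/// Returning `Option` is a choice made for this sketch.
pub fn mean_all_sketch(data: &[f32]) -> Option<f32> {
    if data.is_empty() {
        None
    } else {
        Some(sum_all_sketch(data) / data.len() as f32)
    }
}

/// Largest element, or `None` for an empty slice.
/// `f32::max` returns the non-NaN operand when exactly one side is NaN.
pub fn max_all_sketch(data: &[f32]) -> Option<f32> {
    data.iter().copied().reduce(f32::max)
}
```

Each function makes one linear pass over the input and produces one scalar, matching the read-N, write-1 access pattern described in the module overview.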