Crate cubecl_reduce

This crate provides several implementations of the reduce algorithm that can run on multiple GPU backends using CubeCL.

A reduction is a tensor operation that maps a rank R tensor to a rank R - 1 tensor by combining all elements along a given axis with a binary operator. This is also commonly called folding.
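The definition above can be illustrated with a small CPU-only sketch in plain Rust. It does not use CubeCL or any API from this crate: the function `reduce_axis`, its parameters, and the row-major rank-2 layout are all assumptions made for illustration. It folds a rank-2 tensor along one axis with a caller-supplied binary operator and returns a rank-1 result.

```rust
/// Hypothetical helper (not part of cubecl_reduce): reduce a row-major
/// rank-2 tensor of shape `rows x cols` along `axis` (0 or 1).
/// `init` is the starting accumulator and `op` the binary operator.
fn reduce_axis<T: Copy>(
    data: &[T],
    rows: usize,
    cols: usize,
    axis: usize,
    init: T,
    op: impl Fn(T, T) -> T,
) -> Vec<T> {
    assert_eq!(data.len(), rows * cols, "shape does not match buffer length");
    assert!(axis < 2, "a rank-2 tensor only has axes 0 and 1");

    // The reduced axis disappears; the other axis becomes the output.
    let (out_len, reduce_len) = if axis == 0 { (cols, rows) } else { (rows, cols) };

    (0..out_len)
        .map(|o| {
            // Fold every element along the reduced axis into one value.
            (0..reduce_len).fold(init, |acc, r| {
                let (i, j) = if axis == 0 { (r, o) } else { (o, r) };
                op(acc, data[i * cols + j])
            })
        })
        .collect()
}
```

For a 2 x 3 tensor holding 1 through 6, summing along axis 0 yields one value per column, while summing along axis 1 yields one value per row. Swapping the operator (for example to `i32::max` with `i32::MIN` as the initial value) gives a different reduction over the same traversal.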

The main entry point of this crate is the reduce function, which automatically performs a reduction for a given instruction implementing the ReduceInstruction trait and a given ReduceStrategy. The instructions module provides implementations of the ReduceInstruction trait for common operations. Finally, the primitives module provides reusable building blocks for implementing different general reduction algorithms.

Re-exports

pub use instructions::ReduceFamily;
pub use instructions::ReduceInstruction;
pub use args::init_tensors;

Modules

args
instructions
primitives
reduce_kernel
reduce_kernel_virtual
tune_key

Structs

ReduceConfig
ReduceParams
ReduceStrategy

Enums

BoundChecksInner: How bound checks are handled for inner reductions.
LineMode
ReduceError

Traits

ReducePrecision: Precision used for the reduction.

Functions

reduce: Reduce the given axis of the input tensor using the instruction Inst and write the result into output.
reduce_kernel
reduce_kernel_virtual
shared_sum: Sum all the elements of the input tensor distributed over cube_count cubes.
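As a rough CPU analogy for distributing a sum over several cubes, the sketch below splits the input into at most `cube_count` contiguous chunks, sums each chunk independently, and then combines the partial sums. The function `chunked_sum` is hypothetical and written in plain Rust. It makes no claim about how shared_sum partitions the input or merges partial results on the GPU.

```rust
/// Hypothetical CPU analogy (not the crate's implementation): sum `input`
/// by splitting it into at most `cube_count` contiguous chunks.
fn chunked_sum(input: &[f32], cube_count: usize) -> f32 {
    assert!(cube_count > 0, "at least one chunk is required");

    // Ceiling division so that every element lands in some chunk;
    // `max(1)` keeps the chunk length valid for an empty input.
    let chunk_len = ((input.len() + cube_count - 1) / cube_count).max(1);

    input
        .chunks(chunk_len)
        // Each chunk stands in for the work assigned to one cube.
        .map(|chunk| chunk.iter().sum::<f32>())
        // Combine the partial sums into the final result.
        .sum()
}
```

Because floating-point addition is not associative, a chunked sum can differ slightly from a sequential one on general inputs, which is worth keeping in mind when comparing results across strategies.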