Function burn_wgpu::kernel::reduce::sum_dim_shared_memory

pub fn sum_dim_shared_memory<E: WgpuElement, const D: usize>(
    input: WgpuTensor<E, D>,
    output: WgpuTensor<E, D>,
    dim: usize
) -> WgpuTensor<E, D>

Execute the sum dim kernel leveraging shared memory. Probably more efficient on tensors where the dimension to be reduced is much larger than the others.
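
To illustrate the general idea behind a shared-memory reduction, here is a minimal CPU-side sketch. It is not the kernel shipped in burn_wgpu: the function name `sum_last_dim_shared`, the constant `WORKGROUP_SIZE`, and the restriction to a row-major 2D input reduced along its last dimension are all assumptions made for illustration. Each simulated workgroup handles one output element, its threads first accumulate strided partial sums into a shared buffer, and a tree reduction then combines them.

```rust
/// Number of simulated threads per workgroup.
/// Hypothetical value chosen for the sketch; it must be a power of two.
const WORKGROUP_SIZE: usize = 8;

/// CPU simulation of a shared-memory sum over the last dimension of a
/// row-major `rows x cols` matrix. One simulated workgroup handles one row.
fn sum_last_dim_shared(input: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(input.len(), rows * cols);
    let mut output = vec![0.0f32; rows];

    for row in 0..rows {
        let line = &input[row * cols..(row + 1) * cols];
        // Stand-in for the workgroup's shared memory.
        let mut shared = [0.0f32; WORKGROUP_SIZE];

        // Phase 1: each thread sums a strided slice of the reduced dimension.
        for thread in 0..WORKGROUP_SIZE {
            let mut acc = 0.0;
            let mut i = thread;
            while i < cols {
                acc += line[i];
                i += WORKGROUP_SIZE;
            }
            shared[thread] = acc;
        }

        // Phase 2: tree reduction. On a GPU, a workgroup barrier would
        // separate the steps; the number of active threads halves each time.
        let mut stride = WORKGROUP_SIZE / 2;
        while stride > 0 {
            for thread in 0..stride {
                shared[thread] += shared[thread + stride];
            }
            stride /= 2;
        }

        // Thread 0 holds the total and writes the output element.
        output[row] = shared[0];
    }
    output
}
```

In this sketch, the threads of a workgroup cooperate on a single output element, which is why such a scheme tends to pay off when the reduced dimension is long: with a short reduced dimension, most threads would have no input to accumulate.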