Function burn_wgpu::kernel::reduce::sum_dim_shared_memory
pub fn sum_dim_shared_memory<E: WgpuElement, const D: usize>(
input: WgpuTensor<E, D>,
output: WgpuTensor<E, D>,
dim: usize
) -> WgpuTensor<E, D>
Execute the sum dim kernel leveraging shared memory. Probably more efficient on tensors where the dimension to reduce is much larger than the others.
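
To illustrate what a shared-memory reduction generally looks like, here is a minimal CPU sketch in plain Rust. It is not the kernel shipped in burn_wgpu: the function name `sum_dim_shared_memory_sketch`, the constant `WORKGROUP_SIZE`, the flat row-major layout, and the output shape (the reduced dimension kept with size 1) are all assumptions made for the example. Each output element is handled by one simulated workgroup. Every thread first accumulates a strided partial sum into a shared buffer, then the buffer is folded in halves until one value remains.

```rust
/// Assumed workgroup size for this sketch; must be a power of two.
const WORKGROUP_SIZE: usize = 8;

/// Hypothetical CPU model of a shared-memory sum along `dim`.
/// `input` is a flat row-major buffer described by `shape`.
/// Returns a flat buffer in which `dim` has been reduced to size 1.
fn sum_dim_shared_memory_sketch(input: &[f32], shape: &[usize], dim: usize) -> Vec<f32> {
    let reduce_len = shape[dim];
    // Number of elements before and after the reduced dimension.
    let outer: usize = shape[..dim].iter().product();
    let inner: usize = shape[dim + 1..].iter().product();
    let mut output = vec![0.0f32; outer * inner];

    for o in 0..outer {
        for i in 0..inner {
            // One simulated workgroup per output element, with its own shared buffer.
            let mut shared = [0.0f32; WORKGROUP_SIZE];

            // Phase 1: each thread sums the elements k = thread, thread + W, ...
            for thread in 0..WORKGROUP_SIZE {
                let mut k = thread;
                while k < reduce_len {
                    shared[thread] += input[(o * reduce_len + k) * inner + i];
                    k += WORKGROUP_SIZE;
                }
            }

            // Phase 2: tree reduction; on a GPU a barrier would separate each step.
            let mut active = WORKGROUP_SIZE / 2;
            while active > 0 {
                for thread in 0..active {
                    shared[thread] += shared[thread + active];
                }
                active /= 2;
            }

            output[o * inner + i] = shared[0];
        }
    }
    output
}
```

Under these assumptions, calling `sum_dim_shared_memory_sketch(&[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], &[2, 3], 1)` sums each row of a 2×3 tensor. The sketch also suggests why the approach favors a long reduced dimension: the threads of a workgroup only have useful work in phase 1 when `reduce_len` is at least comparable to the workgroup size.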