pub fn reduce_compute_shader(op: ReduceOp) -> Vec<u32>
Generate an OpenCL SPIR-V compute kernel for reduction along an axis.
Kernel parameters: (CrossWorkgroup float* input, CrossWorkgroup float* output, uint outer_size, uint reduce_size, uint inner_size).
Each thread computes one output element by iterating over the reduce dimension.
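The per-thread loop can be pictured with a CPU reference. This is a minimal sketch, not the generated kernel: it assumes the input is a contiguous row-major buffer of shape [outer_size, reduce_size, inner_size] and that the output holds outer_size * inner_size elements, neither of which is stated above. `RefOp` and `reduce_reference` are hypothetical names introduced for illustration; the real variants of `ReduceOp` are not shown in this documentation.

```rust
/// Hypothetical stand-in for the crate's `ReduceOp` (actual variants unknown).
#[derive(Clone, Copy)]
enum RefOp {
    Sum,
    Max,
}

/// CPU reference for a reduction along the middle axis.
/// ASSUMPTION: row-major layout [outer_size, reduce_size, inner_size].
fn reduce_reference(
    op: RefOp,
    input: &[f32],
    outer_size: usize,
    reduce_size: usize,
    inner_size: usize,
) -> Vec<f32> {
    assert_eq!(input.len(), outer_size * reduce_size * inner_size);
    let mut output = vec![0.0f32; outer_size * inner_size];

    // Each iteration of this loop plays the role of one GPU thread.
    for gid in 0..output.len() {
        let outer = gid / inner_size;
        let inner = gid % inner_size;

        // Start from the identity element of the chosen operation.
        let mut acc = match op {
            RefOp::Sum => 0.0,
            RefOp::Max => f32::NEG_INFINITY,
        };

        // Walk the reduce dimension for this (outer, inner) pair.
        for r in 0..reduce_size {
            let v = input[(outer * reduce_size + r) * inner_size + inner];
            acc = match op {
                RefOp::Sum => acc + v,
                RefOp::Max => acc.max(v),
            };
        }
        output[gid] = acc;
    }
    output
}
```

Under these assumptions, a reference like this could serve as a check on the kernel's results for small inputs, since both visit the same elements in the same order for each output.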