#[unsafe(no_mangle)] pub extern "C" fn sigil_gpu_reduce(arr_ptr: i64) -> i64
GPU reduce operation. Uses tree reduction in shared memory and achieves O(log n) parallel steps for an n-element input.
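The tree reduction pattern described above can be sketched on the CPU as follows. This is a minimal illustration, not the actual body of sigil_gpu_reduce: the function name tree_reduce_sum is hypothetical, the combining operator is assumed to be a sum (the source does not say which operator is used), and a plain Vec stands in for GPU shared memory. Each pass of the outer loop corresponds to one parallel step, so the step count grows as log2 of the padded length.

```rust
/// CPU simulation of a shared-memory tree reduction.
/// ASSUMPTION: the reduce operator is a sum; the source does not specify it.
/// Returns (result, number_of_parallel_steps).
fn tree_reduce_sum(input: &[i64]) -> (i64, u32) {
    if input.is_empty() {
        return (0, 0);
    }
    // Pad to the next power of two with the identity element (0 for a sum)
    // so that every step halves the active range exactly.
    let n = input.len().next_power_of_two();
    let mut shared = vec![0i64; n]; // stands in for GPU shared memory
    shared[..input.len()].copy_from_slice(input);

    let mut steps = 0u32;
    let mut stride = n / 2;
    while stride > 0 {
        // On a GPU, each index i would be handled by its own thread,
        // followed by a barrier before the next step. Here the "threads"
        // simply run one after another.
        for i in 0..stride {
            shared[i] = shared[i].wrapping_add(shared[i + stride]);
        }
        stride /= 2;
        steps += 1;
    }
    (shared[0], steps)
}
```

For example, calling tree_reduce_sum(&[1, 2, 3, 4, 5, 6, 7, 8]) combines eight values in three steps (strides 4, 2, 1), which is the O(log n) behavior the description refers to. Inputs whose length is not a power of two are padded with zeros, so five elements also take three steps.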