Module kernels

GPU compute kernels (WGSL shaders)

Parallel Reduction Algorithm (Harris 2007):

  1. Each thread loads one element
  2. Workgroup-local reduction using shared memory
  3. Global reduction of workgroup results
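The three steps above can be sketched on the CPU as follows. This is only an illustrative simulation, not the module's WGSL shader: `WORKGROUP_SIZE`, `reduce_workgroup`, and `reduce_sum` are hypothetical names, the workgroup size of 8 is an assumed value, and wrapping addition is an assumption about overflow behaviour.

```rust
/// Assumed workgroup size for this sketch; must be a power of two.
const WORKGROUP_SIZE: usize = 8;

/// Simulates one workgroup: load into "shared memory", then tree-reduce.
fn reduce_workgroup(chunk: &[i32]) -> i32 {
    // Step 1: each "thread" loads one element. Threads past the end of the
    // input keep the identity element of the operation (0 for SUM).
    let mut shared = [0i32; WORKGROUP_SIZE];
    shared[..chunk.len()].copy_from_slice(chunk);

    // Step 2: workgroup-local tree reduction. The stride halves every round,
    // so the loop runs log2(WORKGROUP_SIZE) times.
    let mut stride = WORKGROUP_SIZE / 2;
    while stride > 0 {
        for tid in 0..stride {
            shared[tid] = shared[tid].wrapping_add(shared[tid + stride]);
        }
        // On a GPU, a workgroup barrier would separate the rounds here.
        stride /= 2;
    }
    shared[0]
}

/// Splits the input into workgroup-sized chunks and combines the partials.
fn reduce_sum(data: &[i32]) -> i32 {
    let partials: Vec<i32> = data
        .chunks(WORKGROUP_SIZE)
        .map(reduce_workgroup)
        .collect();

    // Step 3: global reduction of the per-workgroup results.
    partials.iter().fold(0i32, |acc, &x| acc.wrapping_add(x))
}
```

Calling `reduce_sum` on the integers 1 through 100 yields 5050. For MIN or MAX the same structure applies, with the padding value replaced by the identity of that operation (`i32::MAX` for MIN, `i32::MIN` for MAX).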

Performance: O(N/P + log P), where N is the number of input elements and P is the number of threads
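To make the bound concrete, the sketch below evaluates the cost model for given N and P. The function name `parallel_steps` is hypothetical, and the model ignores constant factors, memory latency, and dispatch overhead.

```rust
/// Step count under the N/P + log2(P) cost model: each of the `p` threads
/// handles ceil(n / p) elements, followed by log2(p) tree-reduction rounds.
fn parallel_steps(n: u64, p: u64) -> u64 {
    assert!(p.is_power_of_two(), "this sketch assumes p is a power of two");
    let per_thread = (n + p - 1) / p; // ceil(n / p)
    let tree_rounds = u64::from(p.trailing_zeros()); // log2(p) for powers of two
    per_thread + tree_rounds
}
```

For example, `parallel_steps(1_048_576, 256)` gives 4096 + 8 = 4104 steps, compared with 1,048,576 for a single thread.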

Functions

count
Execute COUNT aggregation on GPU. Trivial implementation: just returns the array length.
max_i32
Execute MAX aggregation on GPU (i32)
min_i32
Execute MIN aggregation on GPU (i32)
sum_f32
Execute SUM aggregation on GPU (f32). Placeholder: not yet implemented.
sum_i32
Execute SUM aggregation on GPU (i32)