Module kernels

GPU compute kernels (WGSL shaders)

Parallel Reduction Algorithm (Harris 2007):

1. Each thread loads one element
2. Workgroup-local reduction using shared memory
3. Global reduction of workgroup results
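
The three steps above can be sketched on the CPU as follows. This is only an illustrative simulation in plain Rust, not the module's WGSL shader code: `WORKGROUP_SIZE`, `reduce_workgroup`, and `reduce_sum` are hypothetical names, the workgroup size of 8 is an arbitrary assumption, and the inner loop stands in for threads that would run concurrently (with a barrier between rounds) on a GPU.

```rust
/// Hypothetical workgroup size (assumption; the real kernels' size is not documented here).
const WORKGROUP_SIZE: usize = 8;

/// Steps 1 and 2: each simulated thread loads one element into a
/// workgroup-local buffer, which is then tree-reduced in log2(P) rounds.
fn reduce_workgroup(chunk: &[i32]) -> i32 {
    // Step 1: one element per thread; threads past the end keep the identity (0).
    let mut shared = [0i32; WORKGROUP_SIZE];
    for (tid, &value) in chunk.iter().enumerate() {
        shared[tid] = value;
    }

    // Step 2: halve the number of active threads each round.
    // On a GPU a workgroup barrier would separate the rounds.
    let mut stride = WORKGROUP_SIZE / 2;
    while stride > 0 {
        for tid in 0..stride {
            shared[tid] = shared[tid].wrapping_add(shared[tid + stride]);
        }
        stride /= 2;
    }
    shared[0]
}

/// Step 3: combine the per-workgroup partial results into one value.
fn reduce_sum(data: &[i32]) -> i32 {
    data.chunks(WORKGROUP_SIZE)
        .map(reduce_workgroup)
        .fold(0i32, |acc, partial| acc.wrapping_add(partial))
}
```

Calling `reduce_sum(&[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])` splits the input into two simulated workgroups and combines their partial sums. The same structure applies to MIN and MAX by swapping the identity value and the combining operation.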

Performance: O(N/P + log P), where N is the number of elements and P is the number of threads
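
One way to read this bound is that each thread handles roughly N/P elements and the partial results are then combined in about log2(P) tree rounds. The helper below is a hypothetical back-of-the-envelope estimate under that idealized model (`estimated_steps` is not part of this module), and it ignores memory latency and dispatch overhead.

```rust
/// Idealized parallel step count for reducing `n` elements with `p` threads:
/// ceil(n / p) steps per thread plus ceil(log2(p)) tree-reduction rounds.
fn estimated_steps(n: u64, p: u64) -> u64 {
    assert!(p > 0, "need at least one thread");
    let per_thread = (n + p - 1) / p; // ceil(n / p)
    // ceil(log2(p)) computed from the bit length of (p - 1); yields 0 for p == 1.
    let tree_rounds = (u64::BITS - (p - 1).leading_zeros()) as u64;
    per_thread + tree_rounds
}
```

Under this model, reducing 1,048,576 elements with 256 threads takes 4096 + 8 = 4104 steps, compared with 1,048,576 steps for a single thread.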

Functions§

count: Execute COUNT aggregation on GPU. Trivial implementation: just returns the array length.
max_i32: Execute MAX aggregation on GPU (i32)
min_i32: Execute MIN aggregation on GPU (i32)
sum_f32: Execute SUM aggregation on GPU (f32). Placeholder: not yet implemented.
sum_i32: Execute SUM aggregation on GPU (i32)