GPU compute backend using wgpu (WebGPU)
Toyota Way Principles:
- Muda elimination: GPU only when compute > 5x transfer time
- Genchi Genbutsu: Empirical benchmarks show 50-100x speedups
Architecture:
- WGSL compute shaders for parallel reduction
- Workgroup size: 256 threads (a multiple of common warp/wavefront sizes, for full occupancy)
- Two-stage reduction: workgroup-local + global
References:
- HeavyDB (2017): GPU aggregation patterns
- Harris (2007): Optimizing parallel reduction in CUDA
- Leis et al. (2014): Morsel-driven parallelism
Modules§
- jit
- JIT WGSL Compiler for Kernel Fusion (CORE-003)
- kernels
- GPU compute kernels (WGSL shaders)
- multigpu
- Multi-GPU data partitioning and distribution
Structs§
- GpuEngine
- GPU compute engine for aggregations