wgpu-algorithms
A high-performance, safe WebGPU sorting and scanning library for Rust.
Safe Rust API. All memory management, bind groups, and synchronization are handled internally. No unsafe blocks in library code.
Performance Benchmarks
Benchmarks run on Apple M3 Max (Metal backend).
- CPU: Rayon Parallel Sort (par_sort_unstable)
- GPU Resident: Sorts data already on VRAM (Pipeline use-case)
- GPU Round-Trip: Upload -> Sort -> Download (Utility use-case)
| Items | CPU (Rayon) | GPU (Resident) | GPU (Round-Trip) | Verdict |
|---|---|---|---|---|
| 100k | 0.52 ms | 6.0 ms | 7.2 ms | ❌ CPU Wins (Driver Overhead) |
| 1M | 4.5 ms | 9.1 ms | 10.1 ms | ❌ CPU Wins |
| 10M | 44.1 ms | 31.3 ms | 40.9 ms | ✅ GPU Wins (1.4x) |
| 100M | 506 ms | 273 ms | 407 ms | 🚀 GPU Domination (1.85x) |
Throughput (100M items):
- Scan: ~5.2 Billion items/sec
- Sort: ~365 Million items/sec
Prefix Scan (Inclusive Add)
Benchmarks include driver submission overhead (queue.submit + device.poll).
| Items | Time | Throughput | Bandwidth (Effective) |
|---|---|---|---|
| 100M | 19.2 ms | 5.2 Gelem/s | ~41.6 GB/s |
Note: Bandwidth calculated as Read + Write (4 bytes * 2 * items / time).
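The bandwidth and throughput figures in the table can be reproduced with a few lines of arithmetic. The sketch below assumes 4-byte items and one read plus one write per item, as the note states; the function names are illustrative and not part of the library.

```rust
/// Effective bandwidth in GB/s for a pass that reads and writes each item once.
/// `bytes_per_item` is 4 for the u32 data assumed by the note above.
fn effective_bandwidth_gbps(items: u64, bytes_per_item: u64, millis: f64) -> f64 {
    let bytes_moved = (items * bytes_per_item * 2) as f64; // read + write
    bytes_moved / (millis / 1e3) / 1e9
}

/// Throughput in billions of elements per second (Gelem/s).
fn throughput_gelems(items: u64, millis: f64) -> f64 {
    items as f64 / (millis / 1e3) / 1e9
}
```

Calling `effective_bandwidth_gbps(100_000_000, 4, 19.2)` gives roughly 41.7 GB/s and `throughput_gelems(100_000_000, 19.2)` gives roughly 5.2 Gelem/s, in line with the row above.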
Architecture
The library implements state-of-the-art parallel algorithms tailored for the WebGPU execution model:
- LSD Radix Sort: A 2-bit pass (4 bins) decoupling "Counting" and "Scattering" kernels.
- Hierarchical Scan: A "Reduce-Then-Scan" approach using 3 separate kernels (Downsweep, Scan-Aux, Upsweep) to handle arbitrary input sizes.
- Vector Tiling (VT): Automatically adjusts items-per-thread based on GPU capability (e.g., VT=8 for Desktop, VT=4 for Mobile) to saturate memory bandwidth.
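The structure of the first two algorithms can be illustrated with a sequential CPU reference. This is a minimal sketch, not the library's kernels: the function names, the `tile` parameter (standing in for one workgroup's share of the input), and `SCAN_BLOCK` are all hypothetical. The three scan phases here are a generic reduce-then-scan and are not claimed to map one-to-one onto the kernels named above.

```rust
/// Hypothetical block size for the scan; stands in for a workgroup's tile.
const SCAN_BLOCK: usize = 64;

/// Blocked "reduce-then-scan" inclusive prefix sum (sequential reference).
fn blocked_inclusive_scan(data: &mut [u32], block: usize) {
    assert!(block > 0);
    // Phase 1: reduce each block to a single total.
    let mut block_sums: Vec<u32> = data
        .chunks(block)
        .map(|c| c.iter().fold(0u32, |a, &x| a.wrapping_add(x)))
        .collect();
    // Phase 2: exclusive scan over the block totals (the auxiliary array).
    let mut running = 0u32;
    for s in block_sums.iter_mut() {
        let total = *s;
        *s = running;
        running = running.wrapping_add(total);
    }
    // Phase 3: scan inside each block, seeded with that block's offset.
    for (chunk, &offset) in data.chunks_mut(block).zip(block_sums.iter()) {
        let mut acc = offset;
        for x in chunk.iter_mut() {
            acc = acc.wrapping_add(*x);
            *x = acc;
        }
    }
}

/// LSD radix sort on u32 keys: 2 bits (4 bins) per pass, 16 passes.
/// Counting and scattering are separate steps joined by a prefix scan.
fn radix_sort_2bit(keys: &mut Vec<u32>, tile: usize) {
    const BITS: u32 = 2;
    const BINS: usize = 1 << BITS;
    assert!(tile > 0);
    let n = keys.len();
    if n == 0 {
        return;
    }
    let tiles = (n + tile - 1) / tile;
    let mut out = vec![0u32; n];
    for pass in 0..(32 / BITS) {
        let shift = pass * BITS;
        // Counting: one histogram per tile, stored bin-major so that a single
        // scan yields the start offset of every (bin, tile) pair.
        let mut counts = vec![0u32; BINS * tiles];
        for (t, chunk) in keys.chunks(tile).enumerate() {
            for &k in chunk {
                let bin = ((k >> shift) & (BINS as u32 - 1)) as usize;
                counts[bin * tiles + t] += 1;
            }
        }
        // Scan: inclusive sums minus each entry's own count = exclusive offsets.
        let mut offsets = counts.clone();
        blocked_inclusive_scan(&mut offsets, SCAN_BLOCK);
        for (o, c) in offsets.iter_mut().zip(&counts) {
            *o -= *c;
        }
        // Scattering: visit keys in the original order so each pass is stable.
        for (t, chunk) in keys.chunks(tile).enumerate() {
            for &k in chunk {
                let bin = ((k >> shift) & (BINS as u32 - 1)) as usize;
                let slot = &mut offsets[bin * tiles + t];
                out[*slot as usize] = k;
                *slot += 1;
            }
        }
        std::mem::swap(keys, &mut out);
    }
}
```

Storing the histogram bin-major is what keeps each pass stable across tiles: all tiles' entries for bin 0 come before any entry for bin 1, and within a bin, earlier tiles receive lower offsets. On a GPU the per-tile loops would run in parallel; here they run in order purely for clarity.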
Features
- Adaptive Sorting: Automatically switches between CPU (latency-optimized) and GPU (throughput-optimized) based on input size; inputs under 1M items are sorted on the CPU.
- Zero-Allocation Hot Loop: Reuses internal workspace buffers and pre-baked BindGroups to minimize driver pressure during animation loops.
- WGPU Safe: Runs on Metal, Vulkan, DX12, and WebGPU without experimental features.
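The size-based dispatch described under Adaptive Sorting might look roughly like the sketch below. Only the 1M-item cutoff comes from the text above; `Backend`, `GPU_THRESHOLD`, `choose_backend`, and `adaptive_sort` are hypothetical names, and the GPU path is passed in as a closure because the library's actual entry point is not shown here.

```rust
/// Which path handled the sort. Hypothetical type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cpu,
    Gpu,
}

/// Cutoff taken from the feature list: fewer than 1M items stay on the CPU.
const GPU_THRESHOLD: usize = 1_000_000;

fn choose_backend(len: usize) -> Backend {
    if len < GPU_THRESHOLD {
        Backend::Cpu
    } else {
        Backend::Gpu
    }
}

/// Sorts `keys` on the CPU for small inputs, otherwise defers to `gpu_sort`,
/// a caller-supplied stand-in for the GPU path.
fn adaptive_sort(keys: &mut [u32], gpu_sort: impl FnOnce(&mut [u32])) -> Backend {
    let backend = choose_backend(keys.len());
    match backend {
        Backend::Cpu => keys.sort_unstable(),
        Backend::Gpu => gpu_sort(keys),
    }
    backend
}
```

A fixed cutoff is the simplest policy. The benchmark table suggests the real crossover depends on whether data is already resident on the GPU, so a production heuristic could reasonably take that into account as well.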
Usage
use ;
async
Installation
Add this to your Cargo.toml:
[dependencies]
wgpu-algorithms = "0.1.0"