pub const GPU_DISPATCH_THRESHOLD: usize = 256;
Threshold for automatic GPU dispatch. Below this count, CPU is often faster due to GPU overhead.