pub fn gpu_add_channel(a: u64, b: u64, m: u64) -> u64
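Reference implementation of one GPU thread’s add: returns (a + b) % m. Panics if m is 0. The sum a + b is computed in u64 before the reduction, so it must not overflow.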
(a + b) % m
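
The body above computes the sum in `u64`, so it relies on `a + b` fitting in 64 bits and on `m` being nonzero. As a minimal sketch of how those two cases could be handled, the variant below widens to `u128` before reducing. The name `gpu_add_channel_checked` and the `Option` return type are assumptions made for illustration; they are not part of the original API.

```rust
/// Hypothetical overflow-safe variant of `gpu_add_channel`.
/// The name and the `Option` return type are assumptions, not from the source.
/// Returns `None` when `m == 0` instead of panicking.
pub fn gpu_add_channel_checked(a: u64, b: u64, m: u64) -> Option<u64> {
    if m == 0 {
        return None;
    }
    // Widen to u128: the sum of two u64 values needs at most 65 bits,
    // so this addition cannot overflow.
    let sum = (a as u128) + (b as u128);
    // The remainder is strictly less than m, so it fits back into u64.
    Some((sum % (m as u128)) as u64)
}
```

For inputs where the original does not overflow, this should agree with `(a + b) % m`; for example, `gpu_add_channel_checked(3, 4, 5)` yields `Some(2)`. Whether the real GPU kernel widens, wraps, or assumes pre-reduced operands is not stated in the source, so treat this as one possible host-side check and not as the kernel's behavior.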