pub fn scatter_batched<T: Send>(
rt: &GpuRuntime,
items: &mut [T],
f: impl Fn(usize, &mut [T]) -> Option<()> + Sync,
) -> Option<()>Expand description
Run independent work across ALL devices concurrently.
Splits items via balanced_partition; each tile runs on its own
std::thread::scope thread that binds that ordinal’s context
(cuda_context_for(ordinal).bind_to_thread()) before calling
f(ordinal, &mut items[range]). Returns Some(()) only if EVERY tile’s
closure returned Some(()); if any tile fails, panics, or a context cannot
be bound, returns None so the caller can run its deterministic whole-batch
CPU fallback over the (still untouched-by-a-successful-result) items.
Non-linux builds have no CUDA contexts to bind and so always return None.