
scatter_batched

Function scatter_batched 

pub fn scatter_batched<T: Send>(
    rt: &GpuRuntime,
    items: &mut [T],
    f: impl Fn(usize, &mut [T]) -> Option<()> + Sync,
) -> Option<()>

Run independent work across ALL devices concurrently.

Splits items via balanced_partition into one tile per device. Each tile runs on its own std::thread::scope thread, which binds that ordinal’s context (cuda_context_for(ordinal).bind_to_thread()) before calling f(ordinal, &mut items[range]).

Returns Some(()) only if EVERY tile’s closure returned Some(()). If any tile’s closure returns None or panics, or a context cannot be bound, returns None so the caller can run its deterministic whole-batch CPU fallback over items. On None, no tile’s output should be treated as a successful result.
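The scatter-and-join pattern described above can be sketched with the standard library alone. This is an illustrative sketch, not the crate's implementation: balanced_ranges, bind_context, and scatter_sketch are hypothetical stand-ins (for balanced_partition, for cuda_context_for(ordinal).bind_to_thread(), and for scatter_batched itself), and the device count is passed in directly instead of being read from a GpuRuntime. Returning None for zero devices is likewise an assumption.

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical stand-in for `balanced_partition`: splits `len` items into
/// `parts` contiguous ranges whose lengths differ by at most one.
fn balanced_ranges(len: usize, parts: usize) -> Vec<Range<usize>> {
    if parts == 0 {
        return Vec::new();
    }
    let (base, extra) = (len / parts, len % parts);
    let mut start = 0;
    (0..parts)
        .map(|i| {
            // The first `extra` ranges each take one additional item.
            let end = start + base + usize::from(i < extra);
            let range = start..end;
            start = end;
            range
        })
        .collect()
}

/// Hypothetical stand-in for binding a device context to the current thread.
/// A real binding could fail; this stub always succeeds.
fn bind_context(_ordinal: usize) -> Option<()> {
    Some(())
}

/// Sketch of the scatter pattern: one scoped thread per tile, all joined,
/// `Some(())` only if every tile bound its context and returned `Some(())`.
fn scatter_sketch<T: Send>(
    device_count: usize,
    items: &mut [T],
    f: impl Fn(usize, &mut [T]) -> Option<()> + Sync,
) -> Option<()> {
    if device_count == 0 {
        return None; // assumption: no devices means the caller should fall back
    }
    let ranges = balanced_ranges(items.len(), device_count);
    let f = &f;
    let mut rest = items;
    let all_ok = thread::scope(|s| {
        let mut handles = Vec::with_capacity(ranges.len());
        for (ordinal, range) in ranges.iter().enumerate() {
            // Carve the next disjoint tile off the front of the remaining slice.
            let (tile, tail) = std::mem::take(&mut rest).split_at_mut(range.len());
            rest = tail;
            handles.push(s.spawn(move || {
                bind_context(ordinal)?;
                f(ordinal, tile)
            }));
        }
        // Join every handle (no short-circuit) so a panicking tile surfaces
        // as `Err` here instead of propagating out of the scope.
        handles
            .into_iter()
            .map(|handle| handle.join())
            .fold(true, |ok, result| ok & matches!(result, Ok(Some(()))))
    });
    all_ok.then_some(())
}
```

A caller would typically match on the result and, on None, rerun the whole batch through its CPU path instead of trying to salvage individual tiles.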

Non-Linux builds have no CUDA contexts to bind, so this function always returns None there.