Module grid_sync

Cross-grid synchronization by grid-sync kernel splitting: a fallback for backends that lack a native cooperative-launch grid barrier. It splits a Program at every Node::Barrier { ordering: GridSync } and dispatches the segments in sequence; the kernel-launch boundary itself is the grid-level fence.

Op id: vyre-driver::grid_sync. Soundness: Exact over the cross-grid barrier contract.

§Why this lives in vyre-driver, not the backend

Every backend that lacks a native cooperative whole-grid launch needs the same kernel-split semantics for Node::Barrier { ordering: GridSync }: split the program at the barrier, dispatch each segment as its own kernel launch, and re-feed the prior segment’s outputs as inputs to the next. The kernel-launch boundary itself is the grid-level fence - every prior write becomes globally visible before the next launch reads.

Backends route through crate::grid_sync::dispatch_with_grid_sync_split when VyreBackend::supports_grid_sync is false and the program contains any Node::Barrier { ordering: GridSync }. Backends that return true emit one kernel and satisfy the barrier device-side.
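The routing rule above can be sketched as a small decision function. This is an illustration only: the `Backend` trait, the `Route` enum, and `choose_route` are hypothetical stand-ins, and the real `VyreBackend` trait and dispatch entry points may have different signatures.

```rust
/// Hypothetical stand-in for the capability query on `VyreBackend`.
/// Only the one method relevant to routing is modelled here.
trait Backend {
    fn supports_grid_sync(&self) -> bool;
}

/// Hypothetical label for the two paths the text describes.
#[derive(Debug, PartialEq, Eq)]
enum Route {
    /// Emit one kernel; the barrier is satisfied device-side.
    SingleKernel,
    /// Split at each grid-sync barrier and launch segments in order.
    HostSplit,
}

/// The split path is taken only when the program actually contains a
/// grid-sync barrier AND the backend cannot satisfy it natively.
fn choose_route(backend: &dyn Backend, program_has_grid_sync: bool) -> Route {
    if program_has_grid_sync && !backend.supports_grid_sync() {
        Route::HostSplit
    } else {
        Route::SingleKernel
    }
}

/// Toy backend with a native cooperative launch.
struct Cooperative;
/// Toy backend without one.
struct Plain;

impl Backend for Cooperative {
    fn supports_grid_sync(&self) -> bool {
        true
    }
}

impl Backend for Plain {
    fn supports_grid_sync(&self) -> bool {
        false
    }
}
```

In this sketch a program without any grid-sync barrier never takes the split path, whatever the backend reports.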

§Algorithm

1. Walk the program’s top-level entry sequence.
2. Each prefix-suffix split at a Node::Barrier { GridSync } becomes one segment.
3. For each segment, build a Program with a segment-local buffer table: buffers read or written by that segment plus passthrough read-write buffers that must preserve caller-visible storage.
4. Dispatch segments in order, threading live buffers by buffer name rather than positional output slot. Segment read-only inputs are assembled from the caller’s original bytes or prior segment outputs; final host-visible output slots are reassembled in the original program’s output declaration order.
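The steps above can be illustrated with a toy model. Everything in it is an assumption made for the example: the `Node` enum, `split_at_barriers`, and `run_segments` are hypothetical and far simpler than the crate's real `Program`, `split_on_grid_sync`, and dispatch helpers. The sketch only shows the shape of the idea: cut the entry sequence at each barrier, then run the segments in order while threading buffers by name.

```rust
use std::collections::BTreeMap;

/// Hypothetical, heavily simplified node type for illustration.
#[derive(Clone, Debug, PartialEq)]
enum Node {
    /// Add `value` to every element of the named buffer.
    AddConst { buffer: &'static str, value: u32 },
    /// Stand-in for `Node::Barrier { ordering: GridSync }`.
    GridSyncBarrier,
}

/// Cut the top-level sequence at every barrier; the barrier nodes
/// themselves are dropped. A trailing barrier yields an empty last segment.
fn split_at_barriers(entry: &[Node]) -> Vec<Vec<Node>> {
    entry
        .split(|node| matches!(node, Node::GridSyncBarrier))
        .map(|segment| segment.to_vec())
        .collect()
}

/// Run each segment as its own "launch". Buffers are looked up by name,
/// so a later segment sees whatever an earlier segment wrote.
fn run_segments(
    segments: &[Vec<Node>],
    mut live: BTreeMap<String, Vec<u32>>,
) -> BTreeMap<String, Vec<u32>> {
    for segment in segments {
        for node in segment {
            if let Node::AddConst { buffer, value } = node {
                if let Some(data) = live.get_mut(*buffer) {
                    for element in data.iter_mut() {
                        *element = element.wrapping_add(*value);
                    }
                }
            }
        }
        // End of segment: in the real driver this is the kernel-launch
        // boundary that acts as the grid-level fence.
    }
    live
}
```

Keying the live set by name rather than by position means a segment that does not touch a buffer simply leaves its entry alone, which is one way to picture the passthrough buffers mentioned in step 3.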

§Device-resident variant

dispatch_with_grid_sync_split_into round-trips every live buffer host↔device between each segment and on every fixpoint pass. For a fused multi-rule program whose shared output accumulator is hundreds of MiB and which splits into hundreds of segments, that transfer, not launch latency, dominates wall time.

dispatch_resident_grid_sync_fixpoint_into is the device-resident counterpart: it uploads inputs into backend-resident resources once, keeps them bound across every segment and fixpoint pass (so the accumulator threads in place on-device, since resident dispatch never clears a bound buffer between launches), and reads back only the final outputs. It requires VyreBackend::supports_resident_dispatch; callers route to it on resident-capable backends and to the host split otherwise. Both paths are recall- and proof-identical (proven by a host/resident differential gate); the choice is purely a host↔device-traffic optimization.
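A back-of-the-envelope model makes the traffic difference concrete. The two functions below are hypothetical and deliberately crude: they assume the host split uploads and reads back the full live set once per segment per fixpoint pass, and that the resident path moves inputs up once and outputs down once. Real transfer volumes depend on the backend and on which buffers each segment touches.

```rust
/// Rough host-split traffic in bytes: one upload plus one readback of the
/// live set for every segment on every fixpoint pass (modelling assumption).
fn host_split_bytes(live_bytes: u64, segments: u64, passes: u64) -> u64 {
    2 * live_bytes * segments * passes
}

/// Rough resident traffic in bytes: inputs uploaded once, final outputs
/// read back once, independent of segment and pass counts.
fn resident_bytes(input_bytes: u64, output_bytes: u64) -> u64 {
    input_bytes + output_bytes
}
```

Under these assumptions, a 256 MiB accumulator split into 200 segments and run for 4 passes moves 1600 times the accumulator size through the host split, against twice the accumulator size on the resident path.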

§Soundness

Atomicity preserved: every atomic_or that fired in segment N has flushed to global memory by the time segment N+1 launches - backend launch APIs issue an implicit grid-level fence at submission boundaries.
Ordering preserved: the original program’s host-visible output is byte-identical to the un-split version, modulo timing.
No re-validation surprise: each split segment validates against the same backend supported-ops set as the original.

Functions§

contains_grid_sync: Whether program contains any Node::Barrier { ordering: GridSync } in its dispatch-level entry sequence (peeled past any synthetic outer Region).
dispatch_resident_grid_sync_fixpoint_into: Device-resident counterpart of dispatch_with_grid_sync_split_into.
dispatch_resident_with_grid_sync_split_timed: Resident-resource variant of dispatch_with_grid_sync_split_timed.
dispatch_with_grid_sync_split: Universal dispatch helper that satisfies Node::Barrier { ordering: GridSync } on any backend by splitting at the barrier and running each segment as its own kernel launch.
dispatch_with_grid_sync_split_into: Variant of dispatch_with_grid_sync_split that writes final outputs into caller-owned storage.
dispatch_with_grid_sync_split_timed: Timed variant of dispatch_with_grid_sync_split.
plan_host_grid_sync_segment_programs: Diagnostics: the host-split segment programs (post buffer-rewrite) that the fallback dispatch path (dispatch_with_grid_sync_split*) validates and launches when the backend lacks native grid-sync. Exposed so tooling and tests can inspect or validate each segment without a live backend — the raw try_split_on_grid_sync output omits the per-segment buffer access/role rewrite, so it is not what the backend actually sees.
split_on_grid_sync: Split program at every top-level Node::Barrier { GridSync }.
try_split_on_grid_sync: Fallible variant of split_on_grid_sync for production dispatch paths.
