pub fn balanced_partition(
    rt: &GpuRuntime,
    n_units: usize,
) -> Vec<(usize, Range<usize>)>
Partition n_units independent work items across all usable devices,
weighted by GpuDeviceInfo::score.
Returns (ordinal, range) tiles that exactly cover 0..n_units with no gaps
or overlaps, ordered from the highest-scoring device to the lowest. A single
device yields one tile spanning the full range. If n_units == 0 or no GPU is
available, the result is an empty Vec.
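The coverage guarantee can be checked mechanically. The helper below is hypothetical (it is not part of this API): it sorts the returned tiles by start position, since the contract orders tiles by score rather than by position, and then confirms they run contiguously from 0 to n_units with no empty tile.

```rust
use std::ops::Range;

/// Hypothetical helper: returns true if `tiles` cover 0..n_units exactly once,
/// regardless of the order in which the tiles are listed.
fn is_exact_tiling(tiles: &[(usize, Range<usize>)], n_units: usize) -> bool {
    // Collect the ranges and sort them by position, ignoring device ordinals.
    let mut ranges: Vec<&Range<usize>> = tiles.iter().map(|(_, r)| r).collect();
    ranges.sort_by_key(|r| r.start);

    let mut next = 0;
    for r in ranges {
        // A gap or overlap shows up as a start that is not the previous end;
        // a zero-width tile is rejected because such tiles should be dropped.
        if r.start != next || r.end <= r.start {
            return false;
        }
        next = r.end;
    }
    // Every unit is covered only if the last tile ends exactly at n_units.
    next == n_units
}

fn main() {
    let tiles = vec![(1, 0..7), (0, 7..10)];
    println!("{}", is_exact_tiling(&tiles, 10));
}
```

An empty Vec passes only when n_units == 0, which matches the documented empty-input behaviour.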
Allocation uses the largest-remainder method on scores: device i's ideal share
is (score_i / Σscore) · n_units. Each device first receives the floor of its
share; the units left over from rounding then go, one each, to the devices with
the largest fractional parts. This keeps the split proportional to capability
while guaranteeing that the tiles cover the whole range. Devices whose share
rounds to a zero-width tile are dropped, so no worker is spawned for empty work.
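A minimal sketch of the largest-remainder split described above, working on plain scores rather than a GpuRuntime. The function name, the f64 score type, the use of slice indices as device ordinals, breaking ties by lower ordinal, and laying tiles out contiguously from 0 in score order are all assumptions made for illustration; the real implementation may differ on any of them.

```rust
use std::ops::Range;

/// Hypothetical stand-in: `scores[i]` is the (finite, non-negative) score of
/// device ordinal `i`. Returns (ordinal, range) tiles, highest score first.
fn largest_remainder_partition(scores: &[f64], n_units: usize) -> Vec<(usize, Range<usize>)> {
    let total: f64 = scores.iter().sum();
    if n_units == 0 || scores.is_empty() || total <= 0.0 {
        return Vec::new();
    }

    // Ideal fractional share per device, then its floor.
    let ideal: Vec<f64> = scores.iter().map(|s| s / total * n_units as f64).collect();
    let mut counts: Vec<usize> = ideal.iter().map(|x| x.floor() as usize).collect();
    let leftover = n_units.saturating_sub(counts.iter().sum());

    // Hand the rounding leftovers to the largest fractional parts, one each.
    // Ties are broken by lower ordinal (an assumption).
    let mut by_frac: Vec<usize> = (0..scores.len()).collect();
    by_frac.sort_by(|&a, &b| {
        let fa = ideal[a] - ideal[a].floor();
        let fb = ideal[b] - ideal[b].floor();
        fb.total_cmp(&fa).then(a.cmp(&b))
    });
    for &i in by_frac.iter().take(leftover) {
        counts[i] += 1;
    }

    // Drop zero-width tiles and order the rest by descending score.
    let mut order: Vec<usize> = (0..scores.len()).filter(|&i| counts[i] > 0).collect();
    order.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]).then(a.cmp(&b)));

    // Lay the tiles out back to back starting at 0.
    let mut start = 0usize;
    order
        .into_iter()
        .map(|i| {
            let r = start..start + counts[i];
            start = r.end;
            (i, r)
        })
        .collect()
}

fn main() {
    // Device 1 is three times as capable as device 0.
    println!("{:?}", largest_remainder_partition(&[1.0, 3.0], 10));
}
```

With scores [1.0, 3.0] and 10 units, the ideal shares are 2.5 and 7.5. The floors 2 and 7 leave one unit over; the equal fractional parts tie, so under the assumed tie-break it goes to ordinal 0. The result is [(1, 0..7), (0, 7..10)].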