pub fn all_reduce<T>(
    transport: &T,
    buf: SymmetricBuffer,
    local: &mut [f32],
    op: ReduceKind,
) -> Result<(), CollectiveError>
where
    T: SymmetricTransport,
AllReduce: every rank ends up with op({values from every rank}).
Naïve algorithm: every rank reads every other rank’s slot and combines the values. That costs O(n_ranks²) communications in total, which is fine for small rank counts. Real implementations use ring-reduce or tree-reduce; this one favors simplicity because LocalTransport’s “comm” is just a memcpy.
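The naïve scheme can be sketched without the real transport. The following is a minimal, self-contained simulation under stated assumptions, not this crate's implementation: `Op`, `combine`, and `naive_all_reduce` are hypothetical names, and a `Vec<Vec<f32>>` stands in for the per-rank slots of the symmetric buffer.

```rust
/// Hypothetical stand-in for a reduction kind. The variants of the real
/// `ReduceKind` are not shown in this documentation.
#[derive(Clone, Copy)]
pub enum Op {
    Sum,
    Max,
}

fn combine(op: Op, a: f32, b: f32) -> f32 {
    match op {
        Op::Sum => a + b,
        Op::Max => a.max(b),
    }
}

/// Simulated naive all-reduce. `slots[r]` plays the role of rank `r`'s
/// published contribution; all slots are assumed to have the same length.
/// Returns one result vector per rank plus the number of remote slot reads.
pub fn naive_all_reduce(slots: &[Vec<f32>], op: Op) -> (Vec<Vec<f32>>, usize) {
    let n = slots.len();
    let mut reads = 0;
    let mut results = Vec::with_capacity(n);
    for rank in 0..n {
        // Start from this rank's own contribution (a local read).
        let mut acc = slots[rank].clone();
        for peer in 0..n {
            if peer == rank {
                continue;
            }
            // One remote read per (rank, peer) pair.
            reads += 1;
            for (a, &b) in acc.iter_mut().zip(&slots[peer]) {
                *a = combine(op, *a, b);
            }
        }
        results.push(acc);
    }
    (results, reads)
}
```

With three ranks this sketch performs 3 × 2 = 6 remote reads, and the count grows as n × (n − 1), which is where the quadratic cost comes from.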
local carries this rank’s contribution on entry and the reduced
result on exit. Its element count must match the per-rank len of buf,
which is measured in bytes: per-rank len = 4 * elements, since each f32 is 4 bytes.
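The length requirement amounts to a byte-count comparison. A minimal sketch, assuming the per-rank length is available as a plain `usize` byte count: the accessor on `SymmetricBuffer` is not shown here, so `check_len` and `per_rank_len_bytes` are hypothetical, and a `String` stands in for the real `CollectiveError`.

```rust
use std::mem::size_of;

/// Hypothetical helper: succeeds only if `local` exactly fills one per-rank slot.
pub fn check_len(local: &[f32], per_rank_len_bytes: usize) -> Result<(), String> {
    // Each f32 occupies 4 bytes, so the slot must hold 4 * elements bytes.
    let needed = local.len() * size_of::<f32>();
    if needed == per_rank_len_bytes {
        Ok(())
    } else {
        Err(format!(
            "local needs {needed} bytes, but each rank's slot holds {per_rank_len_bytes}"
        ))
    }
}
```

For example, a 4-element `local` requires a 16-byte per-rank slot; any other slot size is rejected before data is moved.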