Function all_reduce 

pub fn all_reduce<T>(
    transport: &T,
    buf: SymmetricBuffer,
    local: &mut [f32],
    op: ReduceKind,
) -> Result<(), CollectiveError>

AllReduce: every rank ends up with op({values from every rank}).

Naïve algorithm: every rank reads every other rank’s slot and combines the values with its own. That is O(n_ranks²) slot reads in total, which is fine for small rank counts. Production implementations typically use ring-reduce or tree-reduce; this one favors simplicity because LocalTransport’s “communication” is just a memcpy.
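The naïve scheme can be sketched without any transport at all. The block below is an illustration under stated assumptions, not this crate's implementation: `Op`, `naive_all_reduce`, and the `slots` layout (one `Vec<f32>` per rank) are hypothetical stand-ins for `ReduceKind`, `all_reduce`, and `SymmetricBuffer`, whose internals are not shown on this page. It also counts slot reads to make the quadratic cost visible.

```rust
/// Hypothetical stand-in for `ReduceKind`; the real variants are not shown here.
#[derive(Clone, Copy, Debug)]
pub enum Op {
    Sum,
    Max,
}

impl Op {
    /// Combine two elements according to the reduction kind.
    fn combine(self, a: f32, b: f32) -> f32 {
        match self {
            Op::Sum => a + b,
            Op::Max => a.max(b),
        }
    }
}

/// Naive all-reduce over in-memory slots (assumed layout: `slots[r]` is rank r's
/// contribution, all slots the same length).
/// Returns one reduced vector per rank and the total number of slot reads.
pub fn naive_all_reduce(slots: &[Vec<f32>], op: Op) -> (Vec<Vec<f32>>, usize) {
    let n_ranks = slots.len();
    let mut reads = 0;
    let mut results = Vec::with_capacity(n_ranks);

    // Every rank independently walks every slot, so the work is repeated n_ranks times.
    for _rank in 0..n_ranks {
        let mut acc = slots[0].clone();
        reads += 1;
        for peer in &slots[1..] {
            for (a, &b) in acc.iter_mut().zip(peer) {
                *a = op.combine(*a, b);
            }
            reads += 1;
        }
        results.push(acc);
    }

    (results, reads)
}
```

With three ranks, each rank reads three slots, for nine reads in total. A ring or tree reduction would avoid repeating the same combination on every rank, at the cost of more coordination.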

On entry, local holds this rank’s contribution; on exit, it holds the reduced result. Its element count must match the per-rank len of buf, which is measured in bytes: len = 4 * elements, since each f32 occupies 4 bytes.
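The size requirement amounts to a one-line check. The helper below is a hypothetical sketch: `local_matches_slot` and its `per_rank_len_bytes` parameter are assumed names, and how the per-rank length is actually read from `SymmetricBuffer` is not shown on this page.

```rust
use std::mem::size_of;

/// Hypothetical helper: true when `local` exactly fills one per-rank slot
/// of `per_rank_len_bytes` bytes (4 bytes per f32 element).
pub fn local_matches_slot(local: &[f32], per_rank_len_bytes: usize) -> bool {
    // checked_mul guards against overflow on very large slices.
    local.len().checked_mul(size_of::<f32>()) == Some(per_rank_len_bytes)
}
```

For example, an 8-element `local` needs a 32-byte slot; passing the element count (8) where a byte length is expected would fail this check.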