pub enum ApplyPolicy {
    Sync,
    Cadence,
    Async,
}
Controls WHEN parameter averaging occurs (the interval K).
All three modes run the same architecture; only the averaging trigger differs. The interval K determines how many batches each GPU processes with its own local optimizer before parameters are synchronized across replicas.
Sync: K=1 (every batch). Equivalent to standard DDP. Best convergence guarantees, but fast GPUs idle waiting for slow ones.
Cadence: K=N (ElChe anchor count). The slow GPU anchors the cadence, and fast GPUs fill the wall time with extra batches. Recommended for heterogeneous hardware (e.g. mixing GPU generations).
Async: same proportional scheduling as Cadence (ElChe batch counts), but with divergence correction: if replicas drift apart, the anchor is nudged down (tighter sync). Differs from Cadence only in epoch dispatch (per-rank vs broadcast) in non-progressive mode.
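The relationship between policy and interval K described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: the enum is redeclared locally so the snippet compiles on its own, and `averaging_interval`, `averaging_points`, and the `anchor` parameter (standing in for ElChe's anchor count N) are hypothetical names introduced here.

```rust
/// Local mirror of the documented enum so this sketch is self-contained.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ApplyPolicy {
    Sync,
    Cadence,
    Async,
}

/// Hypothetical helper: the averaging interval K for a policy.
/// `anchor` stands in for ElChe's anchor count N; values below 1 are clamped to 1.
pub fn averaging_interval(policy: ApplyPolicy, anchor: usize) -> usize {
    match policy {
        // Sync averages after every batch.
        ApplyPolicy::Sync => 1,
        // Cadence and Async both start from the anchor count. Under Async the
        // anchor may later be lowered by divergence correction (not modeled here).
        ApplyPolicy::Cadence | ApplyPolicy::Async => anchor.max(1),
    }
}

/// Hypothetical helper: the 1-based batch indices on the anchoring (slow)
/// device after which parameters would be averaged.
pub fn averaging_points(policy: ApplyPolicy, anchor: usize, batches: usize) -> Vec<usize> {
    let k = averaging_interval(policy, anchor);
    (1..=batches).filter(|b| b % k == 0).collect()
}
```

Under these assumptions, `averaging_points(ApplyPolicy::Sync, 4, 3)` yields every batch, while `averaging_points(ApplyPolicy::Cadence, 4, 10)` yields only batches 4 and 8.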
Variants
Sync
Average after every batch (K=1). Equivalent to standard synchronous DDP. Lowest risk of model divergence. Fast GPUs wait at the collective barrier.
Cadence
Average every N batches where N is determined by ElChe’s cadence strategy. The slow device sets the pace; fast devices process proportionally more batches per averaging window. Good default for mixed GPU setups.
Async
Same proportional scheduling as Cadence, plus divergence correction: if parameter norms drift apart, ElChe’s anchor is nudged down (tighter sync). Differs from Cadence only in epoch dispatch (per-rank in non-progressive, identical in progressive mode).
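The proportional scheduling and divergence correction mentioned in the variants above might look roughly like this. Everything in the sketch is an assumption made for illustration: the function names, the use of measured milliseconds per batch, the relative norm spread as the drift metric, and the step of one per correction are not taken from ElChe, whose actual strategy may differ.

```rust
/// Hypothetical: batches each device runs per averaging window.
/// `ms_per_batch[i]` is an assumed measured batch time for device i.
/// The slowest device runs `anchor` batches; faster devices fit as many
/// whole batches as possible into the same wall time.
pub fn batches_per_window(ms_per_batch: &[u64], anchor: u64) -> Vec<u64> {
    let slowest = ms_per_batch.iter().copied().max().unwrap_or(0);
    ms_per_batch
        .iter()
        // Guard against a zero timing to avoid division by zero.
        .map(|&t| anchor * slowest / t.max(1))
        .collect()
}

/// Hypothetical drift metric: relative spread of per-replica parameter norms.
pub fn norm_spread(norms: &[f64]) -> f64 {
    let max = norms.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let min = norms.iter().cloned().fold(f64::INFINITY, f64::min);
    if norms.is_empty() || max <= 0.0 {
        0.0
    } else {
        (max - min) / max
    }
}

/// Hypothetical Async-style correction: when the spread exceeds `tolerance`,
/// lower the anchor by one (never below 1) so averaging happens more often.
pub fn corrected_anchor(anchor: usize, norms: &[f64], tolerance: f64) -> usize {
    if norm_spread(norms) > tolerance {
        anchor.saturating_sub(1).max(1)
    } else {
        anchor
    }
}
```

With batch times of 200, 100, and 150 ms and an anchor of 4, the slow device runs 4 batches while the others run 8 and 5 in the same window. If the replicas' norms then differ by more than the tolerance, the next window would use an anchor of 3.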
Trait Implementations
impl Clone for ApplyPolicy
fn clone(&self) -> ApplyPolicy
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.