ferrotorch-distributed
Distributed training for ferrotorch -- backends, collectives, and DDP.
What it provides
- Backends -- `TcpBackend` for real multi-process training, `SimulatedBackend` for in-process testing, the `Backend` trait, plus optional native-Rust `GlooBackend` / `MpiBackend` (feature-gated) and an `NcclBackend` (requires the `nccl` feature)
- Collectives -- `allreduce`, `all_gather`, `reduce_scatter`, `all_to_all`, `broadcast`, `barrier` with `ReduceOp` (Sum, Mean)
- DDP -- `DDP` wraps any `Module` and synchronizes gradients across ranks after each backward pass
- FSDP -- `FSDP` shards parameters across ranks, all-gathering during forward and reduce-scattering gradients during backward
- RPC -- `RpcAgent` / `TcpRpcBackend` for invoking functions on remote ranks
- Pipeline parallelism -- `Pipeline` splits a model into sequential stages with GPipe / Interleaved1F1B schedules
- GPU collectives (requires the `gpu` feature) -- `gpu_allreduce`, `gpu_broadcast` for GPU tensor communication
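
To make the collective semantics concrete, here is a minimal in-process sketch in plain Rust. It does not use this crate's API: `SketchReduceOp`, `sketch_allreduce`, `sketch_reduce_scatter`, and `sketch_all_gather` are hypothetical names invented for illustration, and each "rank" is simply one `Vec<f32>` in a slice. It assumes every rank contributes a buffer of equal length.

```rust
// Illustration only: these types and functions are stand-ins,
// not the ferrotorch-distributed API.

/// Stand-in for a reduction operator (Sum or Mean).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum SketchReduceOp {
    Sum,
    Mean,
}

/// Allreduce: every rank ends up with the same element-wise reduction
/// of all ranks' buffers. `per_rank[r]` is rank r's local buffer.
pub fn sketch_allreduce(per_rank: &[Vec<f32>], op: SketchReduceOp) -> Vec<Vec<f32>> {
    let world_size = per_rank.len();
    let len = per_rank.first().map_or(0, |b| b.len());
    let mut reduced = vec![0.0f32; len];
    for buf in per_rank {
        assert_eq!(buf.len(), len, "all ranks must contribute equal-length buffers");
        for (acc, x) in reduced.iter_mut().zip(buf) {
            *acc += *x;
        }
    }
    if op == SketchReduceOp::Mean {
        for acc in reduced.iter_mut() {
            *acc /= world_size as f32;
        }
    }
    // Each rank receives its own copy of the reduced buffer.
    vec![reduced; world_size]
}

/// Reduce-scatter (sum): rank r receives only the r-th chunk of the summed buffer.
pub fn sketch_reduce_scatter(per_rank: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let world_size = per_rank.len();
    if world_size == 0 {
        return Vec::new();
    }
    let summed = sketch_allreduce(per_rank, SketchReduceOp::Sum);
    let len = summed[0].len();
    assert_eq!(len % world_size, 0, "buffer length must divide evenly across ranks");
    let chunk = len / world_size;
    (0..world_size)
        .map(|r| summed[0][r * chunk..(r + 1) * chunk].to_vec())
        .collect()
}

/// All-gather: every rank receives the concatenation of all shards in rank order.
pub fn sketch_all_gather(shards: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let full: Vec<f32> = shards.iter().flatten().copied().collect();
    vec![full; shards.len()]
}
```

With two ranks holding `[1.0, 2.0]` and `[3.0, 4.0]`, the sum allreduce gives both ranks `[4.0, 6.0]` and the mean gives `[2.0, 3.0]`. Reduce-scatter followed by all-gather reproduces the sum allreduce result, which is the pairing the FSDP bullet describes for gradients and parameters.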
Feature flags
| Feature | Default | Description |
|---|---|---|
| `gpu` | no | Enable GPU-aware collectives via ferrotorch-gpu |
Quick start
    use ...;

    let backend = init(...)?;
    let mut ddp_model = DDP::new(...)?;

    // Training loop -- gradients are synchronized automatically
    let loss = ddp_model.forward(...)?;
    backward(...)?;
    allreduce(...)?;
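
As a rough picture of what gradient synchronization achieves in data-parallel training, the sketch below simulates several replicas in one process using plain Rust. It is not this crate's implementation or API: `sketch_sync_gradients`, `sketch_sgd_step`, and `sketch_train_step` are hypothetical helpers, parameters and gradients are flat `Vec<f32>` buffers, and plain SGD is assumed as the optimizer.

```rust
// Illustration only: a single-process simulation of data-parallel training,
// not the ferrotorch-distributed implementation.

/// Replace every replica's gradient with the element-wise mean across replicas.
pub fn sketch_sync_gradients(grads: &mut [Vec<f32>]) {
    let world_size = grads.len();
    if world_size == 0 {
        return;
    }
    let len = grads[0].len();
    let mut mean = vec![0.0f32; len];
    for g in grads.iter() {
        assert_eq!(g.len(), len, "all replicas must have equal-length gradients");
        for (m, x) in mean.iter_mut().zip(g) {
            *m += *x;
        }
    }
    for m in mean.iter_mut() {
        *m /= world_size as f32;
    }
    for g in grads.iter_mut() {
        g.copy_from_slice(&mean);
    }
}

/// Plain SGD update: params <- params - lr * grad.
pub fn sketch_sgd_step(params: &mut [f32], grad: &[f32], lr: f32) {
    for (p, g) in params.iter_mut().zip(grad) {
        *p -= lr * *g;
    }
}

/// One training step: synchronize gradients, then update every replica.
/// Replicas that start identical stay identical because they all apply
/// the same averaged gradient.
pub fn sketch_train_step(replicas: &mut [Vec<f32>], local_grads: &mut [Vec<f32>], lr: f32) {
    sketch_sync_gradients(local_grads);
    for (params, grad) in replicas.iter_mut().zip(local_grads.iter()) {
        sketch_sgd_step(params, grad, lr);
    }
}
```

For example, two replicas that both start at `[1.0, 2.0]` with local gradients `[0.5, 1.0]` and `[1.5, 3.0]` share the averaged gradient `[1.0, 2.0]`; with a learning rate of `0.5`, both end at `[0.5, 1.0]`.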
Part of ferrotorch
This crate is one component of the ferrotorch workspace. See the workspace README for full documentation.
License
MIT OR Apache-2.0