ferrotorch-distributed
Distributed training for ferrotorch -- backends, collectives, and DDP.
What it provides
- Backends -- `TcpBackend` for real multi-process training, `SimulatedBackend` for in-process testing, and the `Backend` trait
- Collectives -- `allreduce`, `broadcast`, `barrier` with `ReduceOp` (Sum, Mean, Min, Max)
- DDP -- `DDP` wraps any `Module` and synchronizes gradients across ranks after each backward pass
- GPU collectives (requires `gpu` feature) -- `gpu_allreduce`, `gpu_broadcast` for GPU tensor communication
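To make the four reduction kinds concrete, here is a minimal, std-only sketch of what an all-reduce computes: every rank contributes a buffer of equal length, and every rank ends up with the same element-wise result. The `ReduceOp` enum below is a local stand-in that only mirrors the variant names listed above, and `reference_allreduce` is a hypothetical helper. Neither is the crate's actual API, and no backend or network is involved.

```rust
/// Local stand-in for the reduction kinds named above (Sum, Mean, Min, Max).
/// This is NOT the crate's `ReduceOp` type, only an illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ReduceOp {
    Sum,
    Mean,
    Min,
    Max,
}

/// Hypothetical helper: reduce one equally sized buffer per rank element-wise
/// and return the single result that every rank would hold afterwards.
pub fn reference_allreduce(per_rank: &[Vec<f32>], op: ReduceOp) -> Vec<f32> {
    assert!(!per_rank.is_empty(), "need at least one rank");
    let len = per_rank[0].len();
    assert!(
        per_rank.iter().all(|buf| buf.len() == len),
        "all rank buffers must have the same length"
    );

    (0..len)
        .map(|i| {
            // The i-th element of every rank's buffer.
            let column = per_rank.iter().map(|buf| buf[i]);
            match op {
                ReduceOp::Sum => column.sum::<f32>(),
                ReduceOp::Mean => column.sum::<f32>() / per_rank.len() as f32,
                ReduceOp::Min => column.fold(f32::INFINITY, f32::min),
                ReduceOp::Max => column.fold(f32::NEG_INFINITY, f32::max),
            }
        })
        .collect()
}
```

With two ranks holding `[1.0, 4.0]` and `[3.0, 2.0]`, this sketch yields `[4.0, 6.0]` for Sum, `[2.0, 3.0]` for Mean, `[1.0, 2.0]` for Min, and `[3.0, 4.0]` for Max.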
Feature flags
| Feature | Default | Description |
|---|---|---|
| `gpu` | no | Enable GPU-aware collectives via ferrotorch-gpu |
Quick start
use ;
let backend = init?;
let mut ddp_model = DDPnew?;
// Training loop -- gradients are synchronized automatically
let loss = ddp_model.forward?;
backward?;
allreduce?;
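As a rough illustration of what synchronizing gradients across ranks amounts to, the sketch below simulates ranks as threads inside one process. Each rank adds its local gradient to a shared sum, waits at a barrier, and then reads back the same averaged gradient. It uses only the Rust standard library. `simulate_gradient_sync` is a hypothetical name, and averaging (rather than summing) is an assumption borrowed from common data-parallel practice, not something this README confirms about `DDP`.

```rust
use std::sync::{Barrier, Mutex};
use std::thread;

/// Hypothetical helper: each "rank" is a thread that owns one local gradient.
/// Returns the gradient each rank holds after synchronization, in rank order.
pub fn simulate_gradient_sync(local_grads: Vec<Vec<f32>>) -> Vec<Vec<f32>> {
    let world_size = local_grads.len();
    assert!(world_size > 0, "need at least one rank");
    let len = local_grads[0].len();
    assert!(
        local_grads.iter().all(|g| g.len() == len),
        "all gradients must have the same length"
    );

    let sum = Mutex::new(vec![0.0_f32; len]);
    let barrier = Barrier::new(world_size);

    thread::scope(|s| {
        let handles: Vec<_> = local_grads
            .into_iter()
            .map(|grad| {
                let (sum, barrier) = (&sum, &barrier);
                s.spawn(move || {
                    // Contribute this rank's gradient to the shared sum.
                    {
                        let mut acc = sum.lock().unwrap();
                        for (a, g) in acc.iter_mut().zip(&grad) {
                            *a += *g;
                        }
                    }
                    // Wait until every rank has contributed.
                    barrier.wait();
                    // Assumption: ranks average the summed gradient.
                    sum.lock()
                        .unwrap()
                        .iter()
                        .map(|v| v / world_size as f32)
                        .collect::<Vec<f32>>()
                })
            })
            .collect();

        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .collect::<Vec<Vec<f32>>>()
    })
}
```

Calling `simulate_gradient_sync(vec![vec![1.0, 2.0], vec![3.0, 6.0]])` returns `[2.0, 4.0]` for both ranks. A real backend would exchange buffers between processes instead of sharing memory, but the result each rank observes has the same shape.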
Part of ferrotorch
This crate is one component of the ferrotorch workspace. See the workspace README for full documentation.
License
MIT OR Apache-2.0