//! Distributed training for ferrotorch.
//!
//! This crate provides the building blocks for multi-rank training:
//!
//! - **Backends** ([`backend`]) — Transport-agnostic communication.
//! [`TcpBackend`](backend::TcpBackend) for real multi-process training,
//! [`SimulatedBackend`](backend::SimulatedBackend) for in-process testing.
//!
//! - **Collectives** ([`collective`]) — [`allreduce`](collective::allreduce),
//! [`all_gather`](collective::all_gather),
//! [`reduce_scatter`](collective::reduce_scatter),
//! [`broadcast`](collective::broadcast), and [`barrier`](collective::barrier).
//!
//! - **DDP** ([`ddp`]) — [`DDP`](ddp::DDP) wraps a `Module` and
//! synchronizes gradients across ranks after each backward pass.
//!
//! - **FSDP** ([`fsdp`]) — [`FSDP`](fsdp::FSDP) wraps a `Module` and
//! shards parameters across ranks, all-gathering during forward and
//! reduce-scattering gradients during backward.
//!
//! - **RPC** ([`rpc`]) — Remote Procedure Call framework with
//! [`RpcContext`](rpc::RpcContext) for invoking functions on remote ranks,
//! and [`RRef`](rpc::RRef) for holding references to remote data.
//!
//! - **Pipeline parallelism** ([`pipeline`]) —
//! [`Pipeline`](pipeline::Pipeline) splits a model into sequential stages
//! and processes microbatches through them. Supports
//! [`GPipe`](pipeline::PipelineSchedule::GPipe) and
//! [`Interleaved1F1B`](pipeline::PipelineSchedule::Interleaved1F1B) schedules.
//!
//! - **GPU collectives** ([`gpu_collective`], requires `gpu` feature) —
//! [`gpu_allreduce`](gpu_collective::gpu_allreduce) and
//! [`gpu_broadcast`](gpu_collective::gpu_broadcast) copy GPU tensors
//! to the CPU, run the standard TCP collective, and copy the result back
//! to the GPU. This is a portable alternative to NCCL.
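//!
//! As a rough illustration of what the collectives listed above compute,
//! the sketch below models each rank's buffer as a `Vec<f32>` in one
//! process. It uses only `std` and is not this crate's API: the names
//! `sum_across`, `allreduce_sum`, `reduce_scatter_sum`, and
//! `all_gather_shards` are hypothetical, and the buffer length is assumed
//! to be divisible by the number of ranks. It shows that a reduce-scatter
//! followed by an all-gather produces the same result as an allreduce.
//!
//! ```rust
//! /// Element-wise sum of every rank's buffer (hypothetical helper).
//! fn sum_across(inputs: &[Vec<f32>]) -> Vec<f32> {
//!     let mut total = vec![0.0; inputs[0].len()];
//!     for buf in inputs {
//!         for (t, x) in total.iter_mut().zip(buf) {
//!             *t += *x;
//!         }
//!     }
//!     total
//! }
//!
//! /// Allreduce: every rank ends up with the full element-wise sum.
//! fn allreduce_sum(inputs: &[Vec<f32>]) -> Vec<Vec<f32>> {
//!     vec![sum_across(inputs); inputs.len()]
//! }
//!
//! /// Reduce-scatter: rank `r` keeps only the `r`-th chunk of the sum.
//! /// Assumes the buffer length is divisible by the number of ranks.
//! fn reduce_scatter_sum(inputs: &[Vec<f32>]) -> Vec<Vec<f32>> {
//!     let chunk = inputs[0].len() / inputs.len();
//!     sum_across(inputs).chunks(chunk).map(|c| c.to_vec()).collect()
//! }
//!
//! /// All-gather: every rank receives the concatenation of all shards.
//! fn all_gather_shards(shards: &[Vec<f32>]) -> Vec<Vec<f32>> {
//!     vec![shards.concat(); shards.len()]
//! }
//!
//! fn main() {
//!     // Two ranks, each holding a four-element gradient buffer.
//!     let grads = vec![
//!         vec![1.0, 2.0, 3.0, 4.0],
//!         vec![10.0, 20.0, 30.0, 40.0],
//!     ];
//!
//!     let reduced = allreduce_sum(&grads);
//!     assert_eq!(reduced[0], vec![11.0, 22.0, 33.0, 44.0]);
//!
//!     // Rank 1 owns the second half of the summed buffer.
//!     let shards = reduce_scatter_sum(&grads);
//!     assert_eq!(shards[1], vec![33.0, 44.0]);
//!
//!     // Gathering the shards reconstructs the allreduce result.
//!     assert_eq!(all_gather_shards(&shards), reduced);
//! }
//! ```
//!
//! This equivalence is one way to read the difference between the two
//! wrappers described above: DDP synchronizes full gradients on every
//! rank, while FSDP keeps only a shard per rank and gathers on demand.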
//!
//! # Quick start
//!
//! ```ignore
//! use ferrotorch_distributed::backend::SimulatedBackend;
//! use ferrotorch_distributed::collective::{allreduce, ReduceOp};
//! use ferrotorch_distributed::ddp::DDP;
//! use ferrotorch_distributed::fsdp::FSDP;
//! use ferrotorch_distributed::rpc::{RpcContext, SimulatedRpcBackend};
//! use ferrotorch_distributed::pipeline::{Pipeline, PipelineStage, PipelineSchedule};
//! ```
// Re-export key types at crate root for convenience.
pub use ;
pub use ;
pub use ;
pub use ddp::DDP;
pub use DistributedError;
pub use fsdp::FSDP;
pub use ;
pub use ;
pub use ;