//! Distributed training for ferrotorch.
//!
//! This crate provides the building blocks for multi-rank training:
//!
//! - **Backends** ([`backend`]) — Transport-agnostic communication.
//! [`TcpBackend`](backend::TcpBackend) for real multi-process training,
//! [`SimulatedBackend`](backend::SimulatedBackend) for in-process testing.
//!
//! - **Collectives** ([`collective`]) — [`allreduce`](collective::allreduce),
//! [`broadcast`](collective::broadcast), and [`barrier`](collective::barrier).
//!
//! - **DDP** ([`ddp`]) — [`DDP`](ddp::DDP) wraps a `Module` and
//! synchronizes gradients across ranks after each backward pass.
//!
//! - **GPU collectives** ([`gpu_collective`], requires `gpu` feature) —
//! [`gpu_allreduce`](gpu_collective::gpu_allreduce) and
//! [`gpu_broadcast`](gpu_collective::gpu_broadcast) copy GPU tensors to
//! the CPU, run the standard TCP collective, and copy the result back to
//! the GPU. This is a portable alternative to NCCL.
//!
//! # Quick start
//!
//! ```ignore
//! use ferrotorch_distributed::backend::SimulatedBackend;
//! use ferrotorch_distributed::collective::{allreduce, ReduceOp};
//! use ferrotorch_distributed::ddp::DDP;
//! ```
// Re-export key types at crate root for convenience.
pub use ;
pub use ;
pub use DDP;
pub use DistributedError;
pub use ;