//! Hybrid TCP+NCCL backend for distributed training.
//!
//! [`HybridBackend`] combines a [`TcpBackend`] for point-to-point
//! communication (send/recv/barrier) with an [`NcclBackend`] for
//! GPU-native collective operations (allreduce, broadcast, etc.).
//!
//! This loosely mirrors a common PyTorch setup in which NCCL handles GPU
//! collectives and a Gloo/TCP process group serves as the P2P fallback.
//!
//! # Feature gate
//!
//! Requires the `nccl` feature.
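//!
//! A minimal sketch of enabling the feature from a dependent crate's
//! `Cargo.toml`. The crate name `ferrotorch-distributed` is an assumption
//! inferred from the design-doc path cited in this file, and the version is
//! a placeholder; adjust both to match the actual workspace.
//!
//! ```toml
//! [dependencies]
//! # Assumed crate name and placeholder version; only the `nccl` feature
//! # name comes from this module's documentation.
//! ferrotorch-distributed = { version = "*", features = ["nccl"] }
//! ```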
//!
//! ## REQ status (per `.design/ferrotorch-distributed/hybrid_backend.md`)
//!
//! | REQ | Status | Evidence |
//! |---|---|---|
//! | REQ-1 (HybridBackend struct) | SHIPPED | `pub struct HybridBackend { tcp: TcpBackend, nccl: NcclBackend }` in `hybrid_backend.rs`; consumer `pub use hybrid_backend::HybridBackend` at `lib.rs` under `#[cfg(feature = "nccl")]`. |
//! | REQ-2 (constructor order TCP→NCCL) | SHIPPED | `pub fn new` builds `TcpBackend::new(...)` first then `NcclBackend::new(...)` in `hybrid_backend.rs`; consumer via crate-root re-export reachable from `ferrotorch/src/lib.rs`. |
//! | REQ-3 (nccl() / tcp() accessors) | SHIPPED | `pub fn nccl(&self) -> &NcclBackend` and `pub fn tcp(&self) -> &TcpBackend` in `hybrid_backend.rs`; documented production call shape `nccl_allreduce(&mut buf, hybrid.nccl(), ...)`. |
//! | REQ-4 (synchronize_nccl) | SHIPPED | `pub fn synchronize_nccl(&self)` in `hybrid_backend.rs` forwards to `NcclBackend::synchronize`; consumer via crate-root re-export. |
//! | REQ-5 (Backend trait delegation to tcp) | SHIPPED | `impl Backend for HybridBackend` in `hybrid_backend.rs` delegates all six methods to `self.tcp`; consumer is every `&dyn Backend`-accepting fn in `crate::collective::*` and `crate::p2p::*`. |
use std::time::Duration;
use FerrotorchResult;
use crate;
use crateNcclBackend;
use crateNcclUniqueId;
/// Hybrid backend combining TCP for P2P and NCCL for GPU collectives.
///
/// Use the [`Backend`] trait methods for P2P (delegated to TCP), and
/// access the inner [`NcclBackend`] via [`nccl()`](Self::nccl) for
/// GPU-native collective operations.
///
/// # Example
///
/// ```ignore
/// let hybrid = HybridBackend::new(rank, world_size, addr, unique_id)?;
///
/// // P2P via TCP
/// hybrid.send(&data, dst_rank)?;
///
/// // GPU collectives via NCCL
/// nccl_allreduce(&mut gpu_buffer, hybrid.nccl(), &ReduceOp::Sum)?;
/// ```