//! Distributed RDMA network — ranked multi-peer setup with barrier synchronization and an out-of-band TCP exchanger for cluster bootstrapping.
//!
//! A [`Node`] combines a [`MultiChannel`] with a rank, a world size, and a
//! [`Barrier`] to form a complete building block for distributed RDMA programs.
//! It exposes the full [`multi_channel`](crate::multi_channel) operation API
//! (scatter/gather sends, writes, reads, multicast) and adds barrier synchronization
//! for coordinating execution across all nodes in the network.
//!
//! # Connection lifecycle
//!
//! Connecting a set of nodes requires exchanging endpoint information between every
//! pair of participants. The [`tcp_exchanger`](Exchanger) utility performs this
//! out-of-band exchange over TCP, driven by a shared [`RawNetworkConfig`] that
//! describes the address and port of each node.
//!
//! 1. **Build** — call [`Node::builder`] (or [`ProtectionDomain::create_node`]) and
//!    set at least `rank`, `world_size`, and `pd`. An optional
//!    [`BarrierAlgorithm`] can be chosen; the default is
//!    [`BinaryTree`](BarrierAlgorithm::BinaryTree).
//! 2. **Exchange endpoints** — call [`PreparedNode::endpoint`] to
//!    obtain the local [`LocalEndpoint`], then use
//!    [`Exchanger::await_exchange_all`] to distribute it to all peers and collect
//!    theirs. Pass the result through [`PreparedNode::gather_endpoints`]
//!    to produce [`RemoteEndpoints`] in the format expected by the handshake.
//! 3. **Handshake** — call [`PreparedNode::handshake`] with the remote endpoints to
//!    bring up all queue pairs and obtain the ready-to-use [`Node`].
//!
//! # Operations
//!
//! All [`MultiChannel`] operations are forwarded directly on [`Node`]:
//! [`scatter_send`](Node::scatter_send), [`gather_receive`](Node::gather_receive),
//! [`scatter_write`](Node::scatter_write), [`gather_read`](Node::gather_read), and
//! [`multicast_send`](Node::multicast_send), along with their scoped and unpolled
//! variants via [`Node::scope`] and [`Node::manual_scope`].
//!
//! # Barrier synchronization
//!
//! [`Node::barrier`] blocks until every node in the supplied peer list has entered
//! the barrier, or until the timeout expires. The peer list may be any subset of the
//! world, allowing partial barriers across subgroups.
//! [`Node::barrier_unchecked`] skips peer-list validation for hot paths.
//!
//! The barrier algorithm is selected at build time via [`BarrierAlgorithm`]:
//!
//! * [`Centralized`](BarrierAlgorithm::Centralized) — the lowest-ranked participant
//!   acts as coordinator; simple but does not scale well.
//! * [`Dissemination`](BarrierAlgorithm::Dissemination) — pairwise exchange at
//!   exponential distances; no designated leader, scales well.
//! * [`BinaryTree`](BarrierAlgorithm::BinaryTree) — tree-based reduce and broadcast;
//!   a balanced alternative to dissemination.
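//!
//! To make "pairwise exchange at exponential distances" concrete, the sketch below
//! computes the textbook dissemination schedule. It illustrates the general
//! technique only and is not taken from this crate's implementation:
//! `dissemination_rounds` is a hypothetical helper, not part of the API, and it
//! assumes participants are indexed contiguously from `0` to `n - 1` (for a partial
//! barrier, that index would be the position in the peer list rather than the rank).
//!
//! ```rust
//! /// Hypothetical helper: returns one `(signal_to, wait_for)` pair per round.
//! ///
//! /// In the round with distance `d` (1, 2, 4, ...), participant `index` signals
//! /// `(index + d) % n` and waits for `(index - d) mod n`. The loop stops once
//! /// `d >= n`, which gives `ceil(log2(n))` rounds.
//! fn dissemination_rounds(index: usize, n: usize) -> Vec<(usize, usize)> {
//!     assert!(index < n, "index must be smaller than the participant count");
//!     let mut rounds = Vec::new();
//!     let mut dist = 1;
//!     while dist < n {
//!         let signal_to = (index + dist) % n;
//!         // Add `n` before subtracting to avoid unsigned underflow.
//!         let wait_for = (index + n - dist) % n;
//!         rounds.push((signal_to, wait_for));
//!         dist *= 2;
//!     }
//!     rounds
//! }
//! ```
//!
//! Under these assumptions, with five participants index 0 signals 1, 2, and 4 and
//! waits for 4, 3, and 1, so the barrier completes in three rounds without any
//! participant acting as a leader.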
//!
//! # Network configuration
//!
//! [`RawNetworkConfig`] is a serializable description of the cluster (one
//! [`NodeConfig`] per rank, each with an IP address and port) that can be loaded from
//! JSON. [`RawNetworkConfig::build`] validates it and produces a [`NetworkConfig`]
//! ready for use with [`Exchanger`].
//!
//! # Example: building a node and exchanging data
//!
//! ```no_run
//! use ibverbs_rs::ibverbs;
//! use ibverbs_rs::network::{Node, ExchangeConfig, Exchanger, RawNetworkConfig};
//! use ibverbs_rs::multi_channel::PeerSendWorkRequest;
//!
//! // Load network config (see RawNetworkConfig for the JSON format)
//! let json = std::fs::read_to_string("network.json")?;
//! let config = serde_json::from_str::<RawNetworkConfig>(&json)?.build()?;
//! let rank = 0;
//!
//! let ctx = ibverbs::open_device("mlx5_0")?;
//! let pd = ctx.allocate_pd()?;
//!
//! // 1. Build
//! let prepared = Node::builder()
//!     .pd(&pd)
//!     .rank(rank)
//!     .world_size(config.world_size())
//!     .build()?;
//!
//! // 2. Exchange endpoints over TCP
//! let local_ep = prepared.endpoint();
//! let remote_eps = Exchanger::await_exchange_all(
//!     rank, &config, &local_ep, &ExchangeConfig::default(),
//! )?;
//! let remote_eps = prepared.gather_endpoints(remote_eps)?;
//!
//! // 3. Handshake
//! let mut node = prepared.handshake(remote_eps)?;
//!
//! // Send data to peer 1
//! let buf = [42u8; 64];
//! let mr = node.pd().register_local_mr_slice(&buf)?;
//! node.send(PeerSendWorkRequest::new(1, &[mr.gather_element(&buf)]))?;
//! # Ok::<(), Box<dyn std::error::Error>>(())
//! ```
//!
//! See also [`examples/network.rs`](https://github.com/Tikitikitikidesuka/ibverbs-rs/blob/main/examples/network.rs)
//! for a complete multi-node runnable example.
//!
//! [`MultiChannel`]: crate::multi_channel::MultiChannel
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
use crateProtectionDomain;
use crate::multi_channel::MultiChannel;
/// A ranked RDMA network node with barrier synchronization.
///
/// Wraps a [`MultiChannel`] with a rank, world size, and a [`Barrier`] for
/// collective synchronization across all nodes in the network.