pub struct TcpTransport { /* private fields */ }Expand description
TCP-based transport for multi-node gradient exchange.
One node acts as coordinator (rank 0, TCP server) while all other
nodes act as workers (TCP clients). The coordinator calls
TcpTransport::coordinator which binds to a socket and blocks until
world_size - 1 workers have connected. Each worker calls
TcpTransport::worker with the coordinator’s address and its own rank.
§Wire protocol
Every message is a length-prefixed f32 array:
- 4-byte little-endian
u32element count. count * 4bytes of little-endianf32values.
§Rank coordination
Workers announce their rank as a 4-byte LE u32 immediately after the
TCP handshake so the coordinator can place each connection in the correct
slot.
§Usage
For single-machine testing use loopback_pair which creates a
coordinator + worker pair over 127.0.0.1 on a random port.
Implementations§
Source§impl TcpTransport
impl TcpTransport
Sourcepub fn coordinator(
bind_addr: &str,
world_size: usize,
) -> Result<Self, ModelError>
pub fn coordinator( bind_addr: &str, world_size: usize, ) -> Result<Self, ModelError>
Create a coordinator node that listens for worker connections.
Blocks until world_size - 1 workers have connected.
The coordinator is always rank 0.
Sourcepub fn worker(coordinator_addr: &str, rank: usize) -> Result<Self, ModelError>
pub fn worker(coordinator_addr: &str, rank: usize) -> Result<Self, ModelError>
Create a worker node that connects to the coordinator.
Sourcepub fn send(&self, peer: usize, data: &[f32]) -> Result<(), ModelError>
pub fn send(&self, peer: usize, data: &[f32]) -> Result<(), ModelError>
Send a tensor’s data to a specific peer.
peer is an index into the peers list (0-based).
Sourcepub fn recv(&self, peer: usize) -> Result<Vec<f32>, ModelError>
pub fn recv(&self, peer: usize) -> Result<Vec<f32>, ModelError>
Receive tensor data from a specific peer.
peer is an index into the peers list (0-based).
Sourcepub fn allreduce_sum(&self, data: &mut [f32]) -> Result<(), ModelError>
pub fn allreduce_sum(&self, data: &mut [f32]) -> Result<(), ModelError>
All-reduce: sum gradients across all nodes.
Implements a simple butterfly all-reduce pattern: the coordinator collects data from all workers, computes the element-wise sum, and broadcasts the result back.
Sourcepub fn world_size(&self) -> usize
pub fn world_size(&self) -> usize
Returns the world size (total number of nodes).
Auto Trait Implementations§
impl Freeze for TcpTransport
impl RefUnwindSafe for TcpTransport
impl Send for TcpTransport
impl Sync for TcpTransport
impl Unpin for TcpTransport
impl UnsafeUnpin for TcpTransport
impl UnwindSafe for TcpTransport
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read more