Trait ServerCommunication 

pub trait ServerCommunication {
    const SERVER_COMM_ENABLED: bool;

    // Provided methods
    fn sync_collective(
        &mut self,
        stream_id: StreamId,
    ) -> Result<(), ServerError> { ... }
    fn comm_init(
        &mut self,
        device_ids: Vec<DeviceId>,
    ) -> Result<(), ServerError> { ... }
    fn all_reduce(
        &mut self,
        src: Binding,
        dst: Binding,
        dtype: ElemType,
        stream_id: StreamId,
        op: ReduceOperation,
        device_ids: Vec<DeviceId>,
    ) -> Result<(), ServerError> { ... }
    fn send(
        &mut self,
        desc: CopyDescriptor,
        dtype: ElemType,
        stream_id: StreamId,
        device_id_dst: DeviceId,
    ) -> Result<(), ServerError> { ... }
    fn recv(
        &mut self,
        handle: Handle,
        dtype: ElemType,
        stream_id: StreamId,
        device_id_src: DeviceId,
    ) -> Result<(), ServerError> { ... }
}

Defines functions for optimized data transfer between servers, supporting custom communication mechanisms such as peer-to-peer communication or specialized implementations.

Required Associated Constants

const SERVER_COMM_ENABLED: bool

Indicates whether server-to-server communication is enabled for this implementation.
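Because the flag is an associated constant, generic code can branch on it per concrete server type. The following is a minimal sketch, not the library's code: `CommFlag`, `PeerToPeerServer`, `HostStagedServer`, and `transfer_strategy` are hypothetical stand-ins, and the "fallback through host memory" path is an assumption made for illustration.

```rust
// Hypothetical stand-in for the real trait; only the associated constant is modeled.
trait CommFlag {
    const SERVER_COMM_ENABLED: bool;
}

// Two hypothetical server types, one with direct communication and one without.
struct PeerToPeerServer;
struct HostStagedServer;

impl CommFlag for PeerToPeerServer {
    const SERVER_COMM_ENABLED: bool = true;
}

impl CommFlag for HostStagedServer {
    const SERVER_COMM_ENABLED: bool = false;
}

/// Picks a transfer strategy from the constant of the concrete type `S`.
/// The fallback branch is an assumption, not documented behavior.
fn transfer_strategy<S: CommFlag>() -> &'static str {
    if S::SERVER_COMM_ENABLED {
        "direct server-to-server"
    } else {
        "fallback through host memory"
    }
}
```

Since the value is known at compile time for each `S`, the branch costs nothing at runtime once the function is monomorphized.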

Provided Methods

fn sync_collective(&mut self, stream_id: StreamId) -> Result<(), ServerError>

Ensures that all queued collective operations have been executed.

Arguments
  • stream_id - The StreamId of the stream waiting for the sync.
Returns

Returns Ok(()) on success, or a ServerError if the operation fails.

fn comm_init(&mut self, device_ids: Vec<DeviceId>) -> Result<(), ServerError>

Initializes communication between the devices listed in device_ids.

Arguments
  • device_ids - The IDs of the devices that need to communicate.
Returns

Returns Ok(()) on success, or a ServerError if the operation fails.

fn all_reduce(&mut self, src: Binding, dst: Binding, dtype: ElemType, stream_id: StreamId, op: ReduceOperation, device_ids: Vec<DeviceId>) -> Result<(), ServerError>

Performs an all_reduce operation on the input data and writes the result to the output buffer. See the NCCL documentation for a description of the collective: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/collectives.html#allreduce

Arguments
  • src - The data to be reduced.
  • dst - Where to write the result.
  • dtype - The element type of the data being reduced.
  • stream_id - The ID of the stream associated with the data.
  • op - The aggregation operation to apply, e.g. mean or sum.
  • device_ids - The IDs of the devices participating in the all_reduce.
Returns

Returns Ok(()) on success, or a ServerError if the operation fails.
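
To make the semantics concrete, the sketch below models an all_reduce on plain host vectors: every participant contributes one buffer, the buffers are combined element by element, and every participant ends up with the same result. It is a reference model only, not how any backend implements the collective. `ReduceOp` and `all_reduce_reference` are hypothetical names; the two variants mirror the "mean" and "sum" examples mentioned above and are not claimed to match the real ReduceOperation type.

```rust
/// Hypothetical stand-in for the reduction operation.
#[derive(Clone, Copy)]
enum ReduceOp {
    Sum,
    Mean,
}

/// Reference model of all_reduce: `inputs[i]` is the buffer held by device `i`.
/// Returns one output buffer per device; all outputs are identical.
fn all_reduce_reference(inputs: &[Vec<f32>], op: ReduceOp) -> Vec<Vec<f32>> {
    let n = inputs.len();
    if n == 0 {
        return Vec::new();
    }
    let len = inputs[0].len();
    assert!(
        inputs.iter().all(|buf| buf.len() == len),
        "all participants must contribute buffers of the same length"
    );

    // Element-wise sum across all participants.
    let mut reduced = vec![0.0f32; len];
    for buf in inputs {
        for (acc, x) in reduced.iter_mut().zip(buf) {
            *acc += *x;
        }
    }

    // For a mean, divide the sum by the number of participants.
    if let ReduceOp::Mean = op {
        for acc in reduced.iter_mut() {
            *acc /= n as f32;
        }
    }

    // Every participant receives a copy of the reduced buffer.
    vec![reduced; n]
}
```

With two participants holding `[1.0, 2.0]` and `[3.0, 4.0]`, a sum leaves `[4.0, 6.0]` on both and a mean leaves `[2.0, 3.0]` on both.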

fn send(&mut self, desc: CopyDescriptor, dtype: ElemType, stream_id: StreamId, device_id_dst: DeviceId) -> Result<(), ServerError>

Sends data from this server to a destination server.

Arguments
  • desc - A descriptor specifying the data to be sent, including shape, strides, and binding.
  • dtype - The element type of the data being sent.
  • stream_id - The stream ID associated with the server’s operation.
  • device_id_dst - ID of the device receiving the data.
Returns

Returns Ok(()) on success, or a ServerError if the operation fails.

fn recv(&mut self, handle: Handle, dtype: ElemType, stream_id: StreamId, device_id_src: DeviceId) -> Result<(), ServerError>

Receives data from another server.

Arguments
  • handle - The handle into which the received data is written.
  • dtype - The element type of the data being received.
  • stream_id - The stream ID associated with the server’s operation.
  • device_id_src - ID of the device sending the data.
Returns

Returns Ok(()) on success, or a ServerError if the operation fails.
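
The send and recv signatures mirror each other: the sender names the destination device and the receiver names the source device. The sketch below models that pairing with an in-memory mailbox keyed by (source, destination). It assumes that each send is matched by one recv on the destination, which this page does not state explicitly. `Mailbox`, the `u32` device ID, and the `String` error are hypothetical stand-ins, not the library's types.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for a device identifier.
type DeviceId = u32;

/// In-memory model of point-to-point transfers, keyed by (source, destination).
#[derive(Default)]
struct Mailbox {
    queues: HashMap<(DeviceId, DeviceId), VecDeque<Vec<f32>>>,
}

impl Mailbox {
    /// Models `send`: device `src` queues `data` for device `dst`.
    fn send(&mut self, src: DeviceId, dst: DeviceId, data: Vec<f32>) {
        self.queues.entry((src, dst)).or_default().push_back(data);
    }

    /// Models `recv`: device `dst` takes the oldest pending transfer from `src`.
    /// Fails if no matching send was issued.
    fn recv(&mut self, dst: DeviceId, src: DeviceId) -> Result<Vec<f32>, String> {
        self.queues
            .get_mut(&(src, dst))
            .and_then(|queue| queue.pop_front())
            .ok_or_else(|| format!("no pending transfer from device {src} to device {dst}"))
    }
}
```

In this model, a recv without a matching send is reported as an error rather than blocking; a real backend may behave differently.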

Dyn Compatibility

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety".
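
In Rust, a trait that declares an associated constant cannot be used as a trait object, so code that accepts a server has to be generic over the implementing type instead of taking a `dyn` reference. A minimal sketch, using a hypothetical stand-in trait `Comm` with one constant and one provided method (neither is part of the real API):

```rust
// Hypothetical stand-in trait with an associated constant, like the one above.
trait Comm {
    const ENABLED: bool;

    // Provided method with a default body; implementors may override it.
    fn name(&self) -> &'static str {
        "default"
    }
}

struct Dummy;

impl Comm for Dummy {
    const ENABLED: bool = false;
}

// A `&dyn Comm` parameter does not compile, because `Comm` has an associated constant:
// fn describe_dyn(server: &dyn Comm) -> String { ... }

/// Generic alternative: the concrete type `S` is known at compile time,
/// so both the constant and the methods are available.
fn describe<S: Comm>(server: &S) -> String {
    format!("{} (comm enabled: {})", server.name(), S::ENABLED)
}
```

Calling `describe(&Dummy)` uses the provided `name` method and the constant declared by `Dummy`.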

Implementors