Trait ServerCommunication 

pub trait ServerCommunication {
    const SERVER_COMM_ENABLED: bool;

    // Provided methods
    fn sync_collective(
        &mut self,
        stream_id: StreamId,
    ) -> Result<(), ServerError> { ... }
    fn all_reduce(
        &mut self,
        src: Binding,
        dst: Binding,
        dtype: ElemType,
        stream_id: StreamId,
        op: ReduceOperation,
        device_ids: Vec<DeviceId>,
    ) -> Result<(), ServerError> { ... }
    fn copy(
        handle_dst: Handle,
        server_src: &mut Self,
        server_dst: &mut Self,
        src: CopyDescriptor,
        stream_id_src: StreamId,
        stream_id_dst: StreamId,
    ) -> Result<(), ServerError> { ... }
}
Defines functions for optimized data transfer between servers, supporting custom communication mechanisms such as peer-to-peer communication or specialized implementations.

Required Associated Constants

const SERVER_COMM_ENABLED: bool

Indicates whether server-to-server communication is enabled for this implementation.
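Because the flag is an associated constant, generic code can branch on it per server type with no runtime state. The following is a minimal sketch under stated assumptions: `CommFlag`, `PeerServer`, `HostOnlyServer`, and `transfer_path` are hypothetical names, the trait is reduced to the constant alone, and the "fallback path" branch is an assumption about what a caller might do, not something this page specifies.

```rust
/// Hypothetical, reduced stand-in for the trait: only the constant is modeled.
pub trait CommFlag {
    const SERVER_COMM_ENABLED: bool;
}

/// Hypothetical server type that supports direct transfers.
pub struct PeerServer;

/// Hypothetical server type that does not.
pub struct HostOnlyServer;

impl CommFlag for PeerServer {
    const SERVER_COMM_ENABLED: bool = true;
}

impl CommFlag for HostOnlyServer {
    const SERVER_COMM_ENABLED: bool = false;
}

/// Picks a transfer strategy from the constant of the server type `S`.
/// The strings are placeholders for whatever the caller would actually do.
pub fn transfer_path<S: CommFlag>() -> &'static str {
    if S::SERVER_COMM_ENABLED {
        "direct server-to-server copy"
    } else {
        "fallback path"
    }
}
```

Calling `transfer_path::<PeerServer>()` selects the direct branch, while `transfer_path::<HostOnlyServer>()` selects the fallback.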

Provided Methods

fn sync_collective(&mut self, stream_id: StreamId) -> Result<(), ServerError>

Ensure that all queued collective operations have been executed.

Arguments
  • stream_id - The StreamId of the stream waiting for the sync.
Returns

Returns Ok(()) on success, or an Err containing a ServerError if the operation fails.

fn all_reduce( &mut self, src: Binding, dst: Binding, dtype: ElemType, stream_id: StreamId, op: ReduceOperation, device_ids: Vec<DeviceId>, ) -> Result<(), ServerError>

Performs an all_reduce operation on the input data and writes the result to the output buffer. See https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/collectives.html#allreduce

Arguments
  • src - The data to be reduced.
  • dst - Where to write the result.
  • dtype - The element type of the data.
  • stream_id - The stream ID of the data.
  • op - The aggregation operation applied by the reduction, e.g. mean or sum.
  • device_ids - The IDs of the devices that take part in the all_reduce.
Returns

Returns Ok(()) on success, or an Err containing a ServerError if the operation fails.
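To make the all_reduce semantics concrete, here is a host-side reference sketch: every participant contributes one buffer, the buffers are reduced element-wise, and every participant receives the same result. It assumes plain `Vec<f32>` buffers in place of `Binding`, and `ReduceOp` and `all_reduce_reference` are hypothetical names; it does not reflect how any backend implements the method.

```rust
/// Hypothetical stand-in for the reduction operation (sum and mean only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ReduceOp {
    Sum,
    Mean,
}

/// Reference all-reduce over host buffers: one source per participant in,
/// one identical destination per participant out.
pub fn all_reduce_reference(srcs: &[Vec<f32>], op: ReduceOp) -> Vec<Vec<f32>> {
    if srcs.is_empty() {
        return Vec::new();
    }
    let len = srcs[0].len();
    assert!(
        srcs.iter().all(|s| s.len() == len),
        "all source buffers must have the same length"
    );

    // Element-wise sum across all participants.
    let mut reduced = vec![0.0f32; len];
    for src in srcs {
        for (acc, x) in reduced.iter_mut().zip(src) {
            *acc += *x;
        }
    }

    // A mean is the sum divided by the number of participants.
    if op == ReduceOp::Mean {
        let n = srcs.len() as f32;
        for acc in reduced.iter_mut() {
            *acc /= n;
        }
    }

    // Every participant receives its own copy of the reduced data.
    vec![reduced; srcs.len()]
}
```

With two participants holding `[1.0, 2.0]` and `[3.0, 4.0]`, a sum leaves both with `[4.0, 6.0]` and a mean leaves both with `[2.0, 3.0]`.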

fn copy( handle_dst: Handle, server_src: &mut Self, server_dst: &mut Self, src: CopyDescriptor, stream_id_src: StreamId, stream_id_dst: StreamId, ) -> Result<(), ServerError>

Copies data from a source server to a destination server.

Arguments
  • handle_dst - The destination Handle for the copied data.
  • server_src - A mutable reference to the source server from which data is copied.
  • server_dst - A mutable reference to the destination server receiving the data.
  • src - A descriptor specifying the data to be copied, including shape, strides, and binding.
  • stream_id_src - The stream ID associated with the source server’s operation.
  • stream_id_dst - The stream ID associated with the destination server’s operation.
Returns

Returns Ok(()) on success, or an Err containing a ServerError if the operation fails.

Panics

Panics if server communication is not enabled (SERVER_COMM_ENABLED is false) or if the trait is incorrectly implemented by the server.
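The shape of copy (an associated function with no self receiver that takes both servers by mutable reference and panics when communication is disabled) can be sketched with a toy in-memory server. Everything here is an assumption made for illustration: `MiniComm`, `MemServer`, `CopyError`, and the offset-based `read`/`write` helpers are hypothetical and stand in for the handle, descriptor, and stream types used by the real method.

```rust
/// Hypothetical error type standing in for the real server error.
#[derive(Debug, PartialEq)]
pub struct CopyError(pub String);

/// Hypothetical, reduced version of the trait with byte-level helpers.
pub trait MiniComm {
    const SERVER_COMM_ENABLED: bool;

    fn read(&mut self, offset: usize, len: usize) -> Result<Vec<u8>, CopyError>;
    fn write(&mut self, offset: usize, data: &[u8]) -> Result<(), CopyError>;

    /// Associated function: called as `S::copy(&mut src, &mut dst, ...)`.
    fn copy(
        server_src: &mut Self,
        server_dst: &mut Self,
        src_offset: usize,
        len: usize,
        dst_offset: usize,
    ) -> Result<(), CopyError> {
        // Mirrors the documented panic when communication is disabled.
        assert!(Self::SERVER_COMM_ENABLED, "server communication is not enabled");
        let data = server_src.read(src_offset, len)?;
        server_dst.write(dst_offset, &data)
    }
}

/// Toy server whose "device memory" is a byte vector.
pub struct MemServer {
    pub memory: Vec<u8>,
}

impl MiniComm for MemServer {
    const SERVER_COMM_ENABLED: bool = true;

    fn read(&mut self, offset: usize, len: usize) -> Result<Vec<u8>, CopyError> {
        self.memory
            .get(offset..offset + len)
            .map(|bytes| bytes.to_vec())
            .ok_or_else(|| CopyError("source range out of bounds".to_string()))
    }

    fn write(&mut self, offset: usize, data: &[u8]) -> Result<(), CopyError> {
        match self.memory.get_mut(offset..offset + data.len()) {
            Some(slot) => {
                slot.copy_from_slice(data);
                Ok(())
            }
            None => Err(CopyError("destination range out of bounds".to_string())),
        }
    }
}
```

Taking both servers as `&mut Self` means the two arguments must be distinct values of the same server type, which the borrow checker enforces at the call site.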

Dyn Compatibility

This trait is not dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety", so this trait is not object safe.

Implementors