Safe Rust wrappers for NVIDIA NCCL.
v0.1 covers the communicator (single-process multi-GPU via
`ncclCommInitAll`, multi-process via `ncclCommInitRank` + `UniqueId`) and
the `all_reduce` + `broadcast` collectives — enough for synchronous
data-parallel training.
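For orientation, a minimal sketch of the single-process path. The crate path `baracuda_nccl`, the buffer parameters, and the exact `all_reduce` argument order are assumptions; only the item names (`Communicator::init_all`, `group_start`/`group_end`, `all_reduce`, `RedOp`, `NcclMem`) come from this page.

```rust
use baracuda_nccl::{all_reduce, group_end, group_start, Communicator, NcclMem, RedOp};

/// Sum-reduce one buffer per local GPU within a single process.
/// Buffer types and the `all_reduce` signature are assumptions.
fn sum_across_local_gpus(
    comms: &[Communicator],   // e.g. from Communicator::init_all(&[0, 1])?
    sends: &[NcclMem<f32>],
    recvs: &mut [NcclMem<f32>],
) -> Result<(), Box<dyn std::error::Error>> {
    // Single-process multi-GPU collectives must be submitted as one group.
    group_start()?;
    for ((comm, send), recv) in comms.iter().zip(sends).zip(recvs.iter_mut()) {
        all_reduce(send, recv, RedOp::Sum, comm)?;
    }
    group_end()?; // the whole group is handed to NCCL atomically
    Ok(())
}
```

The grouping is what lets one thread drive several GPUs without deadlocking; see the `group_start` entry under Functions below.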
NCCL is a Linux library; Windows has experimental support but no
general distribution. On hosts without NCCL, `Communicator::init_all`
returns `LoaderError::LibraryNotFound` — callers can fall back to
single-device execution.
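A sketch of that fallback, assuming `init_all` surfaces `LoaderError` directly and that `LibraryNotFound` is a unit variant (neither is confirmed by this page):

```rust
use baracuda_nccl::{Communicator, LoaderError};

/// `None` means "no NCCL on this host": the caller should take the
/// single-device path instead of a distributed one.
fn try_init(devices: &[i32]) -> Result<Option<Vec<Communicator>>, LoaderError> {
    match Communicator::init_all(devices) {
        Ok(comms) => Ok(Some(comms)),
        Err(LoaderError::LibraryNotFound) => Ok(None), // fall back gracefully
        Err(e) => Err(e), // real NCCL failures still propagate
    }
}
```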
Structs§
- Communicator
- A NCCL communicator — one rank’s view of a distributed group.
- NcclMem
- NCCL-managed device allocation. `Drop` calls `ncclMemFree`.
- UniqueId
- A 128-byte opaque identifier for establishing a multi-process NCCL communicator. One process calls `UniqueId::new` and distributes the bytes to all other processes via a user-provided channel (TCP, MPI, …); every process then calls `Communicator::init_rank` with the same id (bootstrap sketched after this list).
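The bootstrap referenced in the `UniqueId` entry, sketched with a plain TCP exchange. `UniqueId::new` and `Communicator::init_rank` are the page's names; `as_bytes`/`from_bytes` and both signatures are hypothetical stand-ins for whatever serialization the crate actually provides.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use baracuda_nccl::UniqueId;

/// Rank 0 mints the id and serves its 128 bytes; every other rank fetches
/// them. Afterwards all ranks call `Communicator::init_rank` with the same id.
fn exchange_id(rank: i32, world: i32, addr: &str) -> std::io::Result<UniqueId> {
    if rank == 0 {
        let id = UniqueId::new();
        let listener = TcpListener::bind(addr)?;
        for _ in 1..world {
            let (mut peer, _) = listener.accept()?;
            peer.write_all(id.as_bytes())?; // hypothetical serializer
        }
        Ok(id)
    } else {
        let mut bytes = [0u8; 128]; // the id is 128 opaque bytes
        TcpStream::connect(addr)?.read_exact(&mut bytes)?;
        Ok(UniqueId::from_bytes(bytes)) // hypothetical deserializer
    }
}

// let id = exchange_id(rank, world, "10.0.0.1:29500")?;
// let comm = Communicator::init_rank(world, &id, rank)?; // argument order assumed
```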
Enums§
- RedOp
- Reduction operation for `all_reduce` / `reduce`.
- ScalarResidence
- Where the scalar passed to `Communicator::create_pre_mul_sum` lives (see the averaging sketch below).
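Pre-mul-sum is how NCCL expresses an all-reduce *average*: scale each element by 1/world_size, then sum. A sketch assuming a `ScalarResidence::Host` variant and a `create_pre_mul_sum(&scalar, residence)` shape, neither confirmed by this page:

```rust
use baracuda_nccl::{all_reduce, Communicator, NcclMem, RedOp, ScalarResidence};

/// All-reduce mean: pre-multiply every element by 1/world_size, then sum.
/// The variant name `Host` and the method signature are assumptions.
fn all_reduce_mean(
    comm: &Communicator,
    world_size: usize,
    send: &NcclMem<f32>,
    recv: &mut NcclMem<f32>,
) -> Result<(), Box<dyn std::error::Error>> {
    let scale = 1.0_f32 / world_size as f32;
    let mean: RedOp = comm.create_pre_mul_sum(&scale, ScalarResidence::Host)?;
    all_reduce(send, recv, mean, comm)?;
    Ok(())
}
```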
Traits§
- NcclScalar
- Element type for NCCL buffers. Implemented by `baracuda-types` primitives via a sealed trait.
Functions§
- all_reduce
- All-reduce: each rank sends `send` and receives the per-element reduction (across every rank) into `recv`. In-place use (`send == recv`) is legal.
- broadcast
- Broadcast the data at `root`'s `send` buffer to every rank's `recv` buffer.
- error_string
- Human-readable name for a status code.
- group_end
- End the current collective group.
- group_start
- Begin a group of collectives that must be submitted atomically (e.g. in single-process multi-GPU all-reduce).
- version
- NCCL library version as a packed integer (e.g. `22100` for NCCL 2.21.0; decoded in the sketch below).
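Decoding the packed version follows directly from the example above (`22100` → 2.21.0, i.e. `major * 10000 + minor * 100 + patch`); only the `Result` return of `version` is assumed:

```rust
use baracuda_nccl::version;

fn print_nccl_version() -> Result<(), Box<dyn std::error::Error>> {
    let v = version()?; // e.g. 22100; a Result return is an assumption
    // major * 10000 + minor * 100 + patch, per the 22100 → 2.21.0 example.
    let (major, minor, patch) = (v / 10000, (v / 100) % 100, v % 100);
    println!("NCCL {major}.{minor}.{patch}");
    Ok(())
}
```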