Crate baracuda_nccl

Safe Rust wrappers for NVIDIA NCCL.

v0.1 covers the communicator (single-process multi-GPU via ncclCommInitAll, multi-process via ncclCommInitRank + UniqueId) and the all_reduce + broadcast collectives — enough for synchronous data-parallel training.

NCCL is a Linux library; Windows has experimental support but no general distribution. On hosts without NCCL, Communicator::init_all returns LoaderError::LibraryNotFound — callers can fall back to single-device execution.

Structs§

Communicator
An NCCL communicator — one rank’s view of a distributed group.
NcclMem
NCCL-managed device allocation. Drop calls ncclMemFree.
UniqueId
A 128-byte opaque identifier for establishing a multi-process NCCL communicator. One process calls UniqueId::new and distributes the bytes to all other processes via a user-provided channel (TCP, MPI, …); every process then calls Communicator::init_rank with the same id.
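The rendezvous flow described for UniqueId can be outlined as follows. This is an illustrative, non-compiling sketch built only from the names documented on this page (UniqueId::new, Communicator::init_rank); the byte accessors and the transport helpers (send_to_all_peers, recv_from_root) are hypothetical placeholders standing in for the user-provided channel.

```rust
// Rank 0 creates the id and ships its raw bytes to every peer over an
// out-of-band channel (TCP, MPI, ...). The transport helpers and byte
// accessors below are hypothetical placeholders, not part of this crate.
let id = if rank == 0 {
    let id = UniqueId::new()?;
    send_to_all_peers(id.as_bytes())?; // hypothetical helper
    id
} else {
    UniqueId::from_bytes(recv_from_root()?) // hypothetical helper
};

// Every process (including rank 0) joins the group with the same id.
let comm = Communicator::init_rank(world_size, &id, rank)?;
```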

Enums§

RedOp
Reduction operation for all_reduce / reduce.
ScalarResidence
Where the scalar passed to Communicator::create_pre_mul_sum lives.

Traits§

NcclScalar
Element type for NCCL buffers. Implemented by baracuda-types primitives via a sealed trait.

Functions§

all_reduce
All-reduce: each rank contributes its send buffer and receives the element-wise reduction across every rank into recv. In-place use (send == recv) is legal.
broadcast
Broadcast the data at root’s send buffer to every rank’s recv buffer.
error_string
Human-readable name for a status code.
group_end
End the current collective group.
group_start
Begin a group of collectives that must be submitted atomically (e.g. in single-process multi-GPU all-reduce).
version
NCCL library version as a packed integer (e.g. 22100 for NCCL 2.21.0).
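The packed encoding above (e.g. 22100 for 2.21.0) can be unpacked with plain arithmetic. A minimal standalone sketch, assuming the scheme NCCL has used since 2.9 — major * 10000 + minor * 100 + patch:

```rust
/// Decode an NCCL packed version integer into (major, minor, patch),
/// assuming the post-2.9 scheme: major * 10000 + minor * 100 + patch.
fn decode_nccl_version(v: u32) -> (u32, u32, u32) {
    (v / 10000, (v / 100) % 100, v % 100)
}

fn main() {
    // 22100 corresponds to NCCL 2.21.0, as in the `version` docs above.
    assert_eq!(decode_nccl_version(22100), (2, 21, 0));
    println!("{:?}", decode_nccl_version(22100));
}
```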

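The all_reduce contract above — every rank ends up holding the same element-wise reduction — can be stated in plain Rust without any GPU. This sketch simulates a sum all-reduce over in-memory per-rank buffers; it illustrates the semantics only and does not use this crate’s API:

```rust
/// Simulate the semantics of a sum all-reduce: given one send buffer per
/// rank, every rank's recv buffer ends up holding the element-wise sum
/// across all ranks.
fn simulated_all_reduce_sum(sends: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let n = sends[0].len();
    let mut reduced = vec![0.0f32; n];
    for send in sends {
        for (acc, x) in reduced.iter_mut().zip(send) {
            *acc += x;
        }
    }
    // Every rank receives an identical copy of the reduced buffer.
    vec![reduced; sends.len()]
}

fn main() {
    // Three "ranks", two elements each.
    let sends = vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![5.0, 6.0]];
    let recvs = simulated_all_reduce_sum(&sends);
    assert!(recvs.iter().all(|r| *r == vec![9.0, 12.0]));
}
```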
Type Aliases§

Error
Error type for NCCL operations.
Result
Result alias.