Safe Rust wrappers for the NVIDIA NVSHMEM host API.
NVSHMEM is the OpenSHMEM symmetric-heap model on GPUs: every PE
(processing element — typically one GPU) allocates from a symmetric
heap at a shared virtual address, and any PE can read/write another PE’s
heap directly via one-sided put / get. This is the fine-grained,
one-sided complement to baracuda-nccl’s collectives — the two coexist
and a single program may use both.
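The one-sided model described above can be pictured with a toy, in-process sketch. It is not this crate's API: `ToyWorld`, its `put`, and its `get` are hypothetical names, and plain `Vec`s stand in for GPU memory. The only point it illustrates is that a symmetric allocation sits at the same offset on every PE, so the initiator alone names the target PE and offset.

```rust
// Toy model only: `ToyWorld` and its methods are hypothetical and are not
// part of this crate or of NVSHMEM.

/// One heap per PE; a symmetric allocation occupies the same offset in each.
struct ToyWorld {
    heaps: Vec<Vec<i32>>, // heaps[pe] is the symmetric heap of PE `pe`
}

impl ToyWorld {
    fn new(n_pes: usize, heap_len: usize) -> Self {
        ToyWorld { heaps: vec![vec![0; heap_len]; n_pes] }
    }

    /// One-sided put: copy `src` into `target_pe`'s heap at `offset`.
    /// The target PE takes no part in the call.
    fn put(&mut self, target_pe: usize, offset: usize, src: &[i32]) {
        self.heaps[target_pe][offset..offset + src.len()].copy_from_slice(src);
    }

    /// One-sided get: read `len` elements from `source_pe`'s heap at `offset`.
    fn get(&self, source_pe: usize, offset: usize, len: usize) -> Vec<i32> {
        self.heaps[source_pe][offset..offset + len].to_vec()
    }
}

fn main() {
    let mut world = ToyWorld::new(4, 8);
    // Acting as PE 0: write three values into PE 2's heap at offset 1.
    world.put(2, 1, &[10, 20, 30]);
    // Acting as PE 3: read them back from PE 2 using the same offset.
    let got = world.get(2, 1, 3);
    println!("{:?}", got);
}
```

In the real library the copy is performed by the GPU interconnect or network, and ordering between puts is governed by the quiet / fence primitives rather than by program order as in this toy.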
§What this crate covers (Tier 1)
- Context: process-wide NVSHMEM lifetime (init / finalize) plus cached my_pe / n_pes, and the barrier / quiet / fence ordering primitives.
- Team: a subset of PEs created via strided split.
- SymmetricBuffer: a typed allocation on the symmetric heap.
- Host-initiated RMA: blocking and stream-ordered Context::put / Context::get.
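To make "strided split" concrete, here is a small sketch of the membership arithmetic, following the OpenSHMEM convention of a (start, stride, size) triplet over the parent team's PE numbers. The signature of `Team::split_strided` is not shown on this page, so `strided_members` and `team_rank` are hypothetical helpers, and a positive stride is assumed.

```rust
// Hypothetical helpers for illustration; not part of this crate.

/// Parent-team PE numbers selected by a (start, stride, size) triplet.
fn strided_members(start: usize, stride: usize, size: usize) -> Vec<usize> {
    assert!(stride >= 1, "this sketch assumes a positive stride");
    (0..size).map(|i| start + i * stride).collect()
}

/// Rank of `parent_pe` inside the new team, or `None` if it is not a member.
fn team_rank(parent_pe: usize, start: usize, stride: usize, size: usize) -> Option<usize> {
    assert!(stride >= 1, "this sketch assumes a positive stride");
    if parent_pe < start {
        return None;
    }
    let delta = parent_pe - start;
    if delta % stride != 0 {
        return None; // falls between two selected PEs
    }
    let rank = delta / stride;
    (rank < size).then_some(rank)
}

fn main() {
    // Every second PE starting at PE 1, three members in total.
    println!("{:?}", strided_members(1, 2, 3));
    println!("{:?}", team_rank(3, 1, 2, 3));
    println!("{:?}", team_rank(2, 1, 2, 3));
}
```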
§What it does not cover
The device-side API — the __device__ nvshmem_int_p /
nvshmem_putmem_nbi calls issued from inside a CUDA kernel — requires
linking libnvshmem_device.a into the consumer’s kernel binary and is
out of scope (it cannot be a lazily-loaded host symbol). A consumer that
needs device-side NVSHMEM writes its own .cu that includes the NVSHMEM
headers and links the device archive.
§Availability
NVSHMEM is a Linux library and requires GPUs of compute capability 7.0
(sm_70) or newer; every baracuda-supported GPU qualifies. On hosts without
the NVSHMEM runtime installed, Context::init returns
LoaderError::LibraryNotFound, so callers can fall back to single-process
execution.
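The fallback described above might be structured as follows. This is a sketch with stand-in types: `ToyLoaderError`, `ToyContext`, `toy_init`, `Mode`, and `choose_mode` are all hypothetical and only mimic the shape of `Context::init` returning `LoaderError::LibraryNotFound`; the real error type may carry more variants or fields.

```rust
// Stand-in types for illustration; the crate's real `Context` and
// `LoaderError` are not reproduced here.

#[derive(Debug, PartialEq)]
enum ToyLoaderError {
    LibraryNotFound,
    Other(String),
}

#[derive(Debug, PartialEq)]
struct ToyContext {
    my_pe: i32,
    n_pes: i32,
}

#[derive(Debug, PartialEq)]
enum Mode {
    MultiPe { my_pe: i32, n_pes: i32 },
    SingleProcess,
}

/// Pretend initializer: fails with `LibraryNotFound` when the runtime is absent.
fn toy_init(runtime_installed: bool) -> Result<ToyContext, ToyLoaderError> {
    if runtime_installed {
        Ok(ToyContext { my_pe: 0, n_pes: 1 })
    } else {
        Err(ToyLoaderError::LibraryNotFound)
    }
}

/// Treat a missing library as "run single-process"; propagate anything else.
fn choose_mode(init: Result<ToyContext, ToyLoaderError>) -> Result<Mode, ToyLoaderError> {
    match init {
        Ok(ctx) => Ok(Mode::MultiPe { my_pe: ctx.my_pe, n_pes: ctx.n_pes }),
        Err(ToyLoaderError::LibraryNotFound) => Ok(Mode::SingleProcess),
        Err(other) => Err(other),
    }
}

fn main() {
    println!("{:?}", choose_mode(toy_init(true)));
    println!("{:?}", choose_mode(toy_init(false)));
    println!("{:?}", choose_mode(Err(ToyLoaderError::Other("example failure".into()))));
}
```

Matching only the "not found" case keeps genuine initialization failures visible instead of silently downgrading to single-process execution.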
Structs§
- Context
- The process-wide NVSHMEM runtime, from this PE’s point of view.
- SymmetricBuffer
- A typed allocation on the NVSHMEM symmetric heap. The same virtual address is valid on every PE, so the pointer can be used as a remote address in Context::put / Context::get. Freed on Drop via nvshmem_free.
- Team
- A team: a named subset of PEs. Teams created via Team::split_strided must be released with Team::destroy; the predefined Team::WORLD / Team::SHARED must not be destroyed.
- UniqueId
- A 128-byte identifier for the unique-id bootstrap (the NVSHMEM analogue of NCCL’s UniqueId). One PE calls UniqueId::new and distributes the bytes to every other PE; each then feeds it to the raw nvshmemx_set_attr_uniqueid_args + Context::init_with_attr path.
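The "one PE creates, everyone receives the same bytes" step of the unique-id bootstrap can be sketched with threads and channels standing in for PEs and for the out-of-band transport (in practice typically MPI or a socket). Nothing here calls NVSHMEM: `distribute` is a hypothetical helper, and the 128-byte array is a placeholder for the identifier's bytes.

```rust
use std::sync::mpsc;
use std::thread;

const ID_LEN: usize = 128; // size stated for the identifier above

/// Root (PE 0) sends `id` to `n_pes - 1` worker threads; returns the bytes
/// each PE ends up holding, root first. Hypothetical helper for illustration.
fn distribute(id: [u8; ID_LEN], n_pes: usize) -> Vec<[u8; ID_LEN]> {
    let mut senders = Vec::new();
    let mut workers = Vec::new();
    for _pe in 1..n_pes {
        let (tx, rx) = mpsc::channel::<[u8; ID_LEN]>();
        senders.push(tx);
        // Each non-root PE blocks until the root's bytes arrive.
        workers.push(thread::spawn(move || rx.recv().expect("root dropped the sender")));
    }
    for tx in &senders {
        tx.send(id).expect("receiver hung up");
    }
    let mut held = vec![id];
    for w in workers {
        held.push(w.join().expect("worker panicked"));
    }
    held
}

fn main() {
    let id = [0xAB_u8; ID_LEN]; // placeholder bytes, not a real unique id
    let held = distribute(id, 4);
    println!("{} PEs hold identical ids: {}", held.len(), held.iter().all(|b| *b == id));
}
```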
Functions§
- version
- Convenience: NVSHMEM library version as (major, minor) without holding a Context. Useful for capability probes. Errors if NVSHMEM is not installed.
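A capability probe on a (major, minor) pair can rely on Rust's lexicographic tuple ordering. The sketch below assumes the pair is made of `u32` values, which this page does not confirm, and `meets_minimum` is a hypothetical helper rather than part of the crate.

```rust
/// True when `found` is at least `required`, comparing major first, then minor.
/// Hypothetical helper; the integer type is an assumption.
fn meets_minimum(found: (u32, u32), required: (u32, u32)) -> bool {
    // Tuples compare lexicographically, so (3, 0) >= (2, 11) holds.
    found >= required
}

fn main() {
    let required = (2, 11); // example threshold, chosen arbitrarily
    for found in [(2, 9), (2, 11), (3, 0)] {
        println!("{:?} meets {:?}: {}", found, required, meets_minimum(found, required));
    }
}
```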