Distributed training for AxonML: data, model, pipeline, and tensor parallelism.

- DDP: DistributedDataParallel with gradient bucketing.
- FSDP: Fully Sharded Data Parallel (ZeRO-2/ZeRO-3, HybridShard, CPU offload).
- Pipeline: GPipe, 1F1B, and interleaved microbatch scheduling.
- Collective ops: all-reduce (five strategies), broadcast, all-gather, reduce-scatter, gather, scatter, reduce, send/recv, barrier.
- ProcessGroup / World abstraction.
- NcclBackend: dynamic libcudart/libnccl loading, multi-node initialization via NcclUniqueId.
- MockBackend: shared-state in-process simulation for deterministic testing.
§File
crates/axonml-distributed/src/lib.rs
§Author
Andrew Jewell Sr., AutomataNexus LLC (ORCID: 0009-0005-2158-7060)
§Updated
April 14, 2026 11:15 PM EST
§Disclaimer
Use at your own risk. This software is provided “as is”, without warranty of any kind, express or implied. The author and AutomataNexus shall not be held liable for any damages arising from the use of this software.
Re-exports§
pub use backend::Backend;
pub use backend::MockBackend;
pub use backend::ReduceOp;
pub use comm::all_gather;
pub use comm::all_reduce_max;
pub use comm::all_reduce_mean;
pub use comm::all_reduce_min;
pub use comm::all_reduce_product;
pub use comm::all_reduce_sum;
pub use comm::barrier;
pub use comm::broadcast;
pub use comm::broadcast_from;
pub use comm::gather_tensor;
pub use comm::is_main_process;
pub use comm::rank;
pub use comm::reduce_scatter_mean;
pub use comm::reduce_scatter_sum;
pub use comm::scatter_tensor;
pub use comm::sync_gradient;
pub use comm::sync_gradients;
pub use comm::world_size;
pub use ddp::DistributedDataParallel;
pub use ddp::GradSyncStrategy;
pub use ddp::GradientBucket;
pub use ddp::GradientSynchronizer;
pub use fsdp::CPUOffload;
pub use fsdp::ColumnParallelLinear;
pub use fsdp::FSDPMemoryStats;
pub use fsdp::FullyShardedDataParallel;
pub use fsdp::RowParallelLinear;
pub use fsdp::ShardingStrategy;
pub use pipeline::Pipeline;
pub use pipeline::PipelineMemoryStats;
pub use pipeline::PipelineSchedule;
pub use pipeline::PipelineStage;
pub use process_group::ProcessGroup;
pub use process_group::World;
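The re-exported `MockBackend` is described as simulating collectives through shared in-process state. Independent of the crate's actual API, a minimal version of that idea — an `all_reduce_sum` across threads standing in for ranks, using only `std::sync` — can be sketched as follows (`mock_all_reduce_sum` and its signature are assumptions for illustration, not the crate's `comm::all_reduce_sum`):

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Each "rank" (thread) adds its local tensor into a shared
/// accumulator, waits at a barrier until every rank has contributed,
/// then reads back the fully reduced result — so all ranks return
/// the same sum, mirroring all-reduce semantics in-process.
fn mock_all_reduce_sum(world_size: usize, locals: Vec<Vec<f32>>) -> Vec<Vec<f32>> {
    let acc = Arc::new(Mutex::new(vec![0.0_f32; locals[0].len()]));
    let barrier = Arc::new(Barrier::new(world_size));
    let handles: Vec<_> = locals
        .into_iter()
        .map(|local| {
            let (acc, barrier) = (Arc::clone(&acc), Arc::clone(&barrier));
            thread::spawn(move || {
                {
                    // Contribute this rank's values to the shared sum.
                    let mut a = acc.lock().unwrap();
                    for (s, v) in a.iter_mut().zip(&local) {
                        *s += v;
                    }
                }
                barrier.wait(); // all ranks have contributed
                acc.lock().unwrap().clone() // every rank sees the same result
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // world_size = 3; rank r holds [r, r].
    let out = mock_all_reduce_sum(3, vec![vec![0.0, 0.0], vec![1.0, 1.0], vec![2.0, 2.0]]);
    for rank_result in &out {
        assert_eq!(rank_result, &vec![3.0, 3.0]); // 0 + 1 + 2 on every rank
    }
    println!("{:?}", out);
}
```

The barrier is the key piece: it guarantees no rank reads the accumulator before every rank has written, which is what makes an in-process simulation deterministic enough for testing.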
Modules§
- backend
- Backend - Communication Backend Abstractions
- comm
- Communication - High-level Communication Utilities
- ddp
- DDP - Distributed Data Parallel
- fsdp
- FSDP - Fully Sharded Data Parallel
- pipeline
- Pipeline Parallelism
- prelude
- Common imports for distributed training.
- process_group
- ProcessGroup - Process Group Abstraction
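The pipeline module lists 1F1B among its schedules. As a toy model of how that schedule orders work per stage — names, the `Op` enum, and the `one_f_one_b` helper are illustrative assumptions, not the crate's `pipeline::PipelineSchedule` API — each stage runs a forward-only warmup, then alternates one forward with one backward, then drains remaining backwards:

```rust
/// A single scheduled operation: forward or backward of microbatch `m`.
#[derive(Debug, PartialEq)]
enum Op {
    F(usize),
    B(usize),
}

/// Emit the 1F1B operation order for one stage of a pipeline with
/// `stages` stages and `microbatches` microbatches. Warmup depth
/// follows the usual rule: stages - stage - 1 forwards before the
/// first backward (capped by the microbatch count).
fn one_f_one_b(stages: usize, stage: usize, microbatches: usize) -> Vec<Op> {
    let warmup = (stages - 1 - stage).min(microbatches);
    let mut ops = Vec::new();
    // Warmup: forwards only, until the pipeline is full at this stage.
    for m in 0..warmup {
        ops.push(Op::F(m));
    }
    // Steady state: one forward, then one backward, interleaved.
    for m in warmup..microbatches {
        ops.push(Op::F(m));
        ops.push(Op::B(m - warmup));
    }
    // Cooldown: drain the remaining backwards.
    for m in (microbatches - warmup)..microbatches {
        ops.push(Op::B(m));
    }
    ops
}

fn main() {
    // First stage of a 4-stage pipeline with 6 microbatches.
    let sched = one_f_one_b(4, 0, 6);
    assert_eq!(sched[..4], [Op::F(0), Op::F(1), Op::F(2), Op::F(3)]);
    assert_eq!(sched.len(), 12); // 6 forwards + 6 backwards
    println!("{:?}", sched);
}
```

Compared with GPipe (all forwards, then all backwards), this interleaving bounds the number of in-flight activations per stage, which is the memory advantage the 1F1B schedule exists to provide.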