Distributed training for AxonML: data, model, pipeline, and tensor parallelism.

- DDP: DistributedDataParallel with gradient bucketing.
- FSDP: Fully Sharded Data Parallel (ZeRO-2/ZeRO-3, HybridShard, CPU offload).
- Pipeline: GPipe, 1F1B, and interleaved microbatch scheduling.
- Collective ops: all-reduce (five strategies), broadcast, all-gather, reduce-scatter, gather, scatter, reduce, send/recv, barrier.
- ProcessGroup / World abstraction.
- NcclBackend: dynamic libcudart/libnccl loading, multi-node initialization via NcclUniqueId.
- MockBackend: shared-state in-process simulation for deterministic testing.
§File
crates/axonml-distributed/src/lib.rs
§Author
Andrew Jewell Sr., AutomataNexus LLC (ORCID: 0009-0005-2158-7060)
§Updated
April 14, 2026 11:15 PM EST
§Disclaimer
Use at your own risk. This software is provided “as is”, without warranty of any kind, express or implied. The author and AutomataNexus shall not be held liable for any damages arising from the use of this software.
Re-exports§
pub use backend::Backend;
pub use backend::MockBackend;
pub use backend::ReduceOp;
pub use comm::all_gather;
pub use comm::all_reduce_max;
pub use comm::all_reduce_mean;
pub use comm::all_reduce_min;
pub use comm::all_reduce_product;
pub use comm::all_reduce_sum;
pub use comm::barrier;
pub use comm::broadcast;
pub use comm::broadcast_from;
pub use comm::gather_tensor;
pub use comm::is_main_process;
pub use comm::rank;
pub use comm::reduce_scatter_mean;
pub use comm::reduce_scatter_sum;
pub use comm::scatter_tensor;
pub use comm::sync_gradient;
pub use comm::sync_gradients;
pub use comm::world_size;
pub use ddp::DistributedDataParallel;
pub use ddp::GradSyncStrategy;
pub use ddp::GradientBucket;
pub use ddp::GradientSynchronizer;
pub use fsdp::CPUOffload;
pub use fsdp::ColumnParallelLinear;
pub use fsdp::FSDPMemoryStats;
pub use fsdp::FullyShardedDataParallel;
pub use fsdp::RowParallelLinear;
pub use fsdp::ShardingStrategy;
pub use pipeline::Pipeline;
pub use pipeline::PipelineMemoryStats;
pub use pipeline::PipelineSchedule;
pub use pipeline::PipelineStage;
pub use process_group::ProcessGroup;
pub use process_group::World;
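The re-exported `MockBackend` is described as simulating collectives through shared in-process state. Independent of the crate's actual API, a minimal version of that idea — an `all_reduce_sum` across threads standing in for ranks, using only `std::sync` — can be sketched as follows (`mock_all_reduce_sum` and its signature are assumptions for illustration, not the crate's `comm::all_reduce_sum`):

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Each "rank" (thread) adds its local tensor into a shared
/// accumulator, waits at a barrier until every rank has contributed,
/// then reads back the fully reduced result — so all ranks return
/// the same sum, mirroring all-reduce semantics in-process.
fn mock_all_reduce_sum(world_size: usize, locals: Vec<Vec<f32>>) -> Vec<Vec<f32>> {
    let acc = Arc::new(Mutex::new(vec![0.0_f32; locals[0].len()]));
    let barrier = Arc::new(Barrier::new(world_size));
    let handles: Vec<_> = locals
        .into_iter()
        .map(|local| {
            let (acc, barrier) = (Arc::clone(&acc), Arc::clone(&barrier));
            thread::spawn(move || {
                {
                    // Contribute this rank's values to the shared sum.
                    let mut a = acc.lock().unwrap();
                    for (s, v) in a.iter_mut().zip(&local) {
                        *s += v;
                    }
                }
                barrier.wait(); // all ranks have contributed
                acc.lock().unwrap().clone() // every rank sees the same result
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // world_size = 3; rank r holds [r, r].
    let out = mock_all_reduce_sum(3, vec![vec![0.0, 0.0], vec![1.0, 1.0], vec![2.0, 2.0]]);
    for rank_result in &out {
        assert_eq!(rank_result, &vec![3.0, 3.0]); // 0 + 1 + 2 on every rank
    }
    println!("{:?}", out);
}
```

The barrier is the key piece: it guarantees no rank reads the accumulator before every rank has written, which is what makes an in-process simulation deterministic enough for testing.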
Modules§
- backend
- Backend - Communication Backend Abstractions
- comm
- Communication - High-level Communication Utilities
- ddp
- DDP - Distributed Data Parallel
- fsdp
- FSDP - Fully Sharded Data Parallel
- pipeline
- Pipeline Parallelism
- prelude
- Common imports for distributed training.
- process_group
- ProcessGroup - Process Group Abstraction
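The pipeline module lists 1F1B among its schedules. As a toy model of how that schedule orders work per stage — names, the `Op` enum, and the `one_f_one_b` helper are illustrative assumptions, not the crate's `pipeline::PipelineSchedule` API — each stage runs a forward-only warmup, then alternates one forward with one backward, then drains remaining backwards:

```rust
/// A single scheduled operation: forward or backward of microbatch `m`.
#[derive(Debug, PartialEq)]
enum Op {
    F(usize),
    B(usize),
}

/// Emit the 1F1B operation order for one stage of a pipeline with
/// `stages` stages and `microbatches` microbatches. Warmup depth
/// follows the usual rule: stages - stage - 1 forwards before the
/// first backward (capped by the microbatch count).
fn one_f_one_b(stages: usize, stage: usize, microbatches: usize) -> Vec<Op> {
    let warmup = (stages - 1 - stage).min(microbatches);
    let mut ops = Vec::new();
    // Warmup: forwards only, until the pipeline is full at this stage.
    for m in 0..warmup {
        ops.push(Op::F(m));
    }
    // Steady state: one forward, then one backward, interleaved.
    for m in warmup..microbatches {
        ops.push(Op::F(m));
        ops.push(Op::B(m - warmup));
    }
    // Cooldown: drain the remaining backwards.
    for m in (microbatches - warmup)..microbatches {
        ops.push(Op::B(m));
    }
    ops
}

fn main() {
    // First stage of a 4-stage pipeline with 6 microbatches.
    let sched = one_f_one_b(4, 0, 6);
    assert_eq!(sched[..4], [Op::F(0), Op::F(1), Op::F(2), Op::F(3)]);
    assert_eq!(sched.len(), 12); // 6 forwards + 6 backwards
    println!("{:?}", sched);
}
```

Compared with GPipe (all forwards, then all backwards), this interleaving bounds the number of in-flight activations per stage, which is the memory advantage the 1F1B schedule exists to provide.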