Module distributed

Distributed Training - Data Parallelism and Model Parallelism

This module provides utilities for distributed training across multiple processes or nodes, covering both data and model parallelism.

Structs

DataParallelTrainer
Data parallel trainer
DistributedConfig
Distributed trainer configuration
DistributedDataLoader
Distributed data loader
DistributedOptimizer
Distributed optimizer wrapper
GradientCompression
Gradient compression for communication efficiency
RingAllReduce
Ring All-Reduce implementation

Enums

CommunicationBackend
Communication backend for distributed training
CompressionMethod
Gradient compression method
DistributedStrategy
Distributed training strategy
GradientAggregation
Gradient aggregation method