ZeRO (Zero Redundancy Optimizer) Implementation for TrustformeRS
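ZeRO is a memory-efficient technique for data-parallel training. Instead of every device holding a full replica of the optimizer states, gradients, and parameters, ZeRO partitions this model state across devices. This reduces per-device memory while keeping the computational efficiency of data parallelism.
This module implements three stages, each partitioning more state than the previous one: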
- Stage 1: Partition optimizer states
- Stage 2: Partition optimizer states + gradients
- Stage 3: Partition optimizer states + gradients + parameters
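To make the trade-off between stages concrete, the sketch below estimates per-device model-state memory. It assumes the accounting used in the ZeRO paper for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and K = 12 bytes of optimizer state (fp32 master weights, momentum, variance). The `Stage` enum and `model_state_bytes_per_device` function are hypothetical illustrations, not this crate's `ZeROStage` or `ZeROMemoryStats` API, and activations and temporary buffers are ignored.

```rust
/// Hypothetical stand-in for a ZeRO stage (not the crate's `ZeROStage`).
#[derive(Clone, Copy, Debug)]
enum Stage {
    Disabled, // plain data parallelism: every device holds everything
    One,      // optimizer states partitioned
    Two,      // optimizer states + gradients partitioned
    Three,    // optimizer states + gradients + parameters partitioned
}

/// Approximate model-state bytes per device, assuming mixed-precision Adam:
/// 2 B fp16 params, 2 B fp16 grads, 12 B optimizer state per parameter.
fn model_state_bytes_per_device(num_params: u64, num_devices: u64, stage: Stage) -> f64 {
    let psi = num_params as f64;
    let n = num_devices as f64;
    let params = 2.0 * psi;
    let grads = 2.0 * psi;
    let optim = 12.0 * psi;
    match stage {
        Stage::Disabled => params + grads + optim,
        Stage::One => params + grads + optim / n,
        Stage::Two => params + (grads + optim) / n,
        Stage::Three => (params + grads + optim) / n,
    }
}
```

For example, `model_state_bytes_per_device(7_500_000_000, 64, stage) / 1e9` gives 120 GB without partitioning, about 31.4 GB at Stage 1, about 16.6 GB at Stage 2, and about 1.9 GB at Stage 3, which shows why each stage trades extra communication for a smaller per-device footprint.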
Re-exports
pub use zero_optimizer::ZeROConfig;
pub use zero_optimizer::ZeROOptimizer;
pub use zero_optimizer::ZeROStage;
pub use zero_stage1::ZeROStage1;
pub use zero_stage2::ZeROStage2;
pub use zero_stage3::ZeROStage3;
pub use zero_utils::all_gather_gradients;
pub use zero_utils::gather_parameters;
pub use zero_utils::partition_gradients;
pub use zero_utils::partition_parameters;
pub use zero_utils::reduce_scatter_gradients;
pub use zero_utils::GradientBuffer;
pub use zero_utils::ParameterGroup;
pub use zero_utils::ParameterPartition;
pub use zero_utils::ZeROState;
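The utilities re-exported above are built around two collective operations: reduce-scatter (sum gradients across ranks, leaving each rank with only its own shard) and all-gather (reassemble the full tensor from every rank's shard). The single-process simulation below illustrates their semantics, with one `Vec` standing in for each rank. The names `shard_range`, `reduce_scatter_sim`, and `all_gather_sim` are hypothetical and do not reflect the signatures of the crate's `reduce_scatter_gradients` or `all_gather_gradients`; the contiguous, near-equal sharding scheme is also an assumption.

```rust
/// Contiguous shard [start, end) of `len` elements owned by `rank` out of `world` ranks.
/// The first `len % world` ranks receive one extra element.
fn shard_range(len: usize, world: usize, rank: usize) -> (usize, usize) {
    let base = len / world;
    let extra = len % world;
    let start = rank * base + rank.min(extra);
    let size = base + usize::from(rank < extra);
    (start, start + size)
}

/// Simulated reduce-scatter: `grads[r]` is rank r's full gradient. Afterwards,
/// rank r holds the element-wise sum over all ranks, restricted to shard r.
fn reduce_scatter_sim(grads: &[Vec<f32>]) -> Vec<Vec<f32>> {
    assert!(!grads.is_empty(), "need at least one rank");
    let world = grads.len();
    let len = grads[0].len();
    (0..world)
        .map(|rank| {
            let (start, end) = shard_range(len, world, rank);
            (start..end)
                .map(|i| grads.iter().map(|g| g[i]).sum::<f32>())
                .collect::<Vec<f32>>()
        })
        .collect()
}

/// Simulated all-gather: concatenate the shards in rank order so every
/// rank ends up with the full vector.
fn all_gather_sim(shards: &[Vec<f32>]) -> Vec<f32> {
    shards.iter().flatten().copied().collect()
}
```

With two ranks holding gradients `[1, 2, 3]` and `[10, 20, 30]`, reduce-scatter leaves rank 0 with `[11, 22]` and rank 1 with `[33]`, and all-gather restores `[11, 22, 33]`. Together, the two operations move the same data as an all-reduce, which is why ZeRO can partition gradients without increasing communication volume relative to plain data parallelism.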
Modules
- zero_optimizer - Main ZeRO optimizer implementation
- zero_stage1 - ZeRO Stage 1: Optimizer state partitioning
- zero_stage2 - ZeRO Stage 2: Optimizer state + gradient partitioning
- zero_stage3 - ZeRO Stage 3: Full parameter partitioning
- zero_utils - Utility functions and data structures for ZeRO optimization
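The idea behind Stage 1 can be sketched as follows: each rank keeps optimizer state only for its own parameter shard, updates that shard, and the updated shards are then all-gathered so every rank again holds the full parameters. For brevity this sketch swaps in SGD with momentum in place of Adam, and it assumes the gradients have already been averaged across ranks. `ShardedMomentum` is a hypothetical type, not the crate's `ZeROStage1`.

```rust
/// Hypothetical Stage 1 style optimizer: momentum is stored only for this
/// rank's contiguous shard [start, end), so optimizer memory is ~1/world.
struct ShardedMomentum {
    start: usize,
    end: usize,
    momentum: Vec<f32>, // partitioned optimizer state, length end - start
}

impl ShardedMomentum {
    fn new(num_params: usize, world: usize, rank: usize) -> Self {
        let base = num_params / world;
        let extra = num_params % world;
        let start = rank * base + rank.min(extra);
        let end = start + base + usize::from(rank < extra);
        Self { start, end, momentum: vec![0.0; end - start] }
    }

    /// Update this rank's shard and return the new parameter values for it.
    /// `params` and `grads` are full-length; `grads` is assumed pre-averaged.
    fn step(&mut self, params: &[f32], grads: &[f32], lr: f32, beta: f32) -> Vec<f32> {
        (self.start..self.end)
            .enumerate()
            .map(|(j, i)| {
                // j indexes the local shard, i indexes the full parameter vector.
                self.momentum[j] = beta * self.momentum[j] + grads[i];
                params[i] - lr * self.momentum[j]
            })
            .collect()
    }
}
```

Concatenating the vectors returned by each rank's `step`, in rank order, plays the role of the all-gather. Only the momentum buffers are partitioned here; parameters and gradients stay fully replicated, matching the Stage 1 description.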
Structs
- ZeROMemoryStats - Memory statistics for ZeRO optimization
Enums
- ZeROImplementationStage - ZeRO optimization stages