Utility functions and data structures for ZeRO optimization
Structs
- GradientBuffer - Gradient buffer for ZeRO Stage 2+
- ParameterGroup - Parameter group for ZeRO optimization
- ParameterPartition - Parameter partition for ZeRO Stage 3
- PartitionInfo - Partition information for distributed parameters
- ZeROState - ZeRO optimizer state management
Functions
- all_gather_gradients - All-gather gradients from all devices
- calculate_bucket_size - Calculate optimal bucket size for gradient communication
- gather_parameters - Gather parameters from all devices
- partition_gradients - Partition gradients across devices for ZeRO Stage 2+
- partition_parameters - Partition parameters across devices for ZeRO Stage 3
- reduce_scatter_gradients - Reduce-scatter gradients across devices
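
The collective operations listed above can be illustrated with a minimal, single-process sketch. It does not use this module's API: the signatures of the functions above are not shown on this page, so shard_range, simulate_reduce_scatter and simulate_all_gather are hypothetical names, and each "device" is assumed to be a plain in-memory vector rather than a real accelerator or communicator.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of this module): the half-open index range
/// owned by `rank` when `len` elements are split as evenly as possible
/// across `world_size` ranks. Lower ranks absorb the remainder.
fn shard_range(len: usize, world_size: usize, rank: usize) -> Range<usize> {
    let base = len / world_size;
    let extra = len % world_size;
    let start = rank * base + rank.min(extra);
    let end = start + base + usize::from(rank < extra);
    start..end
}

/// Simulated reduce-scatter: sum the per-rank gradients element-wise, then
/// give each rank only its own shard of the summed result.
/// Assumes `grads` is non-empty and every rank holds the same length.
fn simulate_reduce_scatter(grads: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let world_size = grads.len();
    let len = grads[0].len();
    (0..world_size)
        .map(|rank| {
            shard_range(len, world_size, rank)
                .map(|i| grads.iter().map(|g| g[i]).sum::<f32>())
                .collect()
        })
        .collect()
}

/// Simulated all-gather: concatenate every rank's shard in rank order so
/// that the full tensor is available again.
fn simulate_all_gather(shards: &[Vec<f32>]) -> Vec<f32> {
    shards.iter().flatten().copied().collect()
}
```

With two simulated ranks holding the gradients [1, 2, 3, 4, 5] and [10, 20, 30, 40, 50], simulate_reduce_scatter leaves rank 0 with [11, 22, 33] and rank 1 with [44, 55]. Passing those shards to simulate_all_gather reconstructs [11, 22, 33, 44, 55]. A real implementation would exchange data through a communication backend instead of indexing into shared vectors.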