Memory-efficient optimizer operations
This module provides memory-efficient optimization for very large models through gradient accumulation, chunked processing, and memory usage estimation.
§Features
- Gradient accumulation to reduce memory pressure
- Chunked parameter processing for large models
- Memory usage estimation and recommendations
- Streaming gradient computation
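As an illustration of the gradient accumulation idea listed above, here is a minimal sketch. It is not this module's API: `SketchAccumulator`, `accumulate`, and `take_average` are hypothetical names chosen for the example, and gradients are assumed to be flat `f32` slices of equal length.

```rust
/// Hypothetical accumulator for illustration only; this is NOT the
/// crate's `GradientAccumulator` type, whose API is not shown here.
pub struct SketchAccumulator {
    sum: Vec<f32>,
    steps: usize,
}

impl SketchAccumulator {
    /// Allocate a single running-sum buffer, sized to the parameter count.
    pub fn new(num_params: usize) -> Self {
        Self { sum: vec![0.0; num_params], steps: 0 }
    }

    /// Add one micro-batch gradient into the running sum.
    pub fn accumulate(&mut self, grad: &[f32]) {
        assert_eq!(grad.len(), self.sum.len(), "gradient length mismatch");
        for (s, g) in self.sum.iter_mut().zip(grad) {
            *s += *g;
        }
        self.steps += 1;
    }

    /// Return the averaged gradient and reset the accumulator.
    /// Returns `None` if nothing has been accumulated yet.
    pub fn take_average(&mut self) -> Option<Vec<f32>> {
        if self.steps == 0 {
            return None;
        }
        let scale = 1.0 / self.steps as f32;
        let avg: Vec<f32> = self.sum.iter().map(|s| s * scale).collect();
        self.sum.iter_mut().for_each(|s| *s = 0.0);
        self.steps = 0;
        Some(avg)
    }
}
```

In this sketch, a training loop would call `accumulate` once per micro-batch and `take_average` once per optimizer step, so only one micro-batch of activations needs to be resident at a time while the effective batch size stays large.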
§Performance
Designed to make optimization of models with billions of parameters feasible by keeping peak memory use bounded through gradient accumulation and chunked processing.
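To give a sense of scale, the following back-of-the-envelope sketch estimates optimizer memory. It is an assumption-laden illustration, not the module's estimator: `estimate_bytes` is a hypothetical function, and it counts only parameters, gradients, and per-parameter optimizer state buffers, ignoring activations and allocator overhead.

```rust
/// Hypothetical helper for illustration; not part of this module's API.
///
/// Counts one copy of the parameters, one copy of the gradients, and
/// `state_buffers` extra per-parameter buffers (for example, 2 for an
/// Adam-style optimizer that keeps first and second moment estimates).
pub fn estimate_bytes(num_params: u64, bytes_per_value: u64, state_buffers: u64) -> u64 {
    num_params * bytes_per_value * (2 + state_buffers)
}
```

Under these assumptions, a model with one billion `f32` parameters and two state buffers gives `estimate_bytes(1_000_000_000, 4, 2)`, which is 16,000,000,000 bytes (about 16 GB) before any activations are counted.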
Structs§
- ChunkedOptimizer - Chunked optimizer for processing large parameter arrays in chunks
- GradientAccumulator - Gradient accumulator for memory-efficient training
- MemoryUsageEstimator - Memory usage estimator for optimizers
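The chunked processing idea can be sketched as follows. This is a simplified illustration using plain SGD over flat `f32` slices, not the actual `ChunkedOptimizer` interface: `sgd_step_chunked` and its parameters are hypothetical.

```rust
/// Hypothetical chunked SGD step for illustration only; this is NOT the
/// crate's `ChunkedOptimizer` API.
///
/// Updates `params` in place, one chunk of at most `chunk_size` elements
/// at a time, and returns the number of chunks processed.
pub fn sgd_step_chunked(
    params: &mut [f32],
    grads: &[f32],
    lr: f32,
    chunk_size: usize,
) -> usize {
    assert_eq!(params.len(), grads.len(), "length mismatch");
    assert!(chunk_size > 0, "chunk_size must be non-zero");

    let mut chunks = 0;
    // Walk both slices in lockstep, chunk by chunk.
    for (p_chunk, g_chunk) in params.chunks_mut(chunk_size).zip(grads.chunks(chunk_size)) {
        for (p, g) in p_chunk.iter_mut().zip(g_chunk) {
            *p -= lr * *g;
        }
        chunks += 1;
    }
    chunks
}
```

In this in-memory form the chunking mainly bounds the working set of each inner loop. The memory benefit becomes more pronounced when each chunk needs temporary buffers or is streamed from elsewhere, which this sketch does not model.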