Enhanced Distributed Training Profiling
This module provides comprehensive profiling support for distributed training scenarios, including multi-node coordination, gradient synchronization analysis, and communication pattern optimization.
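As an illustration of the kind of load balance analysis described here, the following minimal sketch computes a mean step time and a straggler ratio from per-node timings. The page does not show the module's actual fields or methods, so every name below (`NodeSample`, `ImbalanceReport`, `analyze_load_balance`) is a hypothetical stand-in, not this crate's API.

```rust
/// Hypothetical per-node measurement (an assumption for illustration;
/// not the crate's `NodePerformanceSnapshot`).
#[derive(Debug, Clone)]
struct NodeSample {
    node_id: String,
    step_time_ms: f64,
}

/// Hypothetical result type (an assumption for illustration;
/// not the crate's `LoadBalanceAnalysis`).
#[derive(Debug)]
struct ImbalanceReport {
    mean_step_ms: f64,
    slowest_node: String,
    /// Slowest step time divided by the mean; 1.0 means perfectly balanced.
    imbalance_ratio: f64,
}

/// Returns `None` when there are no samples or the mean step time is not positive.
fn analyze_load_balance(samples: &[NodeSample]) -> Option<ImbalanceReport> {
    if samples.is_empty() {
        return None;
    }
    let total: f64 = samples.iter().map(|s| s.step_time_ms).sum();
    let mean = total / samples.len() as f64;
    if mean <= 0.0 {
        return None;
    }
    // The slowest node is the straggler every synchronous step waits for.
    let slowest = samples
        .iter()
        .max_by(|a, b| a.step_time_ms.total_cmp(&b.step_time_ms))?;
    Some(ImbalanceReport {
        mean_step_ms: mean,
        slowest_node: slowest.node_id.clone(),
        imbalance_ratio: slowest.step_time_ms / mean,
    })
}

fn main() {
    let samples = vec![
        NodeSample { node_id: "node-0".to_string(), step_time_ms: 100.0 },
        NodeSample { node_id: "node-1".to_string(), step_time_ms: 100.0 },
        NodeSample { node_id: "node-2".to_string(), step_time_ms: 100.0 },
        NodeSample { node_id: "node-3".to_string(), step_time_ms: 200.0 },
    ];
    if let Some(report) = analyze_load_balance(&samples) {
        println!(
            "mean = {:.1} ms, slowest = {}, imbalance = {:.2}",
            report.mean_step_ms, report.slowest_node, report.imbalance_ratio
        );
    }
}
```

A ratio well above 1.0 suggests that one node is holding back the others during synchronous gradient exchange. A real profiler would likely combine this with communication and synchronization timings before reporting a bottleneck.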
Structs§
- Bottleneck - Detected performance bottleneck
- CommunicationEvent - Communication event between nodes
- CommunicationSummary - Summary of communication patterns
- DistributedProfiler - Distributed training profiler
- DistributedProfilerConfig - Configuration for distributed profiling
- DistributedProfilingReport - Distributed profiling report
- LoadBalanceAnalysis - Load balance analysis across nodes
- NodeInfo - Information about a node in the distributed cluster
- NodePerformanceSnapshot - Performance snapshot for a single node
- RealtimeStats - Real-time statistics for dashboards
- SynchronizationEvent - Gradient synchronization event
- SynchronizationSummary - Summary of synchronization operations
Enums§
- BottleneckType - Type of performance bottleneck
- CommunicationType - Type of communication between nodes
- NodeRole - Node role in distributed training
- NodeStatus - Node status
- SyncType - Type of gradient synchronization