Debugging utilities for distributed training systems
This module provides debugging tools for distributed training, including operation tracing, state inspection, diagnostic checks, and automated troubleshooting.
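The sketch below roughly illustrates an automated diagnostic pass. It inspects a set of per-rank snapshots and flags two problems: ranks whose heartbeat has stalled, and ranks with a deep queue of pending operations. Everything in it is a hypothetical stand-in chosen for illustration. That includes `RankSnapshot`, `CheckStatus`, `diagnose`, and the thresholds. None of it is this module's `SystemStateSnapshot` or `DiagnosticResult` API, whose fields this page does not document.

```rust
/// Outcome of one diagnostic check (hypothetical; not the crate's `DiagnosticResult`).
#[derive(Debug, Clone, PartialEq)]
pub enum CheckStatus {
    Pass,
    Warn(String),
    Fail(String),
}

/// Minimal per-rank state; these fields are illustrative assumptions.
#[derive(Debug, Clone)]
pub struct RankSnapshot {
    pub rank: usize,
    pub pending_ops: usize,
    pub last_heartbeat_ms: u64,
}

/// Fails ranks whose heartbeat lags the freshest rank by more than `stall_ms`,
/// and warns about ranks with more than `max_pending` queued operations.
pub fn diagnose(
    snapshots: &[RankSnapshot],
    stall_ms: u64,
    max_pending: usize,
) -> Vec<(usize, CheckStatus)> {
    // The freshest heartbeat serves as the group's reference time, so `lag` cannot underflow.
    let newest = snapshots.iter().map(|s| s.last_heartbeat_ms).max().unwrap_or(0);
    snapshots
        .iter()
        .map(|s| {
            let lag = newest - s.last_heartbeat_ms;
            let status = if lag > stall_ms {
                CheckStatus::Fail(format!("no heartbeat for {lag} ms"))
            } else if s.pending_ops > max_pending {
                CheckStatus::Warn(format!("{} pending operations", s.pending_ops))
            } else {
                CheckStatus::Pass
            };
            (s.rank, status)
        })
        .collect()
}
```

Using a relative lag against the freshest rank, rather than wall-clock time, is one way to avoid depending on synchronized clocks across hosts. Whether the real module does this is not stated here.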
Structs
- ActiveOperation - Active operation information
- CommunicationState - Communication state information
- DebugConfig - Configuration for debugging utilities
- DebugEvent - Debug event for tracking system operations
- DiagnosticResult - Diagnostic check result
- DistributedDebugger - Comprehensive debugging system for distributed training
- ProcessGroupState - Process group state information
- ResourceState - Resource state information
- SystemStateSnapshot - System state snapshot for debugging
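The listed `ActiveOperation` and `DebugEvent` types suggest a common tracing pattern. Each in-flight operation is registered when it starts, then converted into a completed event when it finishes. Anything still active after a timeout is a hang candidate. The following is a minimal sketch of that pattern under assumed names. `Tracer`, `TraceEvent`, `begin`, `end`, and `stuck` are hypothetical and are not the module's actual API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical completed-operation record (not the crate's `DebugEvent`).
#[derive(Debug, Clone)]
pub struct TraceEvent {
    pub op_id: u64,
    pub name: String,
    pub duration: Duration,
}

/// Tracks in-flight operations and logs each one as an event when it ends.
#[derive(Default)]
pub struct Tracer {
    next_id: u64,
    active: HashMap<u64, (String, Instant)>,
    events: Vec<TraceEvent>,
}

impl Tracer {
    /// Registers a new in-flight operation and returns its id.
    pub fn begin(&mut self, name: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.active.insert(id, (name.to_string(), Instant::now()));
        id
    }

    /// Moves an operation from the active set into the event log.
    /// Returns `false` if the id is unknown (for example, already ended).
    pub fn end(&mut self, id: u64) -> bool {
        match self.active.remove(&id) {
            Some((name, started)) => {
                self.events.push(TraceEvent { op_id: id, name, duration: started.elapsed() });
                true
            }
            None => false,
        }
    }

    /// Names of operations that have been running longer than `threshold`.
    pub fn stuck(&self, threshold: Duration) -> Vec<&str> {
        self.active
            .values()
            .filter(|(_, started)| started.elapsed() > threshold)
            .map(|(name, _)| name.as_str())
            .collect()
    }

    pub fn events(&self) -> &[TraceEvent] {
        &self.events
    }

    pub fn active_count(&self) -> usize {
        self.active.len()
    }
}
```

Keying active operations by a monotonically increasing id lets two concurrent operations with the same name (for example, two `all_reduce` calls) be tracked independently.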
Enums
- LogLevel - Logging levels for debugging
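This page does not list the variants of `LogLevel`. A typical way to use such an enum is as an ordered severity threshold: messages below the configured minimum are dropped. The sketch assumes a five-level `Level` enum and a `LevelFilter` collector, both hypothetical. It relies on the fact that a derived `Ord` on a fieldless enum orders variants by declaration order.

```rust
/// Hypothetical severity levels, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// Collects messages whose level is at or above a configured minimum.
pub struct LevelFilter {
    min: Level,
    pub emitted: Vec<String>,
}

impl LevelFilter {
    pub fn new(min: Level) -> Self {
        Self { min, emitted: Vec::new() }
    }

    /// Records `msg` if `level >= min` and reports whether it was kept.
    /// The derived `Ord` gives `Trace < Debug < Info < Warn < Error`.
    pub fn log(&mut self, level: Level, msg: &str) -> bool {
        if level >= self.min {
            self.emitted.push(format!("[{level:?}] {msg}"));
            true
        } else {
            false
        }
    }
}
```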
Functions
- get_global_debugger - Get the global debugger instance
- init_global_debugger - Initialize the global debugger with custom configuration
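A pair of functions like `init_global_debugger` and `get_global_debugger` is commonly built on a process-wide, initialize-once cell. The sketch below shows that pattern with `std::sync::OnceLock` (stable since Rust 1.70). `SketchConfig`, `SketchDebugger`, `init_global`, and `get_global` are hypothetical stand-ins, not this module's signatures. In particular, this page does not say how the real `get_global_debugger` behaves before initialization. The sketch assumes it lazily installs a default debugger. The real function may instead return an error or an `Option`.

```rust
use std::sync::{Mutex, OnceLock};

/// Hypothetical configuration (not the crate's `DebugConfig`).
#[derive(Debug, Clone, PartialEq)]
pub struct SketchConfig {
    pub max_events: usize,
}

impl Default for SketchConfig {
    fn default() -> Self {
        Self { max_events: 1_000 }
    }
}

/// Hypothetical global debugger; the `Mutex` allows shared mutation from any thread.
pub struct SketchDebugger {
    pub config: SketchConfig,
    pub events: Mutex<Vec<String>>,
}

static GLOBAL: OnceLock<SketchDebugger> = OnceLock::new();

/// Installs a debugger built from `config`; fails if one is already installed.
pub fn init_global(config: SketchConfig) -> Result<(), &'static str> {
    GLOBAL
        .set(SketchDebugger { config, events: Mutex::new(Vec::new()) })
        .map_err(|_| "global debugger already initialized")
}

/// Returns the installed debugger, creating a default one on first use (assumed behavior).
pub fn get_global() -> &'static SketchDebugger {
    GLOBAL.get_or_init(|| SketchDebugger {
        config: SketchConfig::default(),
        events: Mutex::new(Vec::new()),
    })
}
```

Under this design, a custom configuration only takes effect if initialization runs before the first `get_global` call. After that, the lazily created default wins and `init_global` reports an error.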