GPU Monitoring Module (MLOPS-005)
btop-inspired GPU monitoring for the terminal training dashboard.
§Toyota Way: Andon
Visual alerting system for immediate problem detection. Thermal throttling, memory pressure, and power limits trigger alerts.
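The alerting idea above can be sketched as a simple threshold check. This is a minimal illustration, not the crate's implementation: `Sample`, `Thresholds`, `Alert`, and `check_sample` are hypothetical names, and the real `GpuMetrics`, `AndonThresholds`, and `GpuAlert` types may carry different fields and variants.

```rust
// Hypothetical types for illustration only; the crate's own
// `GpuMetrics`, `AndonThresholds`, and `GpuAlert` may look different.
#[derive(Debug, Clone, PartialEq)]
enum Alert {
    ThermalThrottling { device_id: u32, temperature_celsius: u32 },
    MemoryPressure { device_id: u32, used_percent: u32 },
    PowerLimit { device_id: u32, power_percent: u32 },
}

// Limits at or above which an alert is raised (assumed semantics).
#[derive(Debug, Clone, Copy)]
struct Thresholds {
    max_temperature_celsius: u32,
    max_memory_percent: u32,
    max_power_percent: u32,
}

// One reading for one device.
#[derive(Debug, Clone, Copy)]
struct Sample {
    device_id: u32,
    temperature_celsius: u32,
    memory_used_percent: u32,
    power_percent: u32,
}

// Compare one sample against the thresholds and collect every alert
// that applies. A single sample can raise more than one alert.
fn check_sample(sample: &Sample, limits: &Thresholds) -> Vec<Alert> {
    let mut alerts = Vec::new();
    if sample.temperature_celsius >= limits.max_temperature_celsius {
        alerts.push(Alert::ThermalThrottling {
            device_id: sample.device_id,
            temperature_celsius: sample.temperature_celsius,
        });
    }
    if sample.memory_used_percent >= limits.max_memory_percent {
        alerts.push(Alert::MemoryPressure {
            device_id: sample.device_id,
            used_percent: sample.memory_used_percent,
        });
    }
    if sample.power_percent >= limits.max_power_percent {
        alerts.push(Alert::PowerLimit {
            device_id: sample.device_id,
            power_percent: sample.power_percent,
        });
    }
    alerts
}
```

With illustrative limits of 85 °C, 90% memory, and 95% power, a sample at 91 °C with moderate memory and power use would yield a single `ThermalThrottling` alert, and a sample below all three limits would yield none.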
§Example
use entrenar::monitor::gpu::{GpuMonitor, GpuMetrics, GpuAlert};
let monitor = GpuMonitor::new()?;
let metrics = monitor.sample();
for m in &metrics {
    println!("GPU {}: {}°C, {}% util", m.device_id, m.temperature_celsius, m.utilization_percent);
}
Structs§
- AndonThresholds
- Andon thresholds for GPU alerts
- GpuAndonSystem
- GPU Andon system for alert management
- GpuMetrics
- GPU metrics snapshot (inspired by btop’s GPU visualization)
- GpuMetricsBuffer
- GPU metrics history buffer (ring buffer)
- GpuMonitor
- GPU monitor that collects metrics
- GpuProcess
- Process using GPU resources
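The ring-buffer behaviour noted for `GpuMetricsBuffer` can be sketched as follows. This is an assumption-laden illustration: `HistoryBuffer` and its methods are hypothetical, and the crate's buffer may use a different backing store or API. The sketch keeps only the most recent `capacity` entries and evicts the oldest one on overflow.

```rust
use std::collections::vec_deque::Iter;
use std::collections::VecDeque;

// Hypothetical fixed-capacity history buffer; not the crate's
// `GpuMetricsBuffer`, whose API is not shown on this page.
struct HistoryBuffer<T> {
    capacity: usize,
    items: VecDeque<T>,
}

impl<T> HistoryBuffer<T> {
    // Create an empty buffer that holds at most `capacity` entries.
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        HistoryBuffer {
            capacity,
            items: VecDeque::with_capacity(capacity),
        }
    }

    // Append an entry, dropping the oldest one once the buffer is full.
    fn push(&mut self, item: T) {
        if self.items.len() == self.capacity {
            self.items.pop_front();
        }
        self.items.push_back(item);
    }

    // Number of entries currently stored.
    fn len(&self) -> usize {
        self.items.len()
    }

    // Iterate from the oldest entry to the newest.
    fn iter(&self) -> Iter<'_, T> {
        self.items.iter()
    }
}
```

Pushing four values into a buffer of capacity three leaves the last three, oldest first, which is the shape a sparkline of recent history needs.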
Enums§
- GpuAlert
- GPU alert types for Andon system
Functions§
- format_gpu_panel
- Format GPU metrics for terminal display
- render_progress_bar
- Render a progress bar for terminal display
- render_sparkline
- Render a sparkline from values
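The two rendering helpers listed above could work roughly as sketched below. The signatures of `render_sparkline` and `render_progress_bar` are not shown on this page, so `sparkline` and `progress_bar` are hypothetical stand-ins, and the glyph set, min-max scaling, and rounding rules are assumptions.

```rust
// Hypothetical helper: scale each value into the slice's min..max range
// and map it to one of eight block glyphs.
fn sparkline(values: &[f64]) -> String {
    const BARS: [char; 8] = ['▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'];
    if values.is_empty() {
        return String::new();
    }
    let min = values.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    values
        .iter()
        .map(|&v| {
            // A flat series has no range, so it renders at the lowest glyph.
            let idx = if range > 0.0 {
                (((v - min) / range) * 7.0).round() as usize
            } else {
                0
            };
            BARS[idx.min(7)]
        })
        .collect()
}

// Hypothetical helper: draw a bracketed bar `width` cells wide, with
// `percent` clamped to the 0..=100 range.
fn progress_bar(percent: f64, width: usize) -> String {
    let clamped = percent.max(0.0).min(100.0);
    let filled = ((clamped / 100.0) * width as f64).round() as usize;
    let mut bar = String::with_capacity(width * 3 + 2);
    bar.push('[');
    for i in 0..width {
        bar.push(if i < filled { '█' } else { '░' });
    }
    bar.push(']');
    bar
}
```

Under these assumptions, `sparkline(&[0.0, 50.0, 100.0])` gives `▁▅█` and `progress_bar(50.0, 10)` gives `[█████░░░░░]`. Scaling against the slice's own minimum and maximum makes small fluctuations visible, at the cost of hiding absolute magnitude; a fixed 0 to 100 scale would be the alternative for percentage metrics.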