GPU metrics collection for observability.
Collects per-GPU utilization, memory, temperature, and power metrics using vendor-specific interfaces:
- NVIDIA: nvidia-smi CLI (avoids hard NVML dependency)
- AMD: sysfs under /sys/class/drm/card{N}/device/
- Intel: sysfs under /sys/class/drm/card{N}/device/
- Apple: IOKit via powermetrics (macOS only)
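To make the NVIDIA path concrete, here is a minimal sketch of shelling out to nvidia-smi and parsing its CSV output. The `--query-gpu` and `--format=csv,noheader,nounits` flags are standard nvidia-smi options; everything else is an assumption for illustration. `GpuSample`, `parse_nvidia_smi_line`, and `query_nvidia_smi` are hypothetical names, not this module's API, and the real report fields may differ.

```rust
use std::process::Command;

/// Hypothetical snapshot type for illustration only; the module's real
/// report structs are not shown on this page.
#[derive(Debug, PartialEq)]
pub struct GpuSample {
    pub index: u32,
    pub utilization_pct: f32,
    pub memory_used_mib: u64,
    pub memory_total_mib: u64,
    pub temperature_c: f32,
    /// `None` when the board does not report power draw.
    pub power_w: Option<f32>,
}

/// Parse one CSV line with six fields in this order:
/// index, utilization.gpu, memory.used, memory.total, temperature.gpu, power.draw
pub fn parse_nvidia_smi_line(line: &str) -> Option<GpuSample> {
    let fields: Vec<&str> = line.split(',').map(str::trim).collect();
    if fields.len() != 6 {
        return None;
    }
    Some(GpuSample {
        index: fields[0].parse().ok()?,
        utilization_pct: fields[1].parse().ok()?,
        memory_used_mib: fields[2].parse().ok()?,
        memory_total_mib: fields[3].parse().ok()?,
        temperature_c: fields[4].parse().ok()?,
        // A non-numeric placeholder such as "[N/A]" becomes None.
        power_w: fields[5].parse().ok(),
    })
}

/// Run nvidia-smi and collect one sample per GPU. Returns an empty list
/// when the binary is missing or exits with an error.
pub fn query_nvidia_smi() -> Vec<GpuSample> {
    let output = Command::new("nvidia-smi")
        .args(&[
            "--query-gpu=index,utilization.gpu,memory.used,memory.total,temperature.gpu,power.draw",
            "--format=csv,noheader,nounits",
        ])
        .output();
    match output {
        Ok(out) if out.status.success() => String::from_utf8_lossy(&out.stdout)
            .lines()
            .filter_map(parse_nvidia_smi_line)
            .collect(),
        _ => Vec::new(),
    }
}
```

Keeping the parser separate from the process call means it can be tested on machines without an NVIDIA driver, which fits the stated goal of avoiding a hard NVML dependency.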
Structs
- GpuHealthReport - Per-GPU health check result
- GpuUtilizationReport - Per-GPU utilization snapshot
Enums
- GpuHealthStatus - GPU health status
Functions
- check_gpu_health - Collect GPU health reports.
- collect_gpu_metrics - Collect GPU utilization metrics for all detected GPUs.
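This page does not say how GPUs are detected on the sysfs path. As a rough sketch only, one plausible approach on Linux is to enumerate the card{N} entries and read integer attributes from each device directory. The helper names `list_drm_cards` and `read_sysfs_u64` are hypothetical. The amdgpu driver exposes attributes such as `gpu_busy_percent` and `mem_info_vram_used` there; attribute names differ by driver, so treat any specific file as an assumption.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// List `cardN` entries under a DRM class root (normally `/sys/class/drm`)
/// and return each index with its `device` directory, sorted by index.
/// Connector entries such as `card0-HDMI-A-1` are skipped.
pub fn list_drm_cards(drm_root: &Path) -> Vec<(u32, PathBuf)> {
    let mut cards = Vec::new();
    let entries = match fs::read_dir(drm_root) {
        Ok(entries) => entries,
        Err(_) => return cards, // no DRM subsystem, or not Linux
    };
    for entry in entries.flatten() {
        let name = entry.file_name();
        let name = name.to_string_lossy();
        // Accept only names of the exact form "card<digits>".
        let index = name
            .strip_prefix("card")
            .and_then(|rest| rest.parse::<u32>().ok());
        if let Some(index) = index {
            cards.push((index, entry.path().join("device")));
        }
    }
    cards.sort_by_key(|(index, _)| *index);
    cards
}

/// Read a single integer attribute (for example `gpu_busy_percent`) from a
/// device directory. Returns `None` if the file is missing or not numeric.
pub fn read_sysfs_u64(device_dir: &Path, attr: &str) -> Option<u64> {
    fs::read_to_string(device_dir.join(attr))
        .ok()?
        .trim()
        .parse()
        .ok()
}
```

Taking the root directory as a parameter, rather than hard-coding `/sys/class/drm`, lets the same code run against a temporary directory in tests.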