Module gpu_metrics

GPU metrics collection for observability.

Collects per-GPU utilization, memory, temperature, and power metrics using vendor-specific interfaces:

  • NVIDIA: nvidia-smi CLI (avoids hard NVML dependency)
  • AMD: sysfs under /sys/class/drm/card{N}/device/
  • Intel: sysfs under /sys/class/drm/card{N}/device/
  • Apple: IOKit via powermetrics (macOS only)
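
As an illustration of the NVIDIA path, the following is a minimal sketch of how `nvidia-smi` CSV output could be parsed without linking against NVML. It is not this module's implementation: `GpuSample`, `parse_nvidia_smi_line`, and `query_nvidia_gpus` are hypothetical names invented for the example, and the choice of queried fields is an assumption. The `--query-gpu` and `--format=csv,noheader,nounits` options are standard `nvidia-smi` flags.

```rust
use std::process::Command;

/// Hypothetical snapshot type for this sketch; it is NOT the module's
/// `GpuUtilizationReport`, whose fields are not shown on this page.
#[derive(Debug, PartialEq)]
struct GpuSample {
    index: u32,
    utilization_pct: Option<f32>,
    memory_used_mib: Option<u64>,
    memory_total_mib: Option<u64>,
    temperature_c: Option<f32>,
    power_w: Option<f32>,
}

/// Parse one CSV row produced with `--format=csv,noheader,nounits`.
/// Values that do not parse as numbers (such as "[N/A]") become `None`;
/// a row with the wrong field count or a bad index is rejected.
fn parse_nvidia_smi_line(line: &str) -> Option<GpuSample> {
    let fields: Vec<&str> = line.split(',').map(str::trim).collect();
    if fields.len() != 6 {
        return None;
    }
    Some(GpuSample {
        index: fields[0].parse::<u32>().ok()?,
        utilization_pct: fields[1].parse::<f32>().ok(),
        memory_used_mib: fields[2].parse::<u64>().ok(),
        memory_total_mib: fields[3].parse::<u64>().ok(),
        temperature_c: fields[4].parse::<f32>().ok(),
        power_w: fields[5].parse::<f32>().ok(),
    })
}

/// Run `nvidia-smi` and parse every row. Returns an empty list when the
/// binary is missing or exits with an error, so hosts without NVIDIA
/// hardware simply report no GPUs.
fn query_nvidia_gpus() -> Vec<GpuSample> {
    let output = Command::new("nvidia-smi")
        .args([
            "--query-gpu=index,utilization.gpu,memory.used,memory.total,temperature.gpu,power.draw",
            "--format=csv,noheader,nounits",
        ])
        .output();
    match output {
        Ok(out) if out.status.success() => String::from_utf8_lossy(&out.stdout)
            .lines()
            .filter_map(parse_nvidia_smi_line)
            .collect(),
        _ => Vec::new(),
    }
}
```

Keeping the parser separate from the process invocation means the parsing logic can be tested on machines that have no GPU or driver installed. Shelling out to the CLI trades some per-call overhead for not needing NVML headers or libraries at build time.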

Structs§

GpuHealthReport
Per-GPU health check result.
GpuUtilizationReport
Per-GPU utilization snapshot.

Enums§

GpuHealthStatus
GPU health status.

Functions§

check_gpu_health
Collect GPU health reports.
collect_gpu_metrics
Collect GPU utilization metrics for all detected GPUs.