raytop-0.1.1 is not a library.
raytop
A real-time TUI monitor for Ray clusters — like htop for distributed GPU training.
Monitors cluster-wide CPU/GPU/memory utilization, per-node resource usage, per-GPU utilization via Prometheus metrics, running jobs, and live actor counts — all from the Ray dashboard API.
Install
Install from crates.io:
Usage
Point raytop at your Ray dashboard endpoint. The dashboard is typically
available on port 8265 of the Ray head node.
Use j/k or arrow keys to navigate nodes, Enter to open the detail panel,
Tab to switch focus between jobs and nodes, t to cycle themes, and q to quit.
Build from Source
Examples
- verl Training — PPO/GRPO training on a Ray cluster using FSDP2 backend
How It Works
- Cluster status — REST API (
/api/cluster_status) for cluster-wide CPU/GPU/memory allocation - Node discovery — REST API (
/api/v0/nodes) for per-node info and state - Per-GPU metrics — Prometheus scraping (
/api/prometheus/sd) for real-time GPU utilization, GRAM usage - Jobs & actors — REST API (
/api/jobs/,/api/v0/actors) for running jobs and actor counts per node - Async — Background
tokiotask fetches all data in parallel, TUI never blocks
License
Apache-2.0