spark-dashboard 0.3.0


Spark Dashboard

Real-time hardware and LLM inference monitoring for Linux systems with NVIDIA GPUs. Developed and tested on the NVIDIA DGX Spark, but works on any Linux host with NVIDIA drivers — discrete-GPU workstations, DGX boxes, cloud VMs. A Rust backend collects GPU, CPU, memory, disk, and network metrics alongside vLLM engine statistics and streams them over WebSocket to a React frontend.



Quick Start

Install on your Linux host

Run as your normal user on any Linux host with NVIDIA drivers (requires Rust 1.75+):

cargo install spark-dashboard
sudo spark-dashboard service install
systemctl status spark-dashboard

The dashboard is now served on port 3000. See Install on your Linux host for the full guide, config overrides, and uninstall.

Develop locally

git clone https://github.com/niklasfrick/spark-dashboard.git
cd spark-dashboard
cp .env.example .env           # edit with your remote host's user/host
./dev/dev.sh

Open http://localhost:5173 in your browser. See dev/README.md for details on what each script does.

Features

Hardware Monitoring (1s polling via NVML, sysinfo, procfs)

  • GPU utilization, temperature, power draw, clock frequencies, fan speed
  • GPU event detection — thermal throttling, hardware slowdown, power brake
  • CPU aggregate and per-core utilization with heatmap
  • Memory breakdown — CPU RAM and GPU VRAM separately on discrete-GPU hosts, or a single unified pool on systems where CPU and GPU share memory (e.g. DGX Spark GB10, GH200)
  • Disk and network I/O throughput

LLM Engine Monitoring (vLLM via Prometheus metrics)

  • Tokens per second (generation + prompt)
  • Time to first token, end-to-end latency, queue time
  • Active/queued requests, batch size
  • KV cache utilization, prefix cache hit rate
  • Automatic engine discovery via process scan and Docker API
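These stats arrive in the Prometheus text exposition format, where each sample is a `name{labels} value` line. A minimal sketch of extracting one sample (the project's real parser in src/engines/prometheus.rs also keeps the labels; this only shows the core split):

```rust
// Parse one Prometheus text-format sample line into (metric_name, value).
// Handles both the `name{labels} value` and bare `name value` forms.
// The value is taken as the last field; a trailing timestamp (legal in
// Prometheus output generally, though vLLM does not emit one) would need
// extra handling. Sketch only.
fn parse_sample(line: &str) -> Option<(&str, f64)> {
    let line = line.trim();
    // Skip blanks and `# HELP` / `# TYPE` comment lines.
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // The metric name ends at '{' (labelled) or at the first space (unlabelled).
    let name_end = line.find('{').or_else(|| line.find(' '))?;
    let name = &line[..name_end];
    let value = line.rsplit(' ').next()?.parse().ok()?;
    Some((name, value))
}
```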

Dashboard

  • Arc gauges, time-series charts, sparklines, per-core heatmap
  • 15-minute rolling history with circular buffers
  • Connection status badge, staleness detection, auto-reconnect
  • Multi-engine tabs when multiple inference servers are running
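The rolling history behind those charts is the classic fixed-capacity ring buffer: pushing past capacity evicts the oldest sample, so memory stays bounded no matter how long the page is open. A sketch of the idea in Rust (the dashboard's actual buffer is TypeScript, in frontend/src/lib/):

```rust
// Fixed-capacity ring buffer. `capacity` must be > 0. At 1s polling,
// a 15-minute window is a buffer of 900 samples.
struct RingBuffer<T> {
    buf: Vec<Option<T>>,
    head: usize, // index of the next write
    len: usize,
}

impl<T: Clone> RingBuffer<T> {
    fn new(capacity: usize) -> Self {
        Self { buf: vec![None; capacity], head: 0, len: 0 }
    }

    fn push(&mut self, item: T) {
        self.buf[self.head] = Some(item);
        self.head = (self.head + 1) % self.buf.len();
        self.len = (self.len + 1).min(self.buf.len());
    }

    /// Samples from oldest to newest.
    fn iter(&self) -> impl Iterator<Item = &T> + '_ {
        let cap = self.buf.len();
        let start = (self.head + cap - self.len) % cap;
        (0..self.len).filter_map(move |i| self.buf[(start + i) % cap].as_ref())
    }
}
```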

Architecture

┌──────────────────────┐         WebSocket (JSON)         ┌────────────────────┐
│     Rust Backend     │ ──────────────────────────────▶  │  React Frontend    │
│                      │                                  │                    │
│  Tokio tasks:        │                                  │  useMetrics hook   │
│  ├─ metrics_collector│  broadcast channel (capacity 16) │  ├─ WebSocket conn │
│  │  (GPU/CPU/mem/…) ─┼──▶ tx ──▶ ws_handler ──▶ client │  ├─ batch flush 2s │
│  └─ engine_collector │                                  │  └─ circular bufs  │
│     (vLLM/Docker)    │                                  │                    │
│                      │  Static files (rust-embed)       │  Recharts, Tailwind│
│  Axum router         │ ◀──── production only ─────────  │  shadcn/ui         │
└──────────────────────┘                                  └────────────────────┘
  Linux host (e.g. DGX Spark)                                  Browser

Two independent Tokio tasks run in parallel — one for hardware metrics (NVML, sysinfo, procfs) and one for engine detection/polling. Both feed into a broadcast channel that fans out to all connected WebSocket clients. In production the frontend is embedded in the binary via rust-embed; in development, Vite serves the frontend locally and proxies API/WebSocket traffic to the remote backend.
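The fan-out itself can be sketched with nothing but the standard library: each subscriber gets its own channel, and publishing clones the snapshot to every registered sender. (The actual backend uses tokio::sync::broadcast with capacity 16, as in the diagram; this thread-based version only illustrates the shape of the pattern.)

```rust
use std::sync::mpsc;

// One sender per subscriber: `publish` clones the snapshot to every
// registered channel, mirroring how a broadcast channel fans a
// MetricsSnapshot out to all connected WebSocket clients. Stdlib sketch only.
struct Broadcast<T: Clone> {
    subscribers: Vec<mpsc::Sender<T>>,
}

impl<T: Clone> Broadcast<T> {
    fn new() -> Self {
        Self { subscribers: Vec::new() }
    }

    fn subscribe(&mut self) -> mpsc::Receiver<T> {
        let (tx, rx) = mpsc::channel();
        self.subscribers.push(tx);
        rx
    }

    fn publish(&mut self, snapshot: T) {
        // Drop subscribers whose receiver is gone (client disconnected).
        self.subscribers
            .retain(|tx| tx.send(snapshot.clone()).is_ok());
    }
}
```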

Configuration

All operator config lives in a repo-root .env file. Copy the template and edit:

cp .env.example .env

Variable           Purpose
DEPLOY_USER        SSH user on the remote host (required)
DEPLOY_HOST        Hostname or IP of the remote host (required)
DEPLOY_DIR         Project path on the remote host, relative to remote home (default: spark-dashboard)
VITE_BACKEND_URL   Where Vite proxies /ws and /api (default: http://localhost:3000)

Legacy SPARK_USER / SPARK_HOST / SPARK_DIR are still accepted as a fallback when DEPLOY_* are unset — dev.sh prints a one-line deprecation note. The scripts in dev/ source this file; Vite picks up VITE_* variables automatically. .env is gitignored — never commit it.

Install on your Linux host

The dashboard runs as a supervised systemd service. Two install paths; both build from source on the host.

Option A — via cargo (recommended)

# On the host. Requires Rust 1.75+, NVIDIA drivers, and internet access.
cargo install spark-dashboard
sudo spark-dashboard service install
systemctl status spark-dashboard

cargo install pulls the crate from crates.io and compiles it locally. service install copies the binary to /usr/local/bin, creates a locked-down spark-dashboard system user (added to video, render, docker groups for NVML and Docker access), writes the systemd unit, and enables it.

Option B — from a local checkout

Use this when you want to install without going through crates.io (to audit the source, install on an air-gapped host, or deploy an unreleased commit).

# On the host. Run as your normal user — the script escalates to sudo
# only for the systemd wiring step.
git clone https://github.com/niklasfrick/spark-dashboard.git
cd spark-dashboard
./packaging/install.sh

This builds the frontend (npm run build) and the Rust binary (cargo build --release), then hands off to the same service install logic as Option A. You'll be prompted for your sudo password once, when the service is installed.

Managing the service

sudo systemctl {start|stop|restart} spark-dashboard
journalctl -u spark-dashboard -f          # follow logs
sudo spark-dashboard service status       # same as `systemctl status`

Optional overrides live in /etc/spark-dashboard/config.env — set SPARK_DASHBOARD_PORT, SPARK_DASHBOARD_BIND, SPARK_DASHBOARD_POLL_INTERVAL, SPARK_DASHBOARD_GPU_INDEX, or RUST_LOG, then sudo systemctl restart spark-dashboard.
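For example, a config.env that moves the dashboard to port 8080 and binds to loopback only might look like this (values are illustrative):

```
# /etc/spark-dashboard/config.env — example values, adjust to taste
SPARK_DASHBOARD_PORT=8080
SPARK_DASHBOARD_BIND=127.0.0.1
SPARK_DASHBOARD_POLL_INTERVAL=2000
SPARK_DASHBOARD_GPU_INDEX=0
RUST_LOG=info
```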

Upgrade

# Option A
cargo install --force spark-dashboard && sudo spark-dashboard service install

# Option B
cd spark-dashboard && git pull && ./packaging/install.sh

Re-running service install is idempotent: it stops the service, swaps the binary, and starts it again, preserving /etc/spark-dashboard/config.env.

Uninstall

sudo spark-dashboard service uninstall         # keep /etc/spark-dashboard
sudo spark-dashboard service uninstall --purge # remove everything

CLI options

spark-dashboard [OPTIONS]                 run the server (default)
spark-dashboard service install [--prefix /usr/local]
spark-dashboard service uninstall [--purge]
spark-dashboard service status

  -p, --port <PORT>           Listen port [default: 3000] [env: SPARK_DASHBOARD_PORT]
  -b, --bind <BIND>           Bind address [default: 0.0.0.0] [env: SPARK_DASHBOARD_BIND]
      --poll-interval <MS>    Polling interval ms [default: 1000] [env: SPARK_DASHBOARD_POLL_INTERVAL]
      --gpu-index <IDX>       NVML GPU index to monitor [default: 0] [env: SPARK_DASHBOARD_GPU_INDEX]
      --engine <TYPE>         Manual engine type (e.g. vllm)
      --engine-url <URL>      Manual engine endpoint (requires --engine)

On multi-GPU hosts use --gpu-index to select which device the dashboard monitors. Engines are auto-detected via process scan and Docker API. Use --engine and --engine-url to override when auto-detection doesn't work.

Development

Prerequisites

  • Local machine (macOS or Linux): Node.js 20+, npm, rsync, ssh
  • Remote host: Linux + NVIDIA drivers, Rust 1.75+, SSH access with key-based auth (no password prompts)
  • Optional: brew install fswatch for instant file-change detection (the watcher falls back to 2s polling without it)

Running the dev environment

./dev/dev.sh

The script handles everything:

  1. Syncs the full project to the remote host via rsync
  2. Builds the Rust backend on the remote host (cargo build --release)
  3. Starts the backend on the remote host (port 3000)
  4. Starts the Vite dev server locally (port 5173)
  5. Watches src/ and Cargo.toml for Rust changes — auto-syncs and rebuilds on the remote host

What you edit                       What happens
Frontend files (frontend/src/)      Vite hot-reloads instantly in the browser
Backend files (src/, Cargo.toml)    Auto-detected → rsync to remote host → rebuild → restart (~compile time)

Useful while dev.sh is running:

# Watch backend logs in another terminal
ssh "${DEPLOY_USER}@${DEPLOY_HOST}" tail -f /tmp/spark-dashboard.log

# Press Ctrl+C in the dev.sh terminal to stop everything (cleans up the remote process too)

How the proxy works

By default, Vite proxies /ws and /api to localhost:3000 — this works out of the box with any SSH tunnel that maps the remote host's port 3000 to your local machine.

Browser → localhost:5173/ws  → Vite proxy → localhost:3000/ws (forwarded to remote)
Browser → localhost:5173/api → Vite proxy → localhost:3000/api (forwarded to remote)

To connect directly over the network instead, set in .env:

VITE_BACKEND_URL=http://${DEPLOY_HOST}:3000

The frontend connects to the WebSocket using window.location.host, so the proxy is transparent — no code changes between dev and production.

Releases

Releases are cut from main via release-please: conventional commits drive the version bump, and merging the release PR tags vX.Y.Z and triggers cargo publish to crates.io. main always reflects the latest stable version; see CHANGELOG.md for release notes.

Testing

# Frontend
cd frontend && npm test

# Backend (on Linux)
cargo test

Backend tests include platform-aware stubs — GPU and memory tests validate real NVML/procfs parsing on Linux, with compile-time stubs on other platforms.
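The stub pattern looks roughly like this (mem_total_kb is a hypothetical illustration, not the project's actual API):

```rust
// Platform-gated collection with a compile-time stub: on Linux the real
// /proc/meminfo parse runs; elsewhere a stub compiles in so the rest of
// the test suite still builds. Hypothetical sketch, not the project's code.
#[cfg(target_os = "linux")]
fn mem_total_kb() -> Option<u64> {
    std::fs::read_to_string("/proc/meminfo")
        .ok()?
        .lines()
        .find(|l| l.starts_with("MemTotal:"))?
        .split_whitespace()
        .nth(1)?
        .parse()
        .ok()
}

#[cfg(not(target_os = "linux"))]
fn mem_total_kb() -> Option<u64> {
    None // stub: no procfs off Linux
}
```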

Project Structure

├── src/
│   ├── main.rs                 CLI args, task spawning, server startup
│   ├── server.rs               Axum router, static file serving
│   ├── ws.rs                   WebSocket handler
│   ├── metrics/
│   │   ├── mod.rs              MetricsSnapshot, collector loop
│   │   ├── gpu.rs              NVML GPU metrics + event detection
│   │   ├── cpu.rs              CPU aggregate + per-core
│   │   ├── memory.rs           System RAM + GPU VRAM + unified-memory detection
│   │   ├── disk.rs             Disk I/O rates
│   │   └── network.rs          Network I/O rates
│   └── engines/
│       ├── mod.rs              Engine trait, state machine, collector
│       ├── detector.rs         Process scan + Docker discovery
│       ├── vllm.rs             vLLM adapter (Prometheus parsing)
│       └── prometheus.rs       Prometheus text-format parser
├── frontend/
│   └── src/
│       ├── hooks/              useMetrics, useMetricsHistory
│       ├── components/
│       │   ├── views/          Dashboard, GlanceableView, DetailedView
│       │   ├── engines/        EngineSection, EngineCard
│       │   ├── charts/         TimeSeriesChart, Sparkline, CoreHeatmap
│       │   └── gauges/         ArcGauge
│       ├── types/              TypeScript type definitions
│       └── lib/                Circular buffer, formatting, theme
├── dev/
│   ├── dev.sh                  Dev loop (local frontend + remote backend)
│   └── README.md               Operator docs
├── .env.example                Configuration template
├── LICENSE                     MIT
├── CONTRIBUTING.md
└── Cargo.toml

Contributing

See CONTRIBUTING.md.

License

MIT — see LICENSE.