# llmtop
Terminal GPU monitor that links wattage to the LLM burning it. Tells you VRAM share, joules per token, hosted-API equivalent cost, and session energy/CO2.
Ollama today. llama.cpp next.

`$/1K` = per-1K output tokens at the hosted-API price. Default: Claude Sonnet. Override with `--compare gpt-4o | gemini-2.5`.
## vs nvtop / nvitop / asitop
| GPU util / VRAM / power | yes | yes | yes | yes |
| Knows loaded LLM models | no | no | no | yes |
| Per-model VRAM share | no | no | no | yes |
| Joules per token | no | no | no | yes |
| API-equivalent $ cost | no | no | no | yes |
| Session kWh + CO2 | no | no | no | yes |
| Cross-platform | partial | partial| macOS only | Linux + Win + macOS\* |
\* Apple Silicon: v0.2.
## Install
```bash
cargo install llmtop
```
## Usage
```bash
llmtop # default: proxy on at :11435
llmtop --ollama-url http://gpu:11434 # remote ollama
llmtop --compare gpt-4o # change cost-equivalent provider
llmtop --grid-co2 230 # local grid intensity (gCO2/kWh)
llmtop --proxy 11500 # custom proxy port
llmtop --no-proxy # disable proxy
```
Hotkeys: `q:quit p:pause c:clear`.
### Live tokens/sec
Ollama does not expose live throughput on `/api/ps`. llmtop runs a reverse proxy by default on `:11435`, forwards every request to upstream Ollama unchanged, and reads `eval_count` / `eval_duration` from the responses. Point your client at the proxy:
```bash
OLLAMA_HOST=http://127.0.0.1:11435 ollama run qwen2.5-coder:7b "explain quicksort"
```
Direct traffic to `:11434` is invisible — `TOK/S` and `J/TOK` stay idle unless requests pass through the proxy.
## What's measured
| GPU util / VRAM / power / temp | NVML (Linux, Windows). IOReport (macOS) in v0.2. |
| Multi-GPU | Aggregated (sum power/VRAM, avg util) in v0.1. |
| Loaded models, per-model VRAM | Ollama `/api/ps` |
| Tokens/sec live | Reverse proxy parses `eval_count` / `eval_duration` from `/api/generate` and `/api/chat` |
| J/token | `power_w / tokens_per_sec` |
| Session kWh | Trapezoidal integration of GPU power over time |
| API-equivalent $ | Provider price tables in `src/pricing/mod.rs` |
| CO2eq | session kWh × `--grid-co2` (gCO2/kWh) |
## Roadmap
- v0.2: Apple Silicon (M1–M5) via IOReport, llama.cpp Prometheus, per-GPU breakdown (`--per-gpu`)
- v0.3: vLLM, LM Studio, MLX
- v0.4: Prometheus exporter, JSON metrics, write-to-file mode
- v0.5: AMD ROCm, Intel Arc
## Wrong API price?
Edit `src/pricing/mod.rs` and open a PR.
## License
MIT