# vor
Opinionated cross-platform performance instrumentation for Rust. It does both halves of the job: measuring your code and showing the numbers live, inside the running app, instead of writing a trace you open somewhere else.
Annotate a function and the same scope goes to a puffin flame chart, a `tracing` span, and (with the `cuda` feature) an NVTX range. With the `viz` feature, vor also draws an egui panel with that flame chart, frame-rate bars, and live system and GPU metrics.
## Highlights
- Macros for functions, methods, and whole `impl` blocks: `#[profile]`, `#[all_functions]`, `#[skip]`. `const fn`s are left alone.
- The same annotations work on native macOS, web/wasm, and NVIDIA (NVTX).
- An egui panel with frame bars, a puffin flame chart, and one line plot per metric, with pin, pause, range-select, and zoom.
- System metrics sampled for you every frame: frame time, resident memory, and per-frame I/O.
- Live GPU metrics in the panel: Apple Silicon via IOKit and IOReport (no `sudo`), NVIDIA via NVML.
- Sinks that write a Chrome trace on native or push to the browser DevTools timeline on web.
## Install
vor is feature-gated, so pull in only what your platform needs.
```toml
[dependencies]
vor = { git = "https://github.com/SConsul/vor", features = ["viz", "mac"] }
```
| Feature | What it adds |
| --- | --- |
| *(none)* | instrumentation macros plus puffin/tracing scopes (no cost until `enable()`) |
| `viz` | the egui profiler panel (`vor::viz`) |
| `mac` | macOS: `ChromeTraceSink`, resident-memory sampling, and the IOKit/IOReport GPU collector |
| `web` | wasm: `BrowserSink` (DevTools User Timing), JS-heap memory, browser-safe puffin |
| `cuda` | NVIDIA: live GPU rows via NVML, plus an NVTX range per scope for Nsight Systems |

These features are independent; combine them as needed, for example `["viz", "mac", "cuda"]`.
## Instrumenting code
```rust
// A single function or method.
#[vor::profile]
fn render(frame: u32) { /* ... */ }

// Every method in an impl. Scopes are named Renderer::sort,
// Renderer::shade, and so on, with no per-method attribute.
struct Renderer { /* ... */ }

#[vor::all_functions]
impl Renderer {
    fn sort(&self) { /* ... */ }
    fn shade(&self) { /* ... */ }

    // Keep a hot trivial helper out of the flame chart.
    #[vor::skip]
    fn dirty(&self) -> bool { /* ... */ }
}

// An ad-hoc block scope.
fn step() {
    vor::profile_scope!("expensive_part");
    /* ... */
}
```
Turn collection on once, and mark a boundary per rendered frame:
```rust
fn main() {
    vor::enable(); // switch puffin scope collection on
    loop {
        // ... your frame ...
        vor::frame_mark(); // group scopes into this frame
    }
}
```
Until `enable()` is called, the puffin half does nothing. The `tracing` half is always live and reports to whatever subscriber you install.
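
To make the "does nothing until enabled" behaviour concrete, here is a minimal std-only sketch of a scope timer gated on a global flag. It is an illustration of the general pattern, not vor's or puffin's implementation; `ENABLED`, `RECORDED`, `set_enabled`, `recorded_scopes`, and `ScopeTimer` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::time::Instant;

/// Hypothetical global switch, standing in for whatever `enable()` flips.
static ENABLED: AtomicBool = AtomicBool::new(false);
/// Hypothetical counter of scopes that were actually recorded.
static RECORDED: AtomicU64 = AtomicU64::new(0);

pub fn set_enabled(on: bool) {
    ENABLED.store(on, Ordering::Relaxed);
}

pub fn recorded_scopes() -> u64 {
    RECORDED.load(Ordering::Relaxed)
}

/// RAII guard: holds a start time only if collection was on at scope entry.
pub struct ScopeTimer {
    start: Option<Instant>,
}

impl ScopeTimer {
    pub fn new() -> Self {
        // Disabled path: one relaxed atomic load and no clock read.
        let start = ENABLED.load(Ordering::Relaxed).then(Instant::now);
        ScopeTimer { start }
    }
}

impl Drop for ScopeTimer {
    fn drop(&mut self) {
        if let Some(start) = self.start {
            // A real profiler would hand this duration to its sink.
            let _elapsed = start.elapsed();
            RECORDED.fetch_add(1, Ordering::Relaxed);
        }
    }
}
```

In this sketch a scope entered while the flag is off stays unrecorded even if the flag flips before the scope exits, which keeps the disabled path down to a single atomic load.
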
## The in-app panel (`viz`)
vor owns the system rows (`frame_ms`, `memory_mb`, `io_ms`, `io_MB`, and `gpu_*` where supported). You describe only your own per-frame workload.
```rust
use std::collections::VecDeque;
use vor::viz::{Metric, PanelConfig, PanelState, show};

#[derive(Clone, Copy)]
struct AppFrame { visible: u32 }

const fn visible_of(f: &AppFrame) -> f64 { f.visible as f64 }

const METRICS: &[Metric<AppFrame>] =
    &[Metric::new("visible", visible_of, "splats").as_integer()];

let mut state = PanelState::new(PanelConfig::FRAME_MS);
let cap = PanelConfig::FRAME_MS.history_capacity;
let mut history: VecDeque<AppFrame> = VecDeque::with_capacity(cap);

// Once per displayed frame, inside your egui update. Skip the tick
// and the push while paused so every graph freezes together instead
// of scrolling under the pinned cursor:
if !state.is_paused() {
    state.tick(); // sample system metrics, mark a puffin frame
    if history.len() >= cap { history.pop_front(); }
    history.push_back(AppFrame { visible: 1_500_000 });
}
show(ui, &mut state, &history, METRICS); // draw the panel
```
`PanelState::tick()` advances vor's own system ring. Push one workload record per `tick` so the two stay aligned, and gate both on `is_paused()` as above.
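
The alignment rule above can be sketched without vor at all. The following std-only example keeps two bounded rings in lockstep and freezes both while paused; `Rings`, `push_bounded`, and `step` are hypothetical names for illustration, and the real system ring lives inside `PanelState`.

```rust
use std::collections::VecDeque;

/// Push `item`, evicting the oldest entries once `cap` is reached.
pub fn push_bounded<T>(history: &mut VecDeque<T>, cap: usize, item: T) {
    if cap == 0 {
        return;
    }
    while history.len() >= cap {
        history.pop_front();
    }
    history.push_back(item);
}

/// Hypothetical stand-in for the system ring plus the caller's workload ring.
pub struct Rings {
    pub system: VecDeque<f64>,
    pub workload: VecDeque<u32>,
    pub cap: usize,
    pub paused: bool,
}

impl Rings {
    pub fn new(cap: usize) -> Self {
        Rings {
            system: VecDeque::with_capacity(cap),
            workload: VecDeque::with_capacity(cap),
            cap,
            paused: false,
        }
    }

    /// Advance both rings together, or neither while paused.
    pub fn step(&mut self, frame_ms: f64, visible: u32) {
        if self.paused {
            return;
        }
        push_bounded(&mut self.system, self.cap, frame_ms);
        push_bounded(&mut self.workload, self.cap, visible);
    }
}
```

Because both pushes sit behind the same pause check, index `i` in one ring always refers to the same frame as index `i` in the other.
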
### Panel interactions
The bars and every metric plot share one time axis: a pin, a zoom range, and pause apply to all of them at once.

| Action | Effect |
| --- | --- |
| click a frame bar | pin the cursor on that frame (all graphs) and pause |
| shift-drag the bars | zoom every graph to that frame range (pins the slowest frame) |
| pause/resume button | freeze / follow the live stream (`PanelState::toggle_pause`) |
| scroll over the flame chart | zoom the flame chart's within-frame time; drag pans, double-click resets |
| profiler chip | annotate `frame_ms` with vor's own per-frame cost |
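
As an illustration of "pins the slowest frame", here is one way such a lookup could work over a history of frame times. This is a sketch under assumptions, not the panel's code; `slowest_in_range` is a hypothetical helper.

```rust
use std::ops::Range;

/// Index of the largest `frame_ms` value inside `range`, clamped to the slice.
/// Returns `None` when the clamped range is empty.
pub fn slowest_in_range(frame_ms: &[f64], range: Range<usize>) -> Option<usize> {
    let end = range.end.min(frame_ms.len());
    let start = range.start.min(end);
    // `total_cmp` gives a total order on f64, so NaN cannot cause a panic.
    (start..end).max_by(|&a, &b| frame_ms[a].total_cmp(&frame_ms[b]))
}
```

Clamping the range first means a drag that runs past the end of the history still resolves to a valid frame.
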
## System and GPU metrics
vor samples these itself on each `tick()`:

| Metric | Source | Features |
| --- | --- | --- |
| `frame_ms` | wall time between ticks | all |
| `memory_mb` | RSS on `mac`, `performance.memory` on `web` (Chromium) | `mac`, `web` |
| `io_ms`, `io_MB` | your `record_io(ns, bytes)` calls, drained per frame | all |
| `gpu_util` | IOKit `IOAccelerator` on `mac`, NVML utilization on `cuda` | `mac`, `cuda` |
| `gpu_sm` | IOKit `IOAccelerator` renderer utilization | `mac` |
| `gpu_power` | IOReport `GPU Energy` on `mac`, NVML power draw on `cuda` | `mac`, `cuda` |
| `pcie` | NVML PCIe TX+RX | `cuda` |
| `gpu_mem` | IOKit `IOAccelerator` in-use memory on `mac`, NVML used on `cuda` | `mac`, `cuda` |
| `gpu_temp` | NVML core temperature | `cuda` |
| `gpu_clock` | NVML SM clock | `cuda` |

The panel starts a background thread that polls the GPU backend (`mac` or `cuda`, no `sudo`), and the rows show only the metrics that backend supplies. `gpu_sm` is macOS-only (NVML has no SM-occupancy counter), while `pcie`, `gpu_temp`, and `gpu_clock` are NVIDIA-only (the macOS backend doesn't read them). On a platform with no backend, the GPU rows are dropped rather than drawn as flat zeros; this includes the browser, which gives a web page no GPU-telemetry API.
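
One way to get "dropped rather than drawn as flat zeros" is to model each reading as an `Option` and build the row list only from the values a backend actually reported. The sketch below assumes that design; `GpuReading` and `visible_rows` are hypothetical names, and only a subset of the metrics is shown.

```rust
/// Hypothetical per-poll reading: `None` means the backend does not supply it.
#[derive(Default, Clone, Copy)]
pub struct GpuReading {
    pub gpu_util: Option<f64>,
    pub gpu_sm: Option<f64>,
    pub pcie: Option<f64>,
    pub gpu_temp: Option<f64>,
}

/// Keep only the rows that carry a value, preserving display order.
pub fn visible_rows(r: &GpuReading) -> Vec<(&'static str, f64)> {
    [
        ("gpu_util", r.gpu_util),
        ("gpu_sm", r.gpu_sm),
        ("pcie", r.pcie),
        ("gpu_temp", r.gpu_temp),
    ]
    .iter()
    .filter_map(|&(name, value)| value.map(|v| (name, v)))
    .collect()
}
```

With this shape, a platform with no backend produces a default reading and therefore an empty row list, so there is no need for a sentinel value such as `0.0`.
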
Feed I/O time from anywhere, including background threads:
```rust
vor::record_io(elapsed_ns, bytes); // lock-free accumulator
```
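
For readers curious what a lock-free accumulator of this kind can look like, here is a minimal sketch built on two atomic counters that any thread can add to and the frame loop drains. It is not vor's implementation; `IoAccumulator`, `record`, and `drain` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical accumulator: any thread adds, the frame loop drains.
pub struct IoAccumulator {
    ns: AtomicU64,
    bytes: AtomicU64,
}

impl IoAccumulator {
    pub const fn new() -> Self {
        IoAccumulator {
            ns: AtomicU64::new(0),
            bytes: AtomicU64::new(0),
        }
    }

    /// Add one I/O operation's cost without taking a lock.
    pub fn record(&self, elapsed_ns: u64, bytes: u64) {
        self.ns.fetch_add(elapsed_ns, Ordering::Relaxed);
        self.bytes.fetch_add(bytes, Ordering::Relaxed);
    }

    /// Take the totals accumulated so far and reset both counters to zero.
    pub fn drain(&self) -> (u64, u64) {
        (
            self.ns.swap(0, Ordering::Relaxed),
            self.bytes.swap(0, Ordering::Relaxed),
        )
    }
}
```

The two counters are drained independently, so a `record` that races with `drain` may have its time and bytes land in adjacent frames. No data is lost, which is usually acceptable for a per-frame plot.
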
## Sinks (offline traces)
Install a sink once at startup, then drop the returned guard to flush.
```rust
// macOS. Open the output in chrome://tracing or Perfetto.
use vor::{ChromeTraceSink, Sink};
// Keep the guard alive for the whole run; dropping it flushes the trace.
let guard = ChromeTraceSink { path: "trace.json".into() }.install();
```
```rust
// Web. Spans show up in the DevTools Performance tab.
use vor::{BrowserSink, Sink};
let guard = BrowserSink.install();
```
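
The flush-on-drop contract is ordinary Rust RAII. The sketch below shows the pattern in isolation, with an in-memory `Vec` standing in for the trace file; `FlushGuard` and its methods are hypothetical and are not vor's sink types.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical guard: buffers events and writes them out when dropped.
pub struct FlushGuard {
    buffer: Vec<String>,
    out: Rc<RefCell<Vec<String>>>,
}

impl FlushGuard {
    pub fn new(out: Rc<RefCell<Vec<String>>>) -> Self {
        FlushGuard { buffer: Vec::new(), out }
    }

    /// Buffer one event; nothing reaches `out` yet.
    pub fn record(&mut self, event: &str) {
        self.buffer.push(event.to_string());
    }
}

impl Drop for FlushGuard {
    fn drop(&mut self) {
        // Move every buffered event into the output in one step.
        self.out.borrow_mut().append(&mut self.buffer);
    }
}
```

One consequence of this pattern in Rust: binding a guard with `let _ = ...` drops it immediately, while `let _guard = ...` keeps it alive until the end of the enclosing scope.
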
## NVIDIA (`cuda`)
The `cuda` feature does two independent things on NVIDIA hardware:
- Fills the panel's GPU rows (`gpu_util`, `gpu_power`, `pcie`, `gpu_mem`, `gpu_temp`, `gpu_clock`) from [NVML](https://crates.io/crates/nvml-wrapper), much as `mac` fills its rows from IOKit and IOReport.
- Opens an [NVTX](https://github.com/NVIDIA/NVTX) range per scope, so your instrumented code lines up on an Nsight Systems timeline next to CUDA and GPU work. No code changes are needed: the same `#[profile]`, `#[all_functions]`, and `profile_scope!` carry over.
Neither needs a CUDA toolkit to build. `nvtx` vendors its headers and compiles them with `cc`; `nvml-wrapper` loads `libnvidia-ml` from the driver at runtime, so the GPU rows populate on any machine with an NVIDIA driver installed.
## Other utilities
- `FrameStats`: an HDR histogram of per-frame nanoseconds, with `p50_ns`, `p95_ns`, `p99_ns`, and `mean_ns`.
- `calibrate()` and `empty_span_ns()`: measure the per-span instrumentation overhead so you can subtract it.
- `current_memory_bytes()`: process memory on supported platforms.
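
To show what the percentile accessors report, here is a small sketch that computes the same kind of summary from a sorted slice using the nearest-rank method. It deliberately swaps the HDR histogram for a plain sorted array, so it illustrates the meaning of the numbers rather than how `FrameStats` stores them; `percentile_ns` and `mean_ns` here are hypothetical free functions.

```rust
/// Nearest-rank percentile over an ascending-sorted slice; `p` in 0.0..=100.0.
pub fn percentile_ns(sorted: &[u64], p: f64) -> Option<u64> {
    if sorted.is_empty() {
        return None;
    }
    // Rank is 1-based: the smallest rank covering at least p% of the samples.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    let idx = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[idx])
}

/// Arithmetic mean, summed in u128 so large nanosecond totals cannot overflow.
pub fn mean_ns(samples: &[u64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let sum: u128 = samples.iter().map(|&s| s as u128).sum();
    Some(sum as f64 / samples.len() as f64)
}
```

A sorted array costs memory proportional to the number of frames, whereas a histogram trades a bounded relative error for fixed-size storage, which suits a long-running profiler better.
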
## Examples
`examples/custom_metrics.rs` is headless and shows the API shape (`#[profile]`,
`#[all_functions]`, caller-defined metrics, the `PanelState` loop):
```sh
cargo run --features viz --example custom_metrics
```
`examples/live_panel.rs` opens a window and renders the live panel, so it doubles
as an end-to-end check of each platform backend. Pick the feature set for the
machine you are on:
```sh
# macOS (Apple Silicon): live gpu_util / gpu_sm / gpu_power via IOKit + IOReport
cargo run --example live_panel --features viz,mac
# NVIDIA box: live gpu_util / pcie / gpu_power via NVML, plus NVTX ranges
cargo run --example live_panel --features viz,cuda
# Web / browser: the standalone demo in examples/web/ renders the panel in a canvas
cd examples/web && trunk serve --open # needs: cargo install trunk; rustup target add wasm32-unknown-unknown
```
(`examples/web/` is a minimal `eframe` + trunk app; GPU rows are absent in the browser,
so it verifies the `web` build, the panel, and the DevTools timeline path.)
Run the GPU smoke tests directly (each asserts the backend returns sane readings;
run on the matching machine):
```sh
cargo test --features viz,mac poll_yields_sane_readings # macOS
cargo test --features viz,cuda poll_yields_sane_readings # NVIDIA host
```
## License
Dual-licensed under MIT or Apache-2.0.