vor
Opinionated cross-platform performance instrumentation for Rust. It does both halves of the job: measuring your code and showing the numbers live, inside the running app, instead of writing a trace you open somewhere else.
Annotate a function and the same scope goes to a puffin flame chart, a tracing span, and (with the cuda feature) an NVTX range. With the viz feature, vor also draws an egui panel with that flame chart, frame-rate bars, and live system and GPU metrics.
Highlights
- Macros for functions, methods, and whole impl blocks: #[profile], #[all_functions], #[skip]. const fns are left alone.
- The same annotations work on native macOS, web/wasm, and NVIDIA (NVTX).
- An egui panel with frame bars, a puffin flame chart, and one line plot per metric, with pin, pause, range-select, and zoom.
- System metrics sampled for you every frame: frame time, resident memory, and per-frame I/O.
- Live GPU metrics in the panel: Apple Silicon via IOKit and IOReport (no sudo), NVIDIA via NVML.
- Sinks that write a Chrome trace on native or push to the browser DevTools timeline on web.
Install
vor is feature-gated, so pull in only what your platform needs.
[dependencies]
vor = { git = "https://github.com/SConsul/vor", features = ["viz", "mac"] }
| feature | adds |
|---|---|
| (none) | instrumentation macros plus puffin/tracing scopes (no cost until enable()) |
| viz | the egui profiler panel (vor::viz) |
| mac | macOS: ChromeTraceSink, resident-memory sampling, and the IOKit/IOReport GPU collector |
| web | wasm: BrowserSink (DevTools User Timing), JS-heap memory, browser-safe puffin |
| cuda | NVIDIA: live GPU rows via NVML, plus an NVTX range per scope for Nsight Systems |
These features are independent; combine them as needed, for example ["viz", "mac", "cuda"].
Instrumenting code
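// A single function or method.
#[profile]
fn update() { ... }

// Every method in an impl. Scopes are named Renderer::sort,
// Renderer::shade, and so on, with no per-method attribute.
#[all_functions]
impl Renderer {
    fn sort(&mut self) { ... }
    fn shade(&mut self) { ... }
}

// An ad-hoc block scope.
{
    profile_scope!(...);
    ...
}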
Turn collection on once with enable(), and mark a boundary per rendered frame.
Until enable() is called, the puffin half does nothing. The tracing half is always live for whatever subscriber you install.
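To illustrate why a disabled scope can be nearly free, here is a minimal std-only sketch of a scope guard gated by a global flag. It is not vor's implementation: `ENABLED`, `RECORDED`, `enable_collection`, and `Scope` are hypothetical names used only for this example, and the real crate hands scopes to puffin and tracing rather than to a `Vec`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::time::Instant;

// Hypothetical stand-ins for illustration; not vor's internals.
static ENABLED: AtomicBool = AtomicBool::new(false);
static RECORDED: Mutex<Vec<(&'static str, u128)>> = Mutex::new(Vec::new());

/// Stand-in for an `enable()`-style switch.
fn enable_collection() {
    ENABLED.store(true, Ordering::Relaxed);
}

/// RAII guard: records elapsed nanoseconds on drop, but only if
/// collection was enabled when the scope was entered.
struct Scope {
    name: &'static str,
    start: Option<Instant>,
}

impl Scope {
    fn new(name: &'static str) -> Self {
        // While disabled, one relaxed atomic load is the only work done.
        let start = ENABLED.load(Ordering::Relaxed).then(Instant::now);
        Scope { name, start }
    }
}

impl Drop for Scope {
    fn drop(&mut self) {
        if let Some(start) = self.start {
            let ns = start.elapsed().as_nanos();
            RECORDED.lock().unwrap().push((self.name, ns));
        }
    }
}

/// An instrumented function: the guard lives until the end of the body.
fn sort_items(items: &mut Vec<u32>) {
    let _scope = Scope::new("sort_items");
    items.sort();
}
```

Calling `sort_items` before `enable_collection()` leaves `RECORDED` empty; calling it afterwards appends one entry per call.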
The in-app panel (viz)
vor owns the system rows (frame_ms, memory_mb, io_ms, io_MB, and gpu_* where supported). You describe only your own per-frame workload.
use std::collections::VecDeque;
use vor::viz::{...};

const ... = ...;
const METRICS: &[...] = &[...];

let mut state = PanelState::new(...);
let cap = FRAME_MS.history_capacity;
let mut history: VecDeque<...> = VecDeque::with_capacity(cap);

// Once per displayed frame, inside your egui update. Skip the tick
// and the push while paused so every graph freezes together instead
// of scrolling under the pinned cursor:
if !state.is_paused() {
    state.tick();
    history.push_back(...);
}
show(...); // draw the panel
PanelState::tick() advances vor's own system ring. Push one workload record per tick so the two stay aligned, and gate both on is_paused() as above.
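The alignment rule can be sketched without vor at all. In the std-only example below, `Panel`, `push_bounded`, and `on_frame` are hypothetical stand-ins (the real `PanelState` owns more than one ring and has a different constructor); the point is only that the system ring and the workload ring advance together and freeze together.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the panel: a pause flag plus a bounded
/// ring of system samples. Not vor's actual PanelState.
struct Panel {
    paused: bool,
    cap: usize,
    frame_ms: VecDeque<f32>,
}

impl Panel {
    fn new(cap: usize) -> Self {
        Panel { paused: false, cap, frame_ms: VecDeque::with_capacity(cap) }
    }

    fn is_paused(&self) -> bool {
        self.paused
    }

    /// Advance the system ring by one sample.
    fn tick(&mut self, frame_ms: f32) {
        push_bounded(&mut self.frame_ms, self.cap, frame_ms);
    }
}

/// Push onto a ring that holds at most `cap` items (assumes cap >= 1).
fn push_bounded<T>(ring: &mut VecDeque<T>, cap: usize, value: T) {
    if ring.len() >= cap {
        ring.pop_front();
    }
    ring.push_back(value);
}

/// Once per displayed frame: tick and push together, or not at all.
fn on_frame(panel: &mut Panel, workload: &mut VecDeque<u32>, frame_ms: f32, draw_calls: u32) {
    if !panel.is_paused() {
        panel.tick(frame_ms);
        push_bounded(workload, panel.cap, draw_calls);
    }
}
```

Because both pushes sit behind the same check, index `i` in the workload ring always refers to the same frame as index `i` in the system ring, which is what lets a pinned cursor line up across graphs.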
Panel interactions
The bars and every metric plot share one time axis: a pin, a zoom range, and pause apply to all of them at once.
| action | effect |
|---|---|
| click a frame bar | pin the cursor on that frame (all graphs) and pause |
| shift-drag the bars | zoom every graph to that frame range (pins the slowest frame) |
| pause/resume button | freeze / follow the live stream (PanelState::toggle_pause) |
| scroll over the flame chart | zoom the flame chart's within-frame time; drag pans, double-click resets |
| profiler chip | annotate frame_ms with vor's own per-frame cost |
System and GPU metrics
vor samples these itself on each tick():
| metric | source | platforms |
|---|---|---|
| frame_ms | wall time between ticks | all |
| memory_mb | RSS on mac, performance.memory on web (Chromium) | mac, web |
| io_ms, io_MB | your record_io(ns, bytes) calls, drained per frame | all |
| gpu_util | IOKit IOAccelerator on mac, NVML utilization on cuda | mac, cuda |
| gpu_sm | IOKit IOAccelerator renderer utilization | mac |
| gpu_power | IOReport GPU Energy on mac, NVML power draw on cuda | mac, cuda |
| pcie | NVML PCIe TX+RX | cuda |
| gpu_mem | IOKit IOAccelerator in-use memory on mac, NVML used on cuda | mac, cuda |
| gpu_temp | NVML core temperature | cuda |
| gpu_clock | NVML SM clock | cuda |
The panel starts a background thread that polls the GPU backend (mac or cuda, no sudo), and the rows show only the metrics that backend supplies. gpu_sm is macOS-only (NVML has no SM-occupancy counter), while pcie, gpu_temp, and gpu_clock are NVIDIA-only (the macOS backend doesn't read them). On a platform with no backend, including the browser (which gives a web page no GPU-telemetry API), the GPU rows are dropped rather than drawn as flat zeros.
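One way to get "dropped rather than flat zeros" is to model each metric as optional and filter before drawing. The sketch below is an assumption about the shape of that logic, not vor's code: `GpuSample` and `visible_rows` are hypothetical, and only three of the metrics are shown.

```rust
/// One poll of a GPU backend. `None` means the backend does not supply
/// that metric. Hypothetical type, for illustration only.
struct GpuSample {
    gpu_util: Option<f32>,
    gpu_sm: Option<f32>,
    pcie: Option<f32>,
}

/// Keep only the rows the backend actually reported, so an unsupported
/// metric never appears as a misleading zero line.
fn visible_rows(sample: &GpuSample) -> Vec<(&'static str, f32)> {
    vec![
        ("gpu_util", sample.gpu_util),
        ("gpu_sm", sample.gpu_sm),
        ("pcie", sample.pcie),
    ]
    .into_iter()
    .filter_map(|(name, value)| value.map(|v| (name, v)))
    .collect()
}
```

With a macOS-like sample (`pcie: None`) this yields two rows; with every field `None`, as on a platform with no backend, it yields none.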
Feed I/O time from anywhere, including background threads:
record_io(ns, bytes); // lock-free accumulator
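A lock-free accumulator that is drained once per frame can be built from two atomics. The sketch below shows that general technique under stated assumptions; `IoAccumulator`, `record`, `drain`, and the `IO` static are hypothetical names, and vor's own accumulator may differ.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical accumulator: any thread adds, the frame loop drains.
struct IoAccumulator {
    ns: AtomicU64,
    bytes: AtomicU64,
}

impl IoAccumulator {
    const fn new() -> Self {
        IoAccumulator { ns: AtomicU64::new(0), bytes: AtomicU64::new(0) }
    }

    /// Callable from any thread; never blocks.
    fn record(&self, ns: u64, bytes: u64) {
        self.ns.fetch_add(ns, Ordering::Relaxed);
        self.bytes.fetch_add(bytes, Ordering::Relaxed);
    }

    /// Take the totals and reset both counters to zero. The two swaps
    /// are separate operations, so a record racing with a drain may have
    /// its time and bytes land in adjacent frames; nothing is lost.
    fn drain(&self) -> (u64, u64) {
        (
            self.ns.swap(0, Ordering::Relaxed),
            self.bytes.swap(0, Ordering::Relaxed),
        )
    }
}

static IO: IoAccumulator = IoAccumulator::new();
```

The frame loop would call `IO.drain()` once per tick and convert the pair into per-frame values such as io_ms and io_MB.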
Sinks (offline traces)
Install a sink once at startup, then drop the returned guard to flush.
// macOS. Open the output in chrome://tracing or Perfetto.
use vor::...;
let guard = ChromeTraceSink ... .install();

// Web. Spans show up in the DevTools Performance tab.
use vor::...;
let guard = BrowserSink.install();
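The "drop the guard to flush" pattern can be sketched with a small RAII type. This is a std-only illustration, not vor's sink: `Event`, `FlushGuard`, and `to_chrome_json` are hypothetical, span names are assumed to need no JSON escaping, and the output uses the Chrome trace format's complete events (`"ph":"X"`, with `ts` and `dur` in microseconds).

```rust
use std::io::Write;

/// A finished span. Hypothetical type, for illustration only.
struct Event {
    name: &'static str,
    ts_us: u64,
    dur_us: u64,
}

/// Serialize events as a Chrome trace JSON array of complete ("X")
/// events. Assumes names contain no characters that need escaping.
fn to_chrome_json(events: &[Event]) -> String {
    let rows: Vec<String> = events
        .iter()
        .map(|e| {
            format!(
                r#"{{"name":"{}","ph":"X","ts":{},"dur":{},"pid":1,"tid":1}}"#,
                e.name, e.ts_us, e.dur_us
            )
        })
        .collect();
    format!("[{}]", rows.join(","))
}

/// Buffers events while alive and writes them out when dropped.
struct FlushGuard<W: Write> {
    events: Vec<Event>,
    out: W,
}

impl<W: Write> Drop for FlushGuard<W> {
    fn drop(&mut self) {
        // Errors are ignored here because drop cannot return them.
        let _ = self.out.write_all(to_chrome_json(&self.events).as_bytes());
        let _ = self.out.flush();
    }
}
```

Holding the guard in a variable for the lifetime of the program, rather than binding it to `_`, is what keeps the flush from happening immediately.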
NVIDIA (cuda)
The cuda feature does two independent things on NVIDIA hardware:
- Fills the panel's gpu_util, pcie, and gpu_power rows from NVML, the same way mac fills its GPU rows from IOKit and IOReport.
- Opens an NVTX range per scope, so your instrumented code lines up on an Nsight Systems timeline next to CUDA and GPU work. No code changes are needed: the same #[profile], #[all_functions], and profile_scope! carry over.
Neither needs a CUDA toolkit to build. nvtx vendors its headers and compiles them with cc; nvml-wrapper loads libnvidia-ml from the driver at runtime, so the GPU rows populate on any machine with an NVIDIA driver installed.
Other utilities
- FrameStats: an HDR histogram of per-frame nanoseconds, with p50_ns, p95_ns, p99_ns, and mean_ns.
- calibrate() and empty_span_ns(): measure the per-span instrumentation overhead so you can subtract it.
- current_memory_bytes(): process memory on supported platforms.
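To make the percentile and overhead ideas concrete, here is a std-only sketch. It deliberately swaps the HDR histogram for a plain sorted slice with nearest-rank percentiles, so it is simpler and less memory-efficient than what FrameStats is described as doing; `percentile_ns`, `mean_ns`, and `corrected_ns` are hypothetical free functions, not vor's API.

```rust
/// Nearest-rank percentile over samples sorted in ascending order.
/// `pct` is a whole percentage in 1..=100. Returns None when empty.
fn percentile_ns(sorted: &[u64], pct: usize) -> Option<u64> {
    if sorted.is_empty() {
        return None;
    }
    // rank = ceil(pct * n / 100), computed in integers.
    let rank = (pct * sorted.len() + 99) / 100;
    let index = rank.max(1).min(sorted.len()) - 1;
    Some(sorted[index])
}

/// Arithmetic mean of the samples, or None when empty.
fn mean_ns(samples: &[u64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let total: u128 = samples.iter().map(|&s| s as u128).sum();
    Some(total as f64 / samples.len() as f64)
}

/// Subtract estimated instrumentation overhead from a frame time:
/// `span_count` spans at `empty_span_ns` each, saturating at zero.
fn corrected_ns(frame_ns: u64, span_count: u64, empty_span_ns: u64) -> u64 {
    frame_ns.saturating_sub(span_count.saturating_mul(empty_span_ns))
}
```

For example, a 1,000,000 ns frame containing 200 spans with a calibrated 50 ns empty-span cost would be reported as 990,000 ns after correction.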
Examples
examples/custom_metrics.rs is headless and shows the API shape: #[profile], #[all_functions], caller-defined metrics, and the PanelState loop.
examples/live_panel.rs opens a window and renders the live panel, so it doubles
as an end-to-end check of each platform backend. Pick the feature set for the
machine you are on:
- macOS (Apple Silicon): live gpu_util / gpu_sm / gpu_power via IOKit + IOReport
- NVIDIA box: live gpu_util / pcie / gpu_power via NVML, plus NVTX ranges
- Web / browser: the standalone demo in web/ renders the panel in a canvas
(examples/web/ is a minimal eframe + trunk app; GPU rows are absent in the browser,
so it verifies the web build, the panel, and the DevTools timeline path.)
The GPU smoke tests can also be run directly. Each asserts that the backend returns sane readings, so run them on the matching machine.
License
Dual-licensed under MIT or Apache-2.0.