vor 0.1.0

Cross-platform performance instrumentation with an in-app egui panel and live system and GPU metrics.

vor

Opinionated cross-platform performance instrumentation for Rust. It does both halves of the job: measuring your code and showing the numbers live, inside the running app, instead of writing a trace you open somewhere else.

Annotate a function and the same scope goes to a puffin flame chart, a tracing span, and (with the cuda feature) an NVTX range. With the viz feature, vor also draws an egui panel with that flame chart, frame-rate bars, and live system and GPU metrics.

Highlights

  • Macros for functions, methods, and whole impl blocks: #[profile], #[all_functions], #[skip]. const fns are left alone.
  • The same annotations work on native macOS, web/wasm, and NVIDIA (NVTX).
  • An egui panel with frame bars, a puffin flame chart, and one line plot per metric, with pin, pause, range-select, and zoom.
  • System metrics sampled for you every frame: frame time, resident memory, and per-frame I/O.
  • Live GPU metrics in the panel: Apple Silicon via IOKit and IOReport (no sudo), NVIDIA via NVML.
  • Sinks that write a Chrome trace on native or push to the browser DevTools timeline on web.

Install

vor is feature-gated, so pull in only what your platform needs.

[dependencies]
vor = { git = "https://github.com/SConsul/vor", features = ["viz", "mac"] }

feature  adds
(none)   instrumentation macros plus puffin/tracing scopes (no cost until enable())
viz      the egui profiler panel (vor::viz)
mac      macOS: ChromeTraceSink, resident-memory sampling, and the IOKit/IOReport GPU collector
web      wasm: BrowserSink (DevTools User Timing), JS-heap memory, browser-safe puffin
cuda     NVIDIA: live GPU rows via NVML, plus an NVTX range per scope for Nsight Systems

These features are independent; combine them as needed, for example ["viz", "mac", "cuda"].

Instrumenting code

// A single function or method.
#[vor::profile]
fn render(frame: u32) { /* ... */ }

// Every method in an impl. Scopes are named Renderer::sort,
// Renderer::shade, and so on, with no per-method attribute.
struct Renderer { /* ... */ }

#[vor::all_functions]
impl Renderer {
    fn sort(&self)  { /* ... */ }
    fn shade(&self) { /* ... */ }

    // Keep a hot trivial helper out of the flame chart.
    #[vor::skip]
    fn dirty(&self) -> bool { /* ... */ }
}

// An ad-hoc block scope.
fn step() {
    vor::profile_scope!("expensive_part");
    /* ... */
}

Turn collection on once, and mark a boundary per rendered frame:

fn main() {
    vor::enable();          // switch puffin scope collection on
    loop {
        // ... your frame ...
        vor::frame_mark();  // group scopes into this frame
    }
}

Until enable() is called, the puffin half does nothing. The tracing half is always live for whatever subscriber you install.
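
The enable() gate described above can be pictured as a flag check in front of a timing guard. What follows is a minimal std-only sketch of that pattern, not vor's or puffin's implementation; ENABLED, RECORDS, ScopeGuard, enable_collection, and recorded_names are hypothetical names invented for the illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Hypothetical global state for the sketch; not vor's internals.
static ENABLED: AtomicBool = AtomicBool::new(false);
static RECORDS: Mutex<Vec<(&'static str, Duration)>> = Mutex::new(Vec::new());

/// Plays the role that enable() plays in the text: flips collection on.
pub fn enable_collection() {
    ENABLED.store(true, Ordering::Relaxed);
}

/// Timing guard. It captures a start time only if collection was on at entry.
pub struct ScopeGuard {
    name: &'static str,
    start: Option<Instant>,
}

impl ScopeGuard {
    pub fn new(name: &'static str) -> Self {
        // While disabled, the whole cost is this one relaxed atomic load.
        let start = ENABLED.load(Ordering::Relaxed).then(Instant::now);
        ScopeGuard { name, start }
    }
}

impl Drop for ScopeGuard {
    fn drop(&mut self) {
        // Record (name, elapsed) when the scope ends, if it was being timed.
        if let Some(start) = self.start {
            RECORDS.lock().unwrap().push((self.name, start.elapsed()));
        }
    }
}

/// Names of the scopes recorded so far, in completion order.
pub fn recorded_names() -> Vec<&'static str> {
    RECORDS.lock().unwrap().iter().map(|(name, _)| *name).collect()
}
```

In this sketch a guard created before enable_collection() records nothing when it drops, while one created afterwards appends a single entry. That is the behaviour the paragraph above attributes to the puffin half.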

The in-app panel (viz)

vor owns the system rows (frame_ms, memory_mb, io_ms, io_MB, and gpu_* where supported). You describe only your own per-frame workload.

use std::collections::VecDeque;
use vor::viz::{Metric, PanelConfig, PanelState, show};

#[derive(Clone, Copy)]
struct AppFrame { visible: u32 }

const fn visible_of(f: &AppFrame) -> f64 { f.visible as f64 }
const METRICS: &[Metric<AppFrame>] =
    &[Metric::new("visible", visible_of, "splats").as_integer()];

let mut state = PanelState::new(PanelConfig::FRAME_MS);
let cap = PanelConfig::FRAME_MS.history_capacity;
let mut history: VecDeque<AppFrame> = VecDeque::with_capacity(cap);

// Once per displayed frame, inside your egui update. Skip the tick
// and the push while paused so every graph freezes together instead
// of scrolling under the pinned cursor:
if !state.is_paused() {
    state.tick();                              // sample system metrics, mark a puffin frame
    if history.len() >= cap { history.pop_front(); }
    history.push_back(AppFrame { visible: 1_500_000 });
}
show(ui, &mut state, &history, METRICS);       // draw the panel

PanelState::tick() advances vor's own system ring. Push one workload record per tick so the two stay aligned, and gate both on is_paused() as above.
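
To see why the tick and the push belong behind the same pause check, here is a small std-only model of two rings advancing in lockstep. It is a sketch under stated assumptions: Recorder, push_bounded, and the field names are hypothetical stand-ins, and the u64 tick counter only imitates whatever vor stores in its system ring.

```rust
use std::collections::VecDeque;

/// Push one record, evicting the oldest entries once `cap` is reached.
pub fn push_bounded<T>(ring: &mut VecDeque<T>, cap: usize, record: T) {
    if cap == 0 {
        return;
    }
    while ring.len() >= cap {
        ring.pop_front();
    }
    ring.push_back(record);
}

/// Toy stand-in for the panel state plus the caller's history.
pub struct Recorder {
    pub paused: bool,
    cap: usize,
    ticks: u64,
    pub system: VecDeque<u64>,   // stands in for the library-owned system ring
    pub workload: VecDeque<u32>, // stands in for the caller-owned history
}

impl Recorder {
    pub fn new(cap: usize) -> Self {
        Recorder {
            paused: false,
            cap,
            ticks: 0,
            system: VecDeque::with_capacity(cap),
            workload: VecDeque::with_capacity(cap),
        }
    }

    /// One displayed frame: tick and push together, or do neither.
    pub fn frame(&mut self, visible: u32) {
        if self.paused {
            return;
        }
        self.ticks += 1;
        push_bounded(&mut self.system, self.cap, self.ticks);
        push_bounded(&mut self.workload, self.cap, visible);
    }
}
```

Because both rings receive exactly one entry per unpaused frame and evict at the same capacity, index i in one always describes the same frame as index i in the other.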

Panel interactions

The bars and every metric plot share one time axis: a pin, a zoom range, and pause apply to all of them at once.

action                        effect
click a frame bar             pin the cursor on that frame (all graphs) and pause
shift-drag the bars           zoom every graph to that frame range (pins the slowest frame)
pause/resume button           freeze / follow the live stream (PanelState::toggle_pause)
scroll over the flame chart   zoom the flame chart's within-frame time; drag pans, double-click resets
profiler chip                 annotate frame_ms with vor's own per-frame cost

System and GPU metrics

vor samples these itself on each tick():

metric         source                                                       platforms
frame_ms       wall time between ticks                                      all
memory_mb      RSS on mac, performance.memory on web (Chromium)             mac, web
io_ms, io_MB   your record_io(ns, bytes) calls, drained per frame           all
gpu_util       IOKit IOAccelerator on mac, NVML utilization on cuda         mac, cuda
gpu_sm         IOKit IOAccelerator renderer utilization                     mac
gpu_power      IOReport GPU Energy on mac, NVML power draw on cuda          mac, cuda
pcie           NVML PCIe TX+RX                                              cuda
gpu_mem        IOKit IOAccelerator in-use memory on mac, NVML used on cuda  mac, cuda
gpu_temp       NVML core temperature                                        cuda
gpu_clock      NVML SM clock                                                cuda

The panel starts a background thread that polls the GPU backend (mac or cuda, no sudo), and the rows show only the metrics that backend supplies. gpu_sm is macOS-only (NVML has no SM-occupancy counter), while pcie, gpu_temp, and gpu_clock are NVIDIA-only (the macOS backend doesn't read them). On a platform with no backend, the GPU rows are dropped rather than drawn as flat zeros. This includes the browser, which gives a web page no GPU-telemetry API.

Feed I/O time from anywhere, including background threads:

vor::record_io(elapsed_ns, bytes);   // lock-free accumulator
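
A lock-free accumulator of this kind is commonly built from two atomic counters that writers add to and a per-frame reader swaps back to zero. The sketch below shows that technique with std only; it is an assumption about the general shape, not vor's actual code, and IoAccumulator, record, and drain are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Two independent counters: total nanoseconds and total bytes.
pub struct IoAccumulator {
    ns: AtomicU64,
    bytes: AtomicU64,
}

impl IoAccumulator {
    pub const fn new() -> Self {
        IoAccumulator {
            ns: AtomicU64::new(0),
            bytes: AtomicU64::new(0),
        }
    }

    /// Callable from any thread; never blocks.
    pub fn record(&self, elapsed_ns: u64, bytes: u64) {
        self.ns.fetch_add(elapsed_ns, Ordering::Relaxed);
        self.bytes.fetch_add(bytes, Ordering::Relaxed);
    }

    /// Take the totals accumulated since the last drain and reset to zero.
    /// The two swaps are separate operations, so a record() racing with a
    /// drain may land its ns in one frame and its bytes in the next; no
    /// amount is ever lost or counted twice.
    pub fn drain(&self) -> (u64, u64) {
        (
            self.ns.swap(0, Ordering::Relaxed),
            self.bytes.swap(0, Ordering::Relaxed),
        )
    }
}
```

Relaxed ordering is enough here because the counters carry only their own totals and publish no other memory. A design that needed ns and bytes to stay paired exactly per frame would have to pack both into one atomic or take a lock.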

Sinks (offline traces)

Install a sink once at startup, then drop the returned guard to flush.

// macOS. Open the output in chrome://tracing or Perfetto.
use vor::{ChromeTraceSink, Sink};
let guard = ChromeTraceSink { path: "trace.json".into() }.install();

// Web. Spans show up in the DevTools Performance tab.
use vor::{BrowserSink, Sink};
let guard = BrowserSink.install();
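
Flush-on-drop means the guard's lifetime decides when output appears, so where it is bound matters. The following std-only sketch models that behaviour; FlushGuard, its install and event methods, and the in-memory output vector are hypothetical and say nothing about how vor's sinks are implemented.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical guard: buffers event names and flushes them when dropped.
pub struct FlushGuard {
    buffer: Vec<String>,
    out: Rc<RefCell<Vec<String>>>, // stands in for a file or timeline
}

impl FlushGuard {
    pub fn install(out: Rc<RefCell<Vec<String>>>) -> Self {
        FlushGuard { buffer: Vec::new(), out }
    }

    /// Buffer one event; nothing reaches `out` yet.
    pub fn event(&mut self, name: &str) {
        self.buffer.push(name.to_owned());
    }
}

impl Drop for FlushGuard {
    fn drop(&mut self) {
        // Everything buffered so far is written out exactly once, here.
        self.out.borrow_mut().append(&mut self.buffer);
    }
}
```

One Rust detail applies to any guard of this shape: `let _ = make_guard();` drops the value at the end of that statement, whereas `let _guard = make_guard();` keeps it alive until the enclosing scope ends.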

NVIDIA (cuda)

The cuda feature does two independent things on NVIDIA hardware:

  • Fills the panel's GPU rows (gpu_util, pcie, gpu_power, gpu_mem, gpu_temp, gpu_clock) from NVML, the same way mac fills its rows from IOKit and IOReport.
  • Opens an NVTX range per scope, so your instrumented code lines up on an Nsight Systems timeline next to CUDA and GPU work. No code changes are needed: the same #[profile], #[all_functions], and profile_scope! carry over.

Neither needs a CUDA toolkit to build. nvtx vendors its headers and compiles them with cc; nvml-wrapper loads libnvidia-ml from the driver at runtime, so the GPU rows populate on any machine with an NVIDIA driver installed.

Other utilities

  • FrameStats: an HDR histogram of per-frame nanoseconds, with p50_ns, p95_ns, p99_ns, and mean_ns.
  • calibrate() and empty_span_ns(): measure the per-span instrumentation overhead so you can subtract it.
  • current_memory_bytes(): process memory on supported platforms.
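
For readers unfamiliar with percentile summaries, the sketch below computes the same four statistics from a plain list of frame times. It deliberately swaps in a simpler technique, nearest-rank percentiles over a sorted copy, instead of the HDR histogram FrameStats is described as using; Summary, percentile_ns, and summarize are hypothetical names, and only the field names echo the list above.

```rust
/// Nearest-rank percentile over an ascending slice; `p` must be in 1..=100.
pub fn percentile_ns(sorted: &[u64], p: usize) -> Option<u64> {
    if sorted.is_empty() || p == 0 || p > 100 {
        return None;
    }
    // 1-based rank = ceil(p * n / 100), done in integers to avoid rounding.
    let rank = (p * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}

pub struct Summary {
    pub p50_ns: u64,
    pub p95_ns: u64,
    pub p99_ns: u64,
    pub mean_ns: f64,
}

/// Summarize per-frame durations given in nanoseconds, in any order.
pub fn summarize(samples: &[u64]) -> Option<Summary> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let mean_ns = sorted.iter().map(|&v| v as f64).sum::<f64>() / sorted.len() as f64;
    Some(Summary {
        p50_ns: percentile_ns(&sorted, 50)?,
        p95_ns: percentile_ns(&sorted, 95)?,
        p99_ns: percentile_ns(&sorted, 99)?,
        mean_ns,
    })
}
```

This version keeps every sample and sorts on each call, which is fine for a few thousand frames. A histogram trades a small, bounded precision loss for constant memory, which matters for long-running captures.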

Examples

examples/custom_metrics.rs is headless and shows the API shape (#[profile], #[all_functions], caller-defined metrics, the PanelState loop):

cargo run --features viz --example custom_metrics

examples/live_panel.rs opens a window and renders the live panel, so it doubles as an end-to-end check of each platform backend. Pick the feature set for the machine you are on:

# macOS (Apple Silicon): live gpu_util / gpu_sm / gpu_power via IOKit + IOReport
cargo run --example live_panel --features viz,mac

# NVIDIA box: live gpu_util / pcie / gpu_power via NVML, plus NVTX ranges
cargo run --example live_panel --features viz,cuda

# Web / browser: the standalone demo in examples/web/ renders the panel in a canvas
cd examples/web && trunk serve --open   # needs: cargo install trunk; rustup target add wasm32-unknown-unknown

(examples/web/ is a minimal eframe + trunk app; GPU rows are absent in the browser, so it verifies the web build, the panel, and the DevTools timeline path.)

Run the GPU smoke tests directly (each asserts the backend returns sane readings; run on the matching machine):

cargo test --features viz,mac  poll_yields_sane_readings   # macOS
cargo test --features viz,cuda poll_yields_sane_readings   # NVIDIA host

License

Dual-licensed under MIT or Apache-2.0.