vor 0.1.0

Cross-platform performance instrumentation with an in-app egui panel and live system and GPU metrics.
# vor

Opinionated cross-platform performance instrumentation for Rust. It does both halves of the job: measuring your code and showing the numbers live, inside the running app, instead of writing a trace you open somewhere else.

Annotate a function and the same scope goes to a puffin flame chart, a `tracing` span, and (with the `cuda` feature) an NVTX range. With the `viz` feature, vor also draws an egui panel with that flame chart, frame-rate bars, and live system and GPU metrics.

## Highlights

- Macros for functions, methods, and whole `impl` blocks: `#[profile]`, `#[all_functions]`, `#[skip]`. `const fn`s are left alone.
- The same annotations work on native macOS, web/wasm, and NVIDIA (NVTX).
- An egui panel with frame bars, a puffin flame chart, and one line plot per metric, with pin, pause, range-select, and zoom.
- System metrics sampled for you every frame: frame time, resident memory, and per-frame I/O.
- Live GPU metrics in the panel: Apple Silicon via IOKit and IOReport (no `sudo`), NVIDIA via NVML.
- Sinks that write a Chrome trace on native or push to the browser DevTools timeline on web.

## Install

vor is feature-gated, so pull in only what your platform needs.

```toml
[dependencies]
vor = { git = "https://github.com/SConsul/vor", features = ["viz", "mac"] }
```

| feature   | adds                                                                                     |
| --------- | ---------------------------------------------------------------------------------------- |
| *(none)*  | instrumentation macros plus puffin/tracing scopes (no cost until `enable()`)             |
| `viz`     | the egui profiler panel (`vor::viz`)                                                 |
| `mac`     | macOS: `ChromeTraceSink`, resident-memory sampling, and the IOKit/IOReport GPU collector |
| `web`     | wasm: `BrowserSink` (DevTools User Timing), JS-heap memory, browser-safe puffin          |
| `cuda`    | NVIDIA: live GPU rows via NVML, plus an NVTX range per scope for Nsight Systems           |

These features are independent; combine them as needed, for example `["viz", "mac", "cuda"]`.

## Instrumenting code

```rust
// A single function or method.
#[vor::profile]
fn render(frame: u32) { /* ... */ }

// Every method in an impl. Scopes are named Renderer::sort,
// Renderer::shade, and so on, with no per-method attribute.
struct Renderer { /* ... */ }

#[vor::all_functions]
impl Renderer {
    fn sort(&self)  { /* ... */ }
    fn shade(&self) { /* ... */ }

    // Keep a hot trivial helper out of the flame chart.
    #[vor::skip]
    fn dirty(&self) -> bool { /* ... */ }
}

// An ad-hoc block scope.
fn step() {
    vor::profile_scope!("expensive_part");
    /* ... */
}
```

Turn collection on once, and mark a boundary per rendered frame:

```rust
fn main() {
    vor::enable();          // switch puffin scope collection on
    loop {
        // ... your frame ...
        vor::frame_mark();  // group scopes into this frame
    }
}
```

Until `enable()` is called, the puffin half does nothing. The `tracing` half is always live for whatever subscriber you install.
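
One way to picture the "no cost until `enable()`" behaviour is a global flag that each scope checks when it opens. The sketch below is a standalone illustration using only the standard library. The names `ENABLED`, `set_enabled`, and `ScopeTimer` are hypothetical and are not vor's or puffin's internals, which may work differently.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical global switch, standing in for whatever `enable()` flips.
static ENABLED: AtomicBool = AtomicBool::new(false);

/// Hypothetical helper that turns collection on or off.
pub fn set_enabled(on: bool) {
    ENABLED.store(on, Ordering::Relaxed);
}

/// A scope guard that only reads the clock when collection is on.
pub struct ScopeTimer {
    start: Option<Instant>,
}

impl ScopeTimer {
    pub fn begin() -> Self {
        // While disabled, opening a scope costs one relaxed atomic load.
        let start = ENABLED.load(Ordering::Relaxed).then(Instant::now);
        ScopeTimer { start }
    }

    /// Elapsed time, or `None` if the scope was opened while disabled.
    pub fn elapsed(&self) -> Option<Duration> {
        self.start.map(|s| s.elapsed())
    }
}
```

In this sketch a scope opened before the flag is set stays empty even if the flag flips while it is running, which keeps partial measurements out of the first frame.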

## The in-app panel (`viz`)

vor owns the system rows (`frame_ms`, `memory_mb`, `io_ms`, `io_MB`, and `gpu_*` where supported). You describe only your own per-frame workload.

```rust
use std::collections::VecDeque;
use vor::viz::{Metric, PanelConfig, PanelState, show};

#[derive(Clone, Copy)]
struct AppFrame { visible: u32 }

const fn visible_of(f: &AppFrame) -> f64 { f.visible as f64 }
const METRICS: &[Metric<AppFrame>] =
    &[Metric::new("visible", visible_of, "splats").as_integer()];

let mut state = PanelState::new(PanelConfig::FRAME_MS);
let cap = PanelConfig::FRAME_MS.history_capacity;
let mut history: VecDeque<AppFrame> = VecDeque::with_capacity(cap);

// Once per displayed frame, inside your egui update. Skip the tick
// and the push while paused so every graph freezes together instead
// of scrolling under the pinned cursor:
if !state.is_paused() {
    state.tick();                              // sample system metrics, mark a puffin frame
    if history.len() >= cap { history.pop_front(); }
    history.push_back(AppFrame { visible: 1_500_000 });
}
show(ui, &mut state, &history, METRICS);       // draw the panel
```

`PanelState::tick()` advances vor's own system ring. Push one workload record per `tick` so the two stay aligned, and gate both on `is_paused()` as above.
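
The alignment rule can be sketched without vor at all. In the standalone example below, `Recorder` is a hypothetical stand-in for the pairing of `PanelState` and your own history: `ticks` plays the role of vor's system ring and `history` is the workload ring. It is an illustration of the gating pattern, not vor's implementation.

```rust
use std::collections::VecDeque;

/// Hypothetical pairing of a tick counter (system samples) with a
/// bounded history of workload records.
pub struct Recorder<T> {
    pub ticks: u64,
    pub history: VecDeque<T>,
    cap: usize,
}

impl<T> Recorder<T> {
    pub fn new(cap: usize) -> Self {
        Self { ticks: 0, history: VecDeque::with_capacity(cap), cap }
    }

    /// Advance one frame. While paused, neither the tick count nor the
    /// history moves, so the two series never drift apart.
    /// Returns `true` if a record was stored.
    pub fn step(&mut self, paused: bool, record: T) -> bool {
        if paused || self.cap == 0 {
            return false;
        }
        self.ticks += 1;
        if self.history.len() >= self.cap {
            self.history.pop_front(); // evict the oldest record
        }
        self.history.push_back(record);
        true
    }
}
```

Putting the tick and the push behind one check makes it hard to advance one ring without the other.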

### Panel interactions

The bars and every metric plot share one time axis: a pin, a zoom range, and pause apply to all of them at once.

| action                      | effect                                                          |
| --------------------------- | --------------------------------------------------------------- |
| click a frame bar           | pin the cursor on that frame (all graphs) and pause             |
| shift-drag the bars         | zoom every graph to that frame range (pins the slowest frame)   |
| pause/resume button         | freeze / follow the live stream (`PanelState::toggle_pause`)    |
| scroll over the flame chart | zoom the flame chart's within-frame time; drag pans, double-click resets |
| profiler chip               | annotate `frame_ms` with vor's own per-frame cost           |
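
The "pins the slowest frame" step of a range zoom amounts to an arg-max over the selected window. A minimal sketch, assuming frame times are held as a slice of milliseconds; `slowest_in_range` is a hypothetical helper, not part of vor's API:

```rust
use std::ops::Range;

/// Index (into `frame_ms`) of the slowest frame inside `range`.
/// Returns `None` if the range is empty or falls outside the slice.
pub fn slowest_in_range(frame_ms: &[f64], range: Range<usize>) -> Option<usize> {
    let window = frame_ms.get(range.clone())?;
    window
        .iter()
        .enumerate()
        // `total_cmp` gives a total order on floats, so NaN cannot panic here.
        .max_by(|a, b| a.1.total_cmp(b.1))
        // Convert the window-relative index back to a slice index.
        .map(|(i, _)| range.start + i)
}
```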

## System and GPU metrics

vor samples these itself on each `tick()`:

| metric           | source                                                       | platforms      |
| ---------------- | ------------------------------------------------------------ | -------------- |
| `frame_ms`       | wall time between ticks                                      | all            |
| `memory_mb`      | RSS on `mac`, `performance.memory` on `web` (Chromium)       | `mac`, `web`   |
| `io_ms`, `io_MB` | your `record_io(ns, bytes)` calls, drained per frame         | all            |
| `gpu_util`       | IOKit `IOAccelerator` on `mac`, NVML utilization on `cuda`   | `mac`, `cuda`  |
| `gpu_sm`         | IOKit `IOAccelerator` renderer utilization                   | `mac`          |
| `gpu_power`      | IOReport `GPU Energy` on `mac`, NVML power draw on `cuda`     | `mac`, `cuda`  |
| `pcie`           | NVML PCIe TX+RX                                              | `cuda`         |
| `gpu_mem`        | IOKit `IOAccelerator` in-use memory on `mac`, NVML used on `cuda` | `mac`, `cuda` |
| `gpu_temp`       | NVML core temperature                                        | `cuda`         |
| `gpu_clock`      | NVML SM clock                                                | `cuda`         |

The panel starts a background thread that polls the GPU backend (`mac` or `cuda`, no `sudo`), and the rows show only the metrics that backend supplies. `gpu_sm` is macOS-only (NVML has no SM-occupancy counter), while `pcie`, `gpu_temp`, and `gpu_clock` are NVIDIA-only (the macOS backend doesn't read them). On a platform with no backend, including the browser (where a web page has no GPU-telemetry API), the GPU rows are dropped rather than drawn as flat zeros.
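
Dropping rows instead of drawing zeros is easiest to model with `Option`: a metric the backend cannot supply is `None`, not `0.0`. The sketch below is illustrative only. `GpuSample` and `visible_rows` are hypothetical names, and only three of the rows are modelled.

```rust
/// Hypothetical per-poll reading. `None` means the backend has no such metric.
#[derive(Debug, Default, Clone, Copy)]
pub struct GpuSample {
    pub util_pct: Option<f64>,
    pub power_w: Option<f64>,
    pub temp_c: Option<f64>,
}

/// Rows to draw: only the metrics the backend actually supplied.
pub fn visible_rows(s: &GpuSample) -> Vec<(&'static str, f64)> {
    [
        ("gpu_util", s.util_pct),
        ("gpu_power", s.power_w),
        ("gpu_temp", s.temp_c),
    ]
    .iter()
    .filter_map(|&(name, value)| value.map(|v| (name, v)))
    .collect()
}
```

A sample with every field `None`, as on a platform without a backend, yields no rows at all.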

Feed I/O time from anywhere, including background threads:

```rust
vor::record_io(elapsed_ns, bytes);   // lock-free accumulator
```
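
A lock-free accumulator of this kind can be built from two atomic counters that are added to from any thread and swapped to zero once per frame. The following is a sketch of that general pattern, not vor's source. `IoAccumulator`, `record`, and `drain` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical accumulator: two counters any thread can add to.
pub struct IoAccumulator {
    nanos: AtomicU64,
    bytes: AtomicU64,
}

impl IoAccumulator {
    pub const fn new() -> Self {
        Self { nanos: AtomicU64::new(0), bytes: AtomicU64::new(0) }
    }

    /// Callable from any thread; never blocks.
    pub fn record(&self, elapsed_ns: u64, bytes: u64) {
        self.nanos.fetch_add(elapsed_ns, Ordering::Relaxed);
        self.bytes.fetch_add(bytes, Ordering::Relaxed);
    }

    /// Once per frame: take both totals and reset them to zero.
    pub fn drain(&self) -> (u64, u64) {
        (
            self.nanos.swap(0, Ordering::Relaxed),
            self.bytes.swap(0, Ordering::Relaxed),
        )
    }
}
```

The two counters are drained independently, so a `record` that lands between the two swaps has its time counted in one frame and its bytes in the next. For a per-frame plot that is usually acceptable, and no update is lost.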

## Sinks (offline traces)

Install a sink once at startup, then drop the returned guard to flush.

```rust
// macOS. Open the output in chrome://tracing or Perfetto.
use vor::{ChromeTraceSink, Sink};
let guard = ChromeTraceSink { path: "trace.json".into() }.install();
```

```rust
// Web. Spans show up in the DevTools Performance tab.
use vor::{BrowserSink, Sink};
let guard = BrowserSink.install();
```
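
"Drop the guard to flush" is the usual Rust RAII pattern: the guard buffers work and its `Drop` impl writes it out. The sketch below shows the pattern in isolation. `FlushGuard` is hypothetical, and a shared `Vec<String>` stands in for the trace file or the DevTools timeline.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical sink guard: buffers events and writes them out when dropped.
pub struct FlushGuard {
    pending: Vec<String>,
    out: Rc<RefCell<Vec<String>>>, // stand-in for a file or timeline
}

impl FlushGuard {
    pub fn install(out: Rc<RefCell<Vec<String>>>) -> Self {
        Self { pending: Vec::new(), out }
    }

    /// Buffer one event. Nothing reaches `out` yet.
    pub fn record(&mut self, name: &str) {
        self.pending.push(name.to_string());
    }
}

impl Drop for FlushGuard {
    fn drop(&mut self) {
        // Flush everything buffered so far, in order.
        self.out.borrow_mut().append(&mut self.pending);
    }
}
```

With any guard of this shape, bind it to a named variable such as `_guard` for the lifetime of the program. `let _ = ...` drops the value immediately, which would flush an empty trace at startup.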

## NVIDIA (`cuda`)

The `cuda` feature does two independent things on NVIDIA hardware:

- Fills the panel's GPU rows (`gpu_util`, `gpu_power`, `pcie`, `gpu_mem`, `gpu_temp`, and `gpu_clock`) from [NVML](https://crates.io/crates/nvml-wrapper), much as `mac` fills its rows from IOKit and IOReport.
- Opens an [NVTX](https://github.com/NVIDIA/NVTX) range per scope, so your instrumented code lines up on an Nsight Systems timeline next to CUDA and GPU work. No code changes are needed: the same `#[profile]`, `#[all_functions]`, and `profile_scope!` carry over.

Neither needs a CUDA toolkit to build. `nvtx` vendors its headers and compiles them with `cc`; `nvml-wrapper` loads `libnvidia-ml` from the driver at runtime, so the GPU rows populate on any machine with an NVIDIA driver installed.

## Other utilities

- `FrameStats`: an HDR histogram of per-frame nanoseconds, with `p50_ns`, `p95_ns`, `p99_ns`, and `mean_ns`.
- `calibrate()` and `empty_span_ns()`: measure the per-span instrumentation overhead so you can subtract it.
- `current_memory_bytes()`: process memory on supported platforms.
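
To make the percentile fields concrete, here is a standalone sketch that computes the same kind of summary exactly, by sorting raw samples and using the nearest-rank rule. It is not how `FrameStats` works internally (that is an HDR histogram, which trades a little precision for bounded memory). `percentile_ns` and `mean_ns` here are hypothetical free functions.

```rust
/// Nearest-rank percentile over raw nanosecond samples.
/// `percent` is an integer in 0..=100. Returns `None` for an empty slice.
pub fn percentile_ns(samples: &[u64], percent: usize) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest rank: ceil(percent / 100 * n), done in integer arithmetic.
    let rank = (percent.min(100) * sorted.len() + 99) / 100;
    let idx = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[idx])
}

/// Arithmetic mean of the samples, or `None` for an empty slice.
pub fn mean_ns(samples: &[u64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let sum: f64 = samples.iter().map(|&s| s as f64).sum();
    Some(sum / samples.len() as f64)
}
```

For 100 frames of 1 ms to 100 ms, the p50 is the 50 ms frame and the p99 is the 99 ms frame, so a single slow frame moves the p99 but leaves the p50 untouched.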

## Examples

`examples/custom_metrics.rs` is headless and shows the API shape (`#[profile]`,
`#[all_functions]`, caller-defined metrics, the `PanelState` loop):

```sh
cargo run --features viz --example custom_metrics
```

`examples/live_panel.rs` opens a window and renders the live panel, so it doubles
as an end-to-end check of each platform backend. Pick the feature set for the
machine you are on:

```sh
# macOS (Apple Silicon): live gpu_util / gpu_sm / gpu_power via IOKit + IOReport
cargo run --example live_panel --features viz,mac

# NVIDIA box: live gpu_util / pcie / gpu_power via NVML, plus NVTX ranges
cargo run --example live_panel --features viz,cuda

# Web / browser: the standalone demo in examples/web/ renders the panel in a canvas
cd examples/web && trunk serve --open   # needs: cargo install trunk; rustup target add wasm32-unknown-unknown
```

(`examples/web/` is a minimal `eframe` + trunk app; GPU rows are absent in the browser,
so it verifies the `web` build, the panel, and the DevTools timeline path.)

Run the GPU smoke tests directly (each asserts the backend returns sane readings;
run on the matching machine):

```sh
cargo test --features viz,mac  poll_yields_sane_readings   # macOS
cargo test --features viz,cuda poll_yields_sane_readings   # NVIDIA host
```

## License

Dual-licensed under MIT or Apache-2.0.