Crate veloq_nsys_query

veloq-nsys-query — per-subcommand query implementations.

Each subcommand owns one module here. Phase 0 ships summary; stats, search, inspect, timeline, gaps, correlate follow.

Re-exports

pub use error::NsysQueryError;
pub use error::NsysQueryResult;
pub use error::SqlPhase;
pub use event_ref::EventRef;
pub use event_ref::NvtxContext;
pub use kind_filter::KindFilter;
pub use row_id::EventKind;
pub use row_id::RowId;

Modules

column_map
Schema-probe helpers shared by inspect and search.
concurrency
veloq concurrency <trace> — GPU kernel/transfer overlap extraction.
correlate
veloq correlate <row_id>... — single-event causal-chain reverse lookup.
docgen
Auto-generated reference-doc bodies.
error
event_ref
EventRef — the shared “row that references one trace event” shape every list-of-events response returns.
gaps
veloq gaps --min Nms — GPU idle-bubble detection.
graph_replays
veloq graph-replays <trace> — CUDA graph replay decomposition.
hardware
veloq hardware <trace> — CPU / GPU / NIC inventory.
inspect
veloq inspect <trace> <row_id> [<row_id> …] — full event details.
kind_filter
Type-safe “which event kinds is this request about” selector.
kind_policy
Shared request-validation policies for the kind-aware verbs (stats, search). Two silent-drop traps these policies prevent:
kind_sql
Per-event-kind SQL + label fragments shared across query commands.
metrics
veloq metrics — hardware-performance counter / CPU sample / scheduler-event queries.
ncu_command
Generate an Nsight Compute command for one CUDA kernel event.
nvtx_attribution
Shared NVTX→GPU attribution CTE.
nvtx_parent
Rank-and-pick-innermost NVTX parent attribution — SQL plumbing.
nvtx_projection
Shared NVTX→GPU-event projection CTE templates.
nvtx_reverse
Reverse NVTX attribution — “which NVTX range was this event launched inside?”.
row_id
Wire-format event identifiers.
search
veloq search <trace> ... — filter events into a list of row_ids plus a few headline columns. Designed as the inspect entry-point.
slices
veloq slices --pattern <glob> — NVTX-range attribution views.
stats
veloq stats <trace> — aggregated GPU work statistics.
stats_by_size
veloq stats --by size — bytes-as-aggregate-unit stats.
summary
veloq summary <trace> — one-shot overview of a trace.
timeline
veloq timeline <trace> --interval Nms — time-bucketed GPU activity.
viz_timeline
NSys static timeline SVG figure export.

Functions

check_limit
Reject limit == 0 at the public-API boundary. The CLI also guards via CommonFilters::limit_or, but library callers can hand-build a request with limit: 0, which silently zeroes total_matched (the count comes off SQL rows that LIMIT 0 suppressed). Call this at the top of every run().
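As a rough illustration of the guard described for check_limit, the sketch below rejects a zero limit before any query work starts. The LimitError type, the Request struct, and the u64 parameter are all hypothetical stand-ins; the crate's real signature and its NsysQueryError variants are not shown in this index.

```rust
/// Hypothetical error type. The crate itself reports errors through
/// `NsysQueryError`, whose variants are not listed on this page.
#[derive(Debug, PartialEq)]
pub enum LimitError {
    ZeroLimit,
}

/// Hypothetical request shape carrying only the field the guard needs.
pub struct Request {
    pub limit: u64,
}

/// Reject `limit == 0` before any SQL runs, so a hand-built request
/// cannot silently produce a zero match count.
pub fn check_limit(limit: u64) -> Result<(), LimitError> {
    if limit == 0 {
        Err(LimitError::ZeroLimit)
    } else {
        Ok(())
    }
}

/// Stand-in for a verb's `run()`: validate first, then do the work.
pub fn run(req: &Request) -> Result<u64, LimitError> {
    check_limit(req.limit)?;
    // A real verb would open the trace and execute its query here.
    Ok(req.limit)
}
```

Putting the check on the first line of run() means library callers get a typed error instead of an empty result that looks like "no matches".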
decode_global_tid
Decode an nsys globalTid into (pid, tid). NSys packs four fields into the 64-bit slot:
module_basename
NSys records modules as absolute paths, either Unix-style (/usr/lib/x86_64-linux-gnu/libc.so.6) or Windows-style (C:\Windows\system32\foo.dll). For hotspot tables and callchains, agents (and humans) want the basename: libc.so.6 or foo.dll. Centralised here so that the metrics --type cpu-sampling path and inspect cpu_sample:N agree on what “module name” means, without two copies of the slice-on-/ logic drifting apart.
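A minimal sketch of the basename extraction described for module_basename, assuming a &str in and a borrowed &str out (the real signature is not shown in this index). It treats both / and \ as separators so Unix and Windows paths get the same treatment.

```rust
/// Return the final component of a module path, splitting on both
/// `/` and `\`. A path with no separator is returned unchanged; a path
/// ending in a separator yields an empty string.
pub fn module_basename(path: &str) -> &str {
    // `rsplit` always yields at least one item, so the fallback is
    // only there to avoid an unwrap.
    path.rsplit(|c| c == '/' || c == '\\')
        .next()
        .unwrap_or(path)
}
```

With this sketch, module_basename("/usr/lib/x86_64-linux-gnu/libc.so.6") gives "libc.so.6" and module_basename("C:\\Windows\\system32\\foo.dll") gives "foo.dll".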
open_scoped
Shared verb preamble: validate the limit, open the trace, and resolve the --from/--to window to an absolute (start_ns, end_ns). Used by the verbs whose run() opens with exactly this sequence (stats / search / stats_by_size). Verbs that interleave other validation between these steps (the gaps --min check, the timeline --interval check, the deferred window resolution in slices) keep their own preamble so error precedence is unchanged.
parse_positive_duration
Parse a CLI duration flag (100us / 1.2s / 42ns / …) into ns, rejecting non-positive results. Wraps veloq_core::time::parse_duration_ns with a flag-name aware typed error and a “must be positive” guard. Used by every command that accepts a bucket/interval-like duration flag.
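The sketch below illustrates the parse-then-guard shape described for parse_positive_duration. Everything in it is an assumption made for illustration: the DurationError type, the u64 return type, the set of accepted units (ns, us, ms, s), and the local parse_duration_ns helper, which merely stands in for veloq_core::time::parse_duration_ns and does not reproduce its real behaviour.

```rust
/// Hypothetical typed error that remembers which flag was at fault.
#[derive(Debug, PartialEq)]
pub enum DurationError {
    /// Bad number or unknown unit.
    Invalid { flag: String, value: String },
    /// Parsed, but zero or negative once rounded to whole nanoseconds.
    NotPositive { flag: String, value: String },
}

/// Stand-in parser: `<number><unit>` to nanoseconds as `f64`.
fn parse_duration_ns(s: &str) -> Option<f64> {
    let s = s.trim();
    // Two-letter units must be tried before the bare "s" suffix.
    let (num, scale) = if let Some(n) = s.strip_suffix("ns") {
        (n, 1.0)
    } else if let Some(n) = s.strip_suffix("us") {
        (n, 1e3)
    } else if let Some(n) = s.strip_suffix("ms") {
        (n, 1e6)
    } else if let Some(n) = s.strip_suffix('s') {
        (n, 1e9)
    } else {
        return None;
    };
    let value: f64 = num.trim().parse().ok()?;
    // Reject "inf" and "nan", which `f64::from_str` accepts.
    if value.is_finite() {
        Some(value * scale)
    } else {
        None
    }
}

/// Parse a duration flag and insist on a strictly positive result.
pub fn parse_positive_duration(flag: &str, value: &str) -> Result<u64, DurationError> {
    let ns = parse_duration_ns(value).ok_or_else(|| DurationError::Invalid {
        flag: flag.to_string(),
        value: value.to_string(),
    })?;
    let ns = ns.round();
    if ns < 1.0 {
        return Err(DurationError::NotPositive {
            flag: flag.to_string(),
            value: value.to_string(),
        });
    }
    Ok(ns as u64)
}
```

Carrying the flag name in the error lets a caller report which of several duration flags (for example --interval or --min) was rejected, without each command formatting its own message.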