veloq-nsys-query — per-subcommand query implementations.
Each subcommand owns one module here. Phase 0 ships summary;
stats, search, inspect, timeline, gaps, correlate follow.
Re-exports
pub use error::NsysQueryError;
pub use error::NsysQueryResult;
pub use error::SqlPhase;
pub use event_ref::EventRef;
pub use event_ref::NvtxContext;
pub use kind_filter::KindFilter;
pub use row_id::EventKind;
pub use row_id::RowId;
Modules
- column_map
  Schema-probe helpers shared by inspect and search.
- concurrency
  veloq concurrency <trace>: GPU kernel/transfer overlap extraction.
- correlate
  veloq correlate <row_id>...: single-event causal-chain reverse lookup.
- docgen
  Auto-generated reference-doc bodies.
- error
- event_ref
  EventRef: the shared “row that references one trace event” shape every list-of-events response returns.
- gaps
  veloq gaps --min Nms: GPU idle-bubble detection.
- graph_replays
  veloq graph-replays <trace>: CUDA graph replay decomposition.
- hardware
  veloq hardware <trace>: CPU / GPU / NIC inventory.
- inspect
  veloq inspect <trace> <row_id> [<row_id> …]: full event details.
- kind_filter
  Type-safe “which event kinds is this request about” selector.
- kind_policy
  Shared request-validation policies for the kind-aware verbs (stats, search). Two silent-drop traps these policies prevent:
- kind_sql
  Per-event-kind SQL + label fragments shared across query commands.
- metrics
  veloq metrics: hardware-performance counter / CPU sample / scheduler-event queries.
- ncu_command
  Generate an Nsight Compute command for one CUDA kernel event.
- nvtx_attribution
  Shared NVTX→GPU attribution CTE.
- nvtx_parent
  Rank-and-pick-innermost NVTX parent attribution: SQL plumbing.
- nvtx_projection
  Shared NVTX→GPU-event projection CTE templates.
- nvtx_reverse
  Reverse NVTX attribution: “which NVTX range was this event launched inside?”.
- row_id
  Wire-format event identifiers.
- search
  veloq search <trace> ...: filter events into a list of row_ids plus a few headline columns. Designed as the inspect entry-point.
- slices
  veloq slices --pattern <glob>: NVTX-range attribution views.
- stats
  veloq stats <trace>: aggregated GPU work statistics.
- stats_by_size
  veloq stats --by size: bytes-as-aggregate-unit stats.
- summary
  veloq summary <trace>: one-shot overview of a trace.
- timeline
  veloq timeline <trace> --interval Nms: time-bucketed GPU activity.
- viz_timeline
  NSys static timeline SVG figure export.
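To make the idea behind the gaps module concrete, here is a minimal sketch of idle-bubble detection over in-memory intervals. It is not the crate's implementation, which this page does not show and which may run in SQL. The `Interval` type, the `find_gaps` name, and the half-open interval convention are assumptions made for illustration.

```rust
/// Hypothetical half-open busy interval [start_ns, end_ns) for one GPU.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Interval {
    pub start_ns: u64,
    pub end_ns: u64,
}

/// Return every idle stretch of at least `min_gap_ns` between busy intervals.
/// Overlapping or nested intervals are handled by tracking the furthest end
/// seen so far rather than only the previous interval's end.
pub fn find_gaps(mut busy: Vec<Interval>, min_gap_ns: u64) -> Vec<Interval> {
    busy.sort_by_key(|iv| iv.start_ns);
    let mut gaps = Vec::new();
    let mut covered_until: Option<u64> = None;
    for iv in busy {
        match covered_until {
            Some(end) => {
                // Idle time exists only if this interval starts after
                // everything seen so far has finished.
                if iv.start_ns > end && iv.start_ns - end >= min_gap_ns {
                    gaps.push(Interval { start_ns: end, end_ns: iv.start_ns });
                }
                covered_until = Some(end.max(iv.end_ns));
            }
            None => covered_until = Some(iv.end_ns),
        }
    }
    gaps
}
```

Tracking the running maximum end matters when a long kernel fully covers several short ones; comparing against only the previous interval would report gaps that are not really idle.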
Functions
- check_limit
  Reject limit == 0 at the public-API boundary. The CLI also guards via CommonFilters::limit_or, but library callers can hand-build a request with limit: 0, which silently zeroes total_matched (the count comes off SQL rows that LIMIT 0 suppressed). Call this at the top of every run().
- decode_global_tid
  Decode an nsys globalTid into (pid, tid). NSys packs four fields into the 64-bit slot:
- module_basename
  NSys records modules as absolute paths, either POSIX-style (/usr/lib/x86_64-linux-gnu/libc.so.6) or Windows-style (C:\Windows\system32\foo.dll). For hotspot tables and callchains, agents (and humans) want the basename: libc.so.6 or foo.dll. Centralised here so the metrics --type cpu-sampling path and inspect cpu_sample:N agree on what “module name” means without two copies of the slice-on-/ logic drifting.
- open_scoped
  Shared verb preamble: validate the limit, open the trace, and resolve the --from / --to window to an absolute (start_ns, end_ns). Used by the verbs whose run() opens with exactly this sequence (stats, search, stats_by_size). Verbs that interleave other validation between these steps (gaps' --min check, timeline's --interval check, slices' deferred window resolution) keep their own preamble so error precedence is unchanged.
- parse_positive_duration
  Parse a CLI duration flag (100us / 1.2s / 42ns / …) into ns, rejecting non-positive results. Wraps veloq_core::time::parse_duration_ns with a flag-name aware typed error and a “must be positive” guard. Used by every command that accepts a bucket/interval-like duration flag.
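The helpers above are small enough to sketch. The block below is an illustration only: the real signatures and error variants are not shown on this page, so `SketchError`, every parameter list, and the set of accepted unit suffixes are assumptions. The pid/tid split follows the recipe in the Nsight Systems SQLite export documentation (PID = globalTid / 0x1000000 % 0x1000000, TID = globalTid % 0x1000000) and ignores the remaining packed fields. The duration parser is a stand-in for veloq_core::time::parse_duration_ns, whose behaviour may differ.

```rust
/// Hypothetical error type; the crate's real one is NsysQueryError.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    ZeroLimit,
    BadDuration { flag: String, raw: String },
    NonPositiveDuration { flag: String },
}

/// Reject `limit == 0` before any SQL runs.
pub fn check_limit(limit: u64) -> Result<(), SketchError> {
    if limit == 0 { Err(SketchError::ZeroLimit) } else { Ok(()) }
}

/// Extract (pid, tid) from the two low 24-bit fields of a globalTid.
pub fn decode_global_tid(global_tid: u64) -> (u32, u32) {
    let pid = ((global_tid >> 24) & 0xFF_FFFF) as u32;
    let tid = (global_tid & 0xFF_FFFF) as u32;
    (pid, tid)
}

/// Return the last path component, splitting on either `/` or `\`.
pub fn module_basename(path: &str) -> &str {
    path.rsplit(|c| c == '/' || c == '\\').next().unwrap_or(path)
}

/// Parse `100us` / `1.2s` / `42ns` style text into whole nanoseconds,
/// rejecting values that round to zero or below. `flag` is only used
/// to label the error.
pub fn parse_positive_duration(flag: &str, raw: &str) -> Result<u64, SketchError> {
    let bad = || SketchError::BadDuration { flag: flag.to_string(), raw: raw.to_string() };
    // Two-letter suffixes come first so "ns" is not mistaken for "s".
    const UNITS: [(&str, f64); 4] = [("ns", 1.0), ("us", 1e3), ("ms", 1e6), ("s", 1e9)];
    let text = raw.trim();
    let (number, scale) = UNITS
        .iter()
        .find_map(|&(suffix, scale)| text.strip_suffix(suffix).map(|n| (n, scale)))
        .ok_or_else(|| bad())?;
    let value: f64 = number.trim().parse().map_err(|_| bad())?;
    let ns = (value * scale).round();
    if !ns.is_finite() {
        return Err(bad());
    }
    if ns <= 0.0 {
        return Err(SketchError::NonPositiveDuration { flag: flag.to_string() });
    }
    Ok(ns as u64)
}
```

Keeping the flag name in the error lets a caller report which flag was invalid (for example `--min` versus `--interval`) without each command formatting its own message.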