monitr 0.3.40 - Docs.rs

# monitr Audit: Activity/Resource Monitor Improvements

Created: 2026-06-10

## Executive summary
`monitr` is a solid baseline CLI/TUI monitor, but several gaps reduce its reliability as a production workflow tool. The highest-impact issues are missing process-level network attribution, synchronous TUI detail loading, and slow history pruning. This version expands each finding with implementation-ready detail so you can move directly into fixes.

## Findings (prioritized)

### 1) [x] [High] Missing per-process network attribution (core observability gap)
Why this matters: Without process-level RX/TX, operators cannot answer “which process is using bandwidth?” from the main monitor view and must switch tools.
Current behavior: Network metrics are aggregated at the interface level in snapshot/table logic and are not consistently attributed to individual PIDs in the primary process view. `inspect`/`ports` can provide connection context, but they do not feed per-process throughput into the normal rows/charts.
Why this is incomplete today: No canonical process-net metric path exists in the refresh model used by the main table and trend data.
Implementation details: inspect the snapshot flow, then thread a process-level metric through state, table schema, and renderers. Add a best-effort backend path that attempts attribution and degrades gracefully when unsupported or permission-restricted.
Files to inspect first: `src/snapshot.rs`, `src/app.rs`, `src/ui.rs`, `src/inspect.rs`, `src/ports.rs`.
Trade-offs: attribution is platform-specific and can add per-refresh CPU cost, especially on systems with large socket tables.
Recommended rollout: add per-process deltas (bytes in/out + rates), include a `Net` metric in column config, and surface clearly when unavailable.
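
A minimal sketch of the delta step, assuming the platform backend reports cumulative per-PID byte counters that are sampled once per refresh. `NetTotals`, `NetRate`, and `process_net_rates` are hypothetical names for illustration, not the actual `src/sampler.rs` API.

```rust
use std::collections::HashMap;

/// Cumulative (bytes_in, bytes_out) per PID, as a platform backend might report them.
pub type NetTotals = HashMap<u32, (u64, u64)>;

/// Per-process receive/transmit rates in bytes per second.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct NetRate {
    pub in_per_sec: f64,
    pub out_per_sec: f64,
}

/// Derive per-PID rates from two cumulative snapshots taken `elapsed_secs` apart.
pub fn process_net_rates(
    previous: &NetTotals,
    current: &NetTotals,
    elapsed_secs: f64,
) -> HashMap<u32, NetRate> {
    let mut rates = HashMap::with_capacity(current.len());
    // A zero or negative interval cannot produce a meaningful rate.
    if elapsed_secs <= 0.0 {
        return rates;
    }
    for (&pid, &(cur_in, cur_out)) in current {
        // PIDs seen for the first time have no baseline yet, so they get no rate.
        if let Some(&(prev_in, prev_out)) = previous.get(&pid) {
            // `saturating_sub` clamps to 0 if a counter went backwards
            // (e.g. a counter reset or a reused PID) instead of underflowing.
            rates.insert(
                pid,
                NetRate {
                    in_per_sec: cur_in.saturating_sub(prev_in) as f64 / elapsed_secs,
                    out_per_sec: cur_out.saturating_sub(prev_out) as f64 / elapsed_secs,
                },
            );
        }
    }
    rates
}
```

New PIDs get a rate only on their second sample, which avoids reporting a process's entire lifetime traffic as a single interval's throughput.
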
Implementation status: **complete**. Implemented per-process network totals/rates on macOS via `nettop` in `src/sampler.rs` (`platform::process_network_totals` + parser), propagated into `ProcessRow` as `network_in_rate`, `network_out_rate`, `total_network_in`, and `total_network_out`, surfaced in `src/ui.rs` (`Network` tab, compact/full schemas + sortable columns), and exposed in `src/output.rs` and `src/inspect.rs` via JSON/CLI fields.
Acceptance status: aggregate network totals remain in the UI and snapshot totals, while unsupported/errored process attribution now degrades gracefully with explicit support/error flags in totals (`process_network_supported`, `process_network_error`).
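
A sketch of the degradation contract, assuming per-process attribution is either unavailable on the platform or returns a per-PID map or an error message. Only `process_network_supported` and `process_network_error` come from this audit; `NetworkTotals`, `Attribution`, and `build_totals` are hypothetical, and treating an errored attempt as "supported but failed" is an assumption about the flags' semantics.

```rust
use std::collections::HashMap;

/// Hypothetical snapshot totals; the two flag names mirror the fields named in this audit.
#[derive(Debug, Default, PartialEq)]
pub struct NetworkTotals {
    pub total_in: u64,
    pub total_out: u64,
    pub process_network_supported: bool,
    pub process_network_error: Option<String>,
}

/// Outcome of a best-effort per-process attribution attempt.
pub enum Attribution {
    /// The platform has no per-process backend at all.
    Unsupported,
    /// The backend ran and either produced per-PID totals or failed (e.g. permissions).
    Attempted(Result<HashMap<u32, (u64, u64)>, String>),
}

/// Interface-level aggregates are always kept; per-process failures become
/// flags on the totals instead of errors that abort the refresh.
pub fn build_totals(
    interface_in: u64,
    interface_out: u64,
    attribution: Attribution,
) -> (NetworkTotals, HashMap<u32, (u64, u64)>) {
    let mut totals = NetworkTotals {
        total_in: interface_in,
        total_out: interface_out,
        ..Default::default()
    };
    let per_process = match attribution {
        Attribution::Unsupported => HashMap::new(),
        Attribution::Attempted(Ok(map)) => {
            totals.process_network_supported = true;
            map
        }
        Attribution::Attempted(Err(message)) => {
            totals.process_network_supported = true;
            totals.process_network_error = Some(message);
            HashMap::new()
        }
    };
    (totals, per_process)
}
```
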

### 2) [ ] [High] Synchronous `lsof`/inspect hydration blocks TUI responsiveness
Why this matters: opening a process's detail view can freeze the interface for a noticeable time when FD counts are large or command execution is slow.
Current behavior: `toggle_handles` triggers `inspect::collect_handles(pid)` on the hot UI path; `collect_handles` shells out to `lsof` and may produce large output.
Why this is incomplete today: there is no async boundary or loading state for background enumeration, so the frame loop waits on external command latency.
Implementation details: make handle collection asynchronous per process, cache recent results, and show loading/partial states. Add cancellation when the selection changes and an expiration policy for stale cache entries.
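
One way to sketch the cache and expiration policy, assuming handles can be represented as strings and that the caller enqueues a background refresh whenever it sees `Stale` or `Loading`. `HandleCache`, `HandleView`, and the TTL/grace parameters are hypothetical.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Cached handle listing for one PID (stand-in for the real handle type).
struct CachedHandles {
    handles: Vec<String>,
    fetched_at: Instant,
}

/// What the detail panel should show for a PID.
#[derive(Debug, PartialEq)]
pub enum HandleView<'a> {
    /// Data within its TTL.
    Fresh(&'a [String]),
    /// Expired data: still shown, but labeled stale while a refresh runs.
    Stale(&'a [String]),
    /// Nothing cached yet: show the loading placeholder.
    Loading,
}

pub struct HandleCache {
    ttl: Duration,
    entries: HashMap<u32, CachedHandles>,
}

impl HandleCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    pub fn insert(&mut self, pid: u32, handles: Vec<String>, now: Instant) {
        self.entries.insert(pid, CachedHandles { handles, fetched_at: now });
    }

    /// Classify the cached entry for `pid` at time `now`.
    pub fn view(&self, pid: u32, now: Instant) -> HandleView<'_> {
        match self.entries.get(&pid) {
            None => HandleView::Loading,
            Some(entry) if now.saturating_duration_since(entry.fetched_at) > self.ttl => {
                HandleView::Stale(entry.handles.as_slice())
            }
            Some(entry) => HandleView::Fresh(entry.handles.as_slice()),
        }
    }

    /// Drop entries well past their TTL so the cache does not grow with every PID ever opened.
    pub fn evict_expired(&mut self, now: Instant, grace: Duration) {
        let limit = self.ttl + grace;
        self.entries
            .retain(|_, e| now.saturating_duration_since(e.fetched_at) <= limit);
    }
}
```

Passing `now` explicitly instead of calling `Instant::now()` inside keeps the expiry logic deterministic in tests.
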
Files to inspect first: `src/app.rs` (selection state + event loop), `src/inspect.rs` (handle collection/render), `src/ui.rs` (detail panel timing).
Trade-offs: cache invalidation can briefly show stale handles; that is acceptable if clearly labeled, and preferable to UI lock-ups.
Recommended rollout: introduce a `HandleRequest` job queue with a result channel, and show a “loading handles…” placeholder on first open.
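
A minimal sketch of the `HandleRequest` queue using `std::sync::mpsc`, assuming a single worker thread and a blocking collector closure standing in for `inspect::collect_handles`. The struct fields, `HandleResult`, `spawn_handle_worker`, and `poll_current` are hypothetical. "Cancellation" here means coalescing queued requests and discarding results for an outdated selection; the sketch does not kill an `lsof` process that is already running.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// Request to enumerate handles for `pid`. `generation` increases on every
/// selection change, so results for an outdated selection can be discarded.
#[derive(Debug, Clone, Copy)]
pub struct HandleRequest {
    pub pid: u32,
    pub generation: u64,
}

#[derive(Debug)]
pub struct HandleResult {
    pub pid: u32,
    pub generation: u64,
    pub handles: Result<Vec<String>, String>,
}

/// Run the blocking `collect` off the UI thread. The worker exits when the
/// request sender is dropped or the UI stops listening for results.
pub fn spawn_handle_worker<F>(collect: F) -> (Sender<HandleRequest>, Receiver<HandleResult>)
where
    F: Fn(u32) -> Result<Vec<String>, String> + Send + 'static,
{
    let (req_tx, req_rx) = mpsc::channel::<HandleRequest>();
    let (res_tx, res_rx) = mpsc::channel::<HandleResult>();
    thread::spawn(move || {
        while let Ok(mut req) = req_rx.recv() {
            // Coalesce: if newer requests queued up meanwhile, serve only the latest.
            while let Ok(newer) = req_rx.try_recv() {
                req = newer;
            }
            let handles = collect(req.pid);
            let result = HandleResult { pid: req.pid, generation: req.generation, handles };
            if res_tx.send(result).is_err() {
                break; // The UI side hung up.
            }
        }
    });
    (req_tx, res_rx)
}

/// Called once per frame: drain finished jobs without blocking and keep
/// only the result that matches the current selection.
pub fn poll_current(results: &Receiver<HandleResult>, current_generation: u64) -> Option<HandleResult> {
    let mut latest = None;
    while let Ok(result) = results.try_recv() {
        if result.generation == current_generation {
            latest = Some(result);
        }
    }
    latest
}
```

The UI thread bumps its generation counter on each selection change, sends a request, renders the placeholder, and polls once per frame; `try_recv` never blocks, so the frame loop no longer waits on `lsof`.
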
Suggested acceptance checks: switch rows several times within 1s and confirm there are no visible frame stalls; simulate a slow `lsof` and confirm the loading indicator stays up while the UI remains interactive.

### 3) [x] [High] O(n²) behavior in dead-process pruning can degrade at scale
Why this matters: the cleanup loop grows expensive as process count rises, making low intervals feel laggy during churn.
Current behavior: process history retention uses linear search over a vector of live PIDs for each stored PID while pruning dead entries.
Why this is incomplete today: as process count rises, cost tends toward O(n²), and the effect compounds at refresh intervals of 1s or shorter.
Implementation details: convert live PID list to a `HashSet` once per snapshot and use set membership for prunes. Keep history map type unchanged to avoid broad refactors.
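
The set-membership prune can be sketched as follows, assuming per-PID history lives in a `HashMap<u32, V>`; `prune_dead` is a hypothetical helper, not the actual `History::record` code.

```rust
use std::collections::{HashMap, HashSet};

/// Drop history for PIDs that are no longer alive.
/// `history` stands in for per-PID maps such as `process_cpu`.
pub fn prune_dead<V>(history: &mut HashMap<u32, V>, live_pids: &[u32]) {
    // Build the set once per snapshot: O(m) inserts, capacity hinted by m.
    let mut live: HashSet<u32> = HashSet::with_capacity(live_pids.len());
    live.extend(live_pids.iter().copied());
    // Each lookup is O(1) on average, so the prune is O(n + m)
    // instead of O(n * m) with `live_pids.contains(pid)`.
    history.retain(|pid, _| live.contains(pid));
}
```
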
Files to inspect first: `src/history.rs` in the record/update path.
Trade-offs: one small extra allocation for the temporary set, usually negligible compared with the reduced scan cost.
Recommended rollout: replace vector-membership checks with set-membership checks and keep behavior unchanged for callers.
Suggested acceptance checks: benchmark refresh duration at high process counts before/after and confirm no regression in memory behavior.
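
For the before/after comparison, a rough std-only timing harness might look like the sketch below. The fixture shape and function names are hypothetical, it times only the prune step rather than a full refresh, and a single `Instant` measurement is noisy; a dedicated benchmarking tool (for example the `criterion` crate) run on release builds gives more trustworthy numbers.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

/// Old approach: linear `Vec::contains` per stored PID, O(n * m).
pub fn prune_with_vec(history: &mut HashMap<u32, u64>, live: &[u32]) {
    history.retain(|pid, _| live.contains(pid));
}

/// New approach: one `HashSet` per snapshot, O(n + m).
pub fn prune_with_set(history: &mut HashMap<u32, u64>, live: &[u32]) {
    let set: HashSet<u32> = live.iter().copied().collect();
    history.retain(|pid, _| set.contains(pid));
}

/// Synthetic snapshot: `n` tracked PIDs, of which every other one is still alive.
pub fn fixture(n: u32) -> (HashMap<u32, u64>, Vec<u32>) {
    let history = (0..n).map(|pid| (pid, u64::from(pid))).collect();
    let live = (0..n).step_by(2).collect();
    (history, live)
}

/// Time one strategy on a fresh fixture; returns elapsed time and surviving entries.
pub fn time_prune(n: u32, prune: fn(&mut HashMap<u32, u64>, &[u32])) -> (Duration, usize) {
    let (mut history, live) = fixture(n);
    let start = Instant::now();
    prune(&mut history, &live);
    (start.elapsed(), history.len())
}
```

Checking that both strategies keep the same entries guards the "behavior unchanged" requirement alongside the timing comparison.
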
Implementation status: **complete**. Replaced the `Vec<u32>`-based live PID membership check in `History::record` (`src/history.rs`) with a `HashSet<u32>`, turning the dead-PID prune from O(n·m) into O(n + m) where `n` is the size of `process_cpu` and `m` is the live process count. The set is allocated with the snapshot's process count as its capacity hint so the extra allocation stays bounded. Public `History` API, call sites, and existing tests are unchanged; all 43 unit tests still pass.

### 4) [ ] [Medium] `inspect` one-shot output is less useful than the data model
Why this matters: CLI one-shot mode is often used for scripting and audits; omitting fields that are already collected weakens its practical value.
Current behavior: `InspectProcess` contains richer fields (session/priority/open-files fields), but rendering omits several of them depending on mode and presentation path.
Why this is incomplete today: inconsistent feature parity creates a trust gap between interactive and non-interactive modes.
Implementation details: define canonical inspect fields and render all stable fields in one-shot output too. Keep UI-specific formatting separate from core data serialization to avoid drift.
Files to inspect first: `src/inspect.rs` (process model + rendering), any output formatter/flag handling in `src/main.rs`.
Trade-offs: output verbosity can increase significantly; provide optional compact mode or stable flags to limit fields.
Recommended rollout: add an explicit list of `inspect` fields to the output docs, then keep defaults stable and consistently ordered across modes.
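
A sketch of a single ordered field list that every mode renders from, assuming a verbose flag gates the less common fields. `InspectFields`, `Tier`, and the field names are hypothetical stand-ins for the real `InspectProcess`/`ProcessRecord` fields.

```rust
/// Hypothetical subset of inspect data.
pub struct InspectFields {
    pub pid: u32,
    pub name: String,
    pub session_id: Option<u32>,
    pub priority: Option<i32>,
    pub open_files: Option<usize>,
}

/// Whether a field belongs to the compact default or only to verbose output.
#[derive(Clone, Copy, PartialEq)]
enum Tier {
    Compact,
    Verbose,
}

impl InspectFields {
    /// The one canonical, ordered field list. Missing values render as "-"
    /// instead of being dropped, so the set of keys never changes.
    fn canonical(&self) -> Vec<(&'static str, Tier, String)> {
        fn opt<T: std::fmt::Display>(v: &Option<T>) -> String {
            match v {
                Some(x) => x.to_string(),
                None => "-".to_string(),
            }
        }
        vec![
            ("pid", Tier::Compact, self.pid.to_string()),
            ("name", Tier::Compact, self.name.clone()),
            ("session_id", Tier::Verbose, opt(&self.session_id)),
            ("priority", Tier::Verbose, opt(&self.priority)),
            ("open_files", Tier::Verbose, opt(&self.open_files)),
        ]
    }

    /// One-shot CLI rendering: `key: value` lines in canonical order.
    pub fn render_one_shot(&self, verbose: bool) -> String {
        self.canonical()
            .into_iter()
            .filter(|(_, tier, _)| verbose || *tier == Tier::Compact)
            .map(|(key, _, value)| format!("{key}: {value}"))
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```

Keeping missing values as `-` makes the compact and verbose field sets explainable and predictable, as the acceptance check for this finding asks.
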
Suggested acceptance checks: compare outputs for a known process with and without optional verbose mode; field set should be explainable and predictable.

### 5) [ ] [Medium] `lsof` dependency failures are fragile and non-actionable
Why this matters: users on minimal/locked systems may assume `monitr` is broken when command plumbing is missing or blocked.
Current behavior: several paths assume `lsof` is available and executable; certain command-not-found/permission errors are surfaced generically.
Why this is incomplete today: the error output does not tell users what to install, which permissions to grant, or how to disable heavy checks.
Implementation details: add preflight command availability checks, path hints, and explicit handling for missing binary, permission denied, and unsupported platform cases.
Files to inspect first: `src/inspect.rs`, `src/ports.rs`, `src/main.rs` argument handling.
Trade-offs: the checks add some startup/feature logic but significantly improve operability.
Recommended rollout: on first `lsof` use, resolve the binary once (as `which` or `command -v` would), cache the result, and provide actionable diagnostics in both CLI and TUI paths.
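
A sketch of the preflight and cached diagnostics using only `std`, assuming `lsof -v` (which prints version information) is a cheap enough probe. Rather than scanning `PATH` by hand, it classifies the `std::io::Error` returned when spawning fails: `ErrorKind::NotFound` maps to a missing binary and `ErrorKind::PermissionDenied` to a permission problem. `ToolStatus`, `probe_tool`, `lsof_status`, and `remediation` are hypothetical names, and the message wording is illustrative.

```rust
use std::io::ErrorKind;
use std::process::{Command, Stdio};
use std::sync::OnceLock;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ToolStatus {
    Available,
    Missing,
    PermissionDenied,
    Unsupported,
    Failed(String),
}

/// Spawn `program probe_arg` once and classify the outcome. Only whether the
/// binary could be started matters; its exit status is deliberately ignored.
pub fn probe_tool(program: &str, probe_arg: &str) -> ToolStatus {
    if cfg!(windows) {
        return ToolStatus::Unsupported;
    }
    let result = Command::new(program)
        .arg(probe_arg)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status();
    match result {
        Ok(_) => ToolStatus::Available,
        Err(e) if e.kind() == ErrorKind::NotFound => ToolStatus::Missing,
        Err(e) if e.kind() == ErrorKind::PermissionDenied => ToolStatus::PermissionDenied,
        Err(e) => ToolStatus::Failed(e.to_string()),
    }
}

/// Probe `lsof` at most once per run; CLI and TUI paths share the cached answer.
pub fn lsof_status() -> &'static ToolStatus {
    static STATUS: OnceLock<ToolStatus> = OnceLock::new();
    STATUS.get_or_init(|| probe_tool("lsof", "-v"))
}

/// Turn a status into an actionable message; `None` means nothing to report.
pub fn remediation(tool: &str, status: &ToolStatus) -> Option<String> {
    let message = match status {
        ToolStatus::Available => return None,
        ToolStatus::Missing => format!(
            "`{tool}` was not found on PATH; install it with your system package manager or skip the features that need it."
        ),
        ToolStatus::PermissionDenied => format!(
            "`{tool}` was found but could not be executed; check its file permissions and your PATH."
        ),
        ToolStatus::Unsupported => format!("`{tool}`-based inspection is not supported on this platform."),
        ToolStatus::Failed(err) => format!("`{tool}` could not be started: {err}"),
    };
    Some(message)
}
```
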
Suggested acceptance checks: run with `lsof` unavailable and verify a readable error with remediation steps.

### 6) [ ] [Medium] Limited table customization and persisted preferences
Why this matters: operators must re-hide/re-show columns and re-select their preferred sort order on every run, which adds repetitive overhead.
Current behavior: sort order/columns are mostly runtime defaults, with no durable per-user schema/preferences.
Why this is incomplete today: not enough for recurring workflows, especially in long investigations.
Implementation details: persist user preferences (visible columns, default sort key, compact mode, possibly refresh interval) in a config file under the user's config directory. Apply them on startup with migration-safe defaults.
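
A sketch of resolving the preferences path, assuming an XDG-style layout (`$XDG_CONFIG_HOME`, falling back to `$HOME/.config`) and a hypothetical `monitr/prefs.conf` file name. macOS applications often use `~/Library/Application Support` instead, so the directory choice is an assumption to revisit. Taking the environment lookup as a closure keeps the logic testable.

```rust
use std::path::PathBuf;

/// Resolve the preferences file from an environment lookup. A relative or
/// empty `XDG_CONFIG_HOME` is ignored, as the XDG spec recommends.
pub fn prefs_path_from<F>(env: F) -> Option<PathBuf>
where
    F: Fn(&str) -> Option<String>,
{
    let base = match env("XDG_CONFIG_HOME") {
        Some(dir) if PathBuf::from(&dir).is_absolute() => PathBuf::from(dir),
        // Fall back to $HOME/.config; give up if HOME is unset or empty.
        _ => PathBuf::from(env("HOME").filter(|h| !h.is_empty())?).join(".config"),
    };
    Some(base.join("monitr").join("prefs.conf"))
}

/// Production entry point reading the real process environment.
pub fn prefs_path() -> Option<PathBuf> {
    prefs_path_from(|key| std::env::var(key).ok())
}
```
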
Files to inspect first: `src/app.rs` state model, any existing config/env parsing in `src/main.rs` and startup code.
Trade-offs: config migration and corruption handling must be robust; include fallback defaults.
Recommended rollout: define a small preference schema with a version field, load it on startup and save on change, and provide a “reset defaults” command.
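
A std-only sketch of a versioned preference schema with lenient loading, assuming a simple `key=value` file rather than whatever format the project settles on. `Preferences`, `PREFS_VERSION`, `save_atomically`, the defaults, and the 100 ms refresh floor are hypothetical. Corrupt or unknown-version files fall back to defaults, and writes go to a temporary file in the same directory that is then renamed over the target, which on POSIX filesystems replaces it atomically.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Bump when the on-disk meaning of a key changes.
pub const PREFS_VERSION: u32 = 1;

#[derive(Debug, Clone, PartialEq)]
pub struct Preferences {
    pub columns: Vec<String>,
    pub sort_key: String,
    pub compact: bool,
    pub refresh_ms: u64,
}

impl Default for Preferences {
    fn default() -> Self {
        Self {
            columns: vec!["pid".into(), "name".into(), "cpu".into(), "mem".into()],
            sort_key: "cpu".into(),
            compact: false,
            refresh_ms: 1000,
        }
    }
}

impl Preferences {
    /// Serialize as `key=value` lines, version first.
    pub fn to_text(&self) -> String {
        format!(
            "version={}\ncolumns={}\nsort_key={}\ncompact={}\nrefresh_ms={}\n",
            PREFS_VERSION, self.columns.join(","), self.sort_key, self.compact, self.refresh_ms
        )
    }

    /// Lenient parse: start from defaults and override only keys that parse cleanly.
    pub fn from_text(text: &str) -> Self {
        let mut prefs = Self::default();
        let kv: HashMap<&str, &str> = text
            .lines()
            .filter_map(|line| line.split_once('='))
            .map(|(k, v)| (k.trim(), v.trim()))
            .collect();
        // Missing or newer versions keep defaults; older ones would be migrated here.
        match kv.get("version").and_then(|v| v.parse::<u32>().ok()) {
            Some(v) if v <= PREFS_VERSION => {}
            _ => return prefs,
        }
        if let Some(raw) = kv.get("columns") {
            let cols: Vec<String> = raw
                .split(',')
                .map(str::trim)
                .filter(|c| !c.is_empty())
                .map(String::from)
                .collect();
            if !cols.is_empty() {
                prefs.columns = cols;
            }
        }
        if let Some(key) = kv.get("sort_key").filter(|k| !k.is_empty()) {
            prefs.sort_key = key.to_string();
        }
        if let Some(flag) = kv.get("compact").and_then(|v| v.parse::<bool>().ok()) {
            prefs.compact = flag;
        }
        // Illustrative floor so a bad value cannot make the refresh loop spin.
        if let Some(ms) = kv.get("refresh_ms").and_then(|v| v.parse::<u64>().ok()).filter(|ms| *ms >= 100) {
            prefs.refresh_ms = ms;
        }
        prefs
    }
}

/// Write to a sibling temp file, then rename it over the target.
pub fn save_atomically(path: &Path, prefs: &Preferences) -> std::io::Result<()> {
    if let Some(dir) = path.parent() {
        std::fs::create_dir_all(dir)?;
    }
    let tmp = path.with_extension("tmp");
    std::fs::write(&tmp, prefs.to_text())?;
    std::fs::rename(&tmp, path)
}
```

A “reset defaults” command can then simply save `Preferences::default()`.
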
Suggested acceptance checks: start monitr twice and confirm persisted preferences are reapplied.

### 7) [x] [Low] Data-model inconsistency across command modes
Why this matters: inconsistent field sets and timing across modes make debugging and alerting hard to reason about.
Current behavior: CLI one-shot, inspect pane, and table mode can diverge in both which fields are visible and refresh semantics.
Why this is incomplete today: no canonical source-of-truth schema for metric fields.
Implementation details: establish a shared metric/schema layer, then add render adapters for each mode instead of independent ad-hoc mappings.
Implementation status: **complete**.
How:
- Added `src/process_record.rs` containing the canonical `ProcessRecord` schema for shared process metrics and metadata.
- Made `InspectProcess` a type alias for `ProcessRecord`; inspect output now converts `ProcessRow` values via `ProcessRecord::from`.
- Updated snapshot JSON serialization (`src/output.rs`) to flatten `ProcessRecord` and append snapshot-specific trend/delta fields.
- Kept `ProcessRow` as the internal live sampler model and introduced a strict adapter boundary so command modes read from a single canonical process schema.
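
A reduced sketch of the adapter boundary described above: the sampler keeps its own row type, a `From` conversion produces the canonical record, and each mode renders only from the record. The fields shown, and `render_cli`, `render_detail`, and `rate_text`, are hypothetical; the real `ProcessRow` and `ProcessRecord` carry more data.

```rust
/// Hypothetical live sampler row.
pub struct ProcessRow {
    pub pid: u32,
    pub name: String,
    pub cpu_percent: f32,
    pub memory_bytes: u64,
    pub network_in_rate: Option<f64>,
    /// Sampler-internal bookkeeping that should not leak into any output mode.
    pub last_sample_tick: u64,
}

/// Canonical, mode-independent schema: every output path reads from this.
#[derive(Debug, Clone, PartialEq)]
pub struct ProcessRecord {
    pub pid: u32,
    pub name: String,
    pub cpu_percent: f32,
    pub memory_bytes: u64,
    pub network_in_rate: Option<f64>,
}

impl From<&ProcessRow> for ProcessRecord {
    fn from(row: &ProcessRow) -> Self {
        Self {
            pid: row.pid,
            name: row.name.clone(),
            cpu_percent: row.cpu_percent,
            memory_bytes: row.memory_bytes,
            network_in_rate: row.network_in_rate,
        }
    }
}

/// Shared formatting so "unavailable" looks the same in every mode.
fn rate_text(rate: Option<f64>) -> String {
    rate.map_or_else(|| "n/a".to_string(), |v| format!("{v:.1} B/s"))
}

/// Adapter for one-line CLI/table output.
pub fn render_cli(r: &ProcessRecord) -> String {
    format!(
        "{} {} cpu={:.1}% mem={}B net_in={}",
        r.pid, r.name, r.cpu_percent, r.memory_bytes, rate_text(r.network_in_rate)
    )
}

/// Adapter for a detail view: one field per line.
pub fn render_detail(r: &ProcessRecord) -> String {
    format!(
        "pid: {}\nname: {}\ncpu_percent: {:.1}\nmemory_bytes: {}\nnetwork_in_rate: {}",
        r.pid, r.name, r.cpu_percent, r.memory_bytes, rate_text(r.network_in_rate)
    )
}
```

Because both adapters take `&ProcessRecord`, a field added to the record is visible to every mode, and sampler-only fields such as `last_sample_tick` cannot leak into output.
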
Files to inspect first: `src/main.rs`, `src/app.rs`, `src/ui.rs`, `src/inspect.rs`.
Trade-offs: the initial refactor touches multiple modules; keep it incremental (start with a shared enum/struct and conversion methods).
Recommended rollout: map each public field to one source definition, then unit test expected mode output for a fixed fixture.
Suggested acceptance checks: document and test per-mode support matrix (required/optional/unsupported).
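
A sketch of the per-mode support matrix as data, plus a checker that a fixture-based unit test could call, assuming four output modes and the three support levels named above. `Mode`, `Support`, `MATRIX`, `support`, `check_mode_output`, and every matrix entry are hypothetical.

```rust
/// Output modes that read from the canonical schema (the order matches matrix columns).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Table,
    InspectPane,
    OneShot,
    Json,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Support {
    Required,
    Optional,
    Unsupported,
}

/// One row per canonical field; columns are Table, InspectPane, OneShot, Json.
pub const MATRIX: &[(&str, [Support; 4])] = &[
    ("pid", [Support::Required; 4]),
    ("name", [Support::Required; 4]),
    ("cpu_percent", [Support::Required; 4]),
    ("network_in_rate", [Support::Optional; 4]),
    ("open_files", [Support::Unsupported, Support::Optional, Support::Optional, Support::Optional]),
];

/// Look up the support level of `field` in `mode`; `None` means not in the schema.
pub fn support(field: &str, mode: Mode) -> Option<Support> {
    MATRIX
        .iter()
        .find(|(name, _)| *name == field)
        .map(|(_, modes)| modes[mode as usize])
}

/// Compare a mode's rendered field names against the matrix and list violations.
pub fn check_mode_output(mode: Mode, rendered_fields: &[&str]) -> Vec<String> {
    let mut problems = Vec::new();
    for (field, modes) in MATRIX {
        let present = rendered_fields.contains(field);
        match modes[mode as usize] {
            Support::Required if !present => {
                problems.push(format!("{mode:?}: missing required field `{field}`"))
            }
            Support::Unsupported if present => {
                problems.push(format!("{mode:?}: unexpected field `{field}`"))
            }
            _ => {}
        }
    }
    for f in rendered_fields {
        if support(f, mode).is_none() {
            problems.push(format!("{mode:?}: field `{f}` is not in the schema"));
        }
    }
    problems
}
```
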

## Potentially low-value / candidates for removal or de-emphasis
Any command mode that only exposes sparse metadata should be expanded or consolidated. If `ports` stays informational-only, explicitly mark it in help text and consider adding a dedicated flag for heavy system scans.

## Nice-to-have improvements
Add smoke tests for expensive system-command paths (e.g., missing/blocked `lsof`). Keep a short behavior rationale in docs for changes around parsing, selection, and refresh so UX deltas stay understandable.

## Completion tracking
Legend:
- [ ] not started or in progress
- [x] complete

1. [x] Missing per-process network attribution
2. [ ] Async inspect handle hydration in TUI
3. [x] O(n²) history pruning optimization
4. [ ] One-shot inspect output parity with TUI fields
5. [ ] Better `lsof` error handling and diagnostics
6. [ ] Persisted UI/table preferences
7. [x] Canonical metric schema across command modes

## Changelog
Use this section to record what was implemented and why.

- 2026-06-10 — Initial findings expanded and formatted; completion checkboxes and changelog section added.
- 2026-06-10 — Implemented finding #1: process-level network rates are collected when supported, exposed in process rows (full/compact), and included in snapshot/inspect outputs with trend deltas and totals.
- [ ] 2026-06-10 — Pending: implement finding #2
- 2026-06-10 — Implemented finding #3: replaced the per-snapshot `Vec<u32>` live-PID membership check in `History::record` with a `HashSet<u32>`, making dead-PID pruning linear in the number of live processes rather than quadratic. Public API, call sites, and tests are unchanged; all 43 unit tests pass.
- [ ] 2026-06-10 — Pending: implement finding #4
- [ ] 2026-06-10 — Pending: implement finding #5
- [ ] 2026-06-10 — Pending: implement finding #6
- 2026-06-10 — Implemented finding #7: introduced a shared `ProcessRecord` process schema and used it across inspect/snapshot output paths to remove ad-hoc field divergence.