resource-tracker 0.1.6

Lightweight Linux resource and GPU tracker for system and process monitoring.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
# Changelog

## [0.1.5] - 2026-05-01

### Two new cloud providers and cloud discovery refactor

#### `src/collector/clouds/` -- cloud discovery split into per-provider modules

- Cloud discovery code extracted from `src/collector/host.rs` into a dedicated
  `src/collector/clouds/` module hierarchy. `host.rs` now contains only host
  hardware metrics (CPU, memory, storage, hostname, IP).
- Each cloud provider is a standalone module exposing one public symbol:
  `pub fn probe() -> Option<CloudInfo>`. Modules: `aws.rs`, `gcp.rs`,
  `azure.rs`, `hetzner.rs`, `upcloud.rs`, `alicloud.rs`, `ovh.rs`.
- Probe orchestration in `clouds/mod.rs` uses a `const PROBES: &[fn() ->
  Option<CloudInfo>]` slice. Adding a new cloud provider requires one new
  file and one line in `PROBES`; no other file changes.
- Shared IMDS helpers (`new_imds_agent`, `imds_get`, `imds_get_headers`) live
  in `clouds/mod.rs` and are accessible to all provider submodules.
- `spawn_cloud_discovery` re-exported from `collector::clouds`; call sites in
  `collector::mod` and `main.rs` are unchanged.
- GCP zone-to-region helper renamed from `gcp_zone_basename_to_region` to
  `zone_to_region` (scoped to `clouds/gcp.rs`) and its test moved accordingly
  to `clouds::gcp::tests::test_zone_to_region`.

#### `src/collector/clouds/alicloud.rs` -- Alibaba Cloud ECS (new)

- IMDSv2 token PUT to `100.100.100.200` with `X-aliyun-ecs-metadata-token`
  header; IMDSv1 plain GET fallback.
- Collects `instance/instance-type` and `region-id`; filters `"unknown"` values.
- Reference: Alibaba Cloud ECS instance metadata documentation.

#### `src/collector/clouds/ovh.rs` -- OVH Public Cloud (new)

- Identified by DNS fingerprint: checks for `213.186.33.99` (OVH resolver)
  in `network_data.json` from the OpenStack metadata endpoint.
- Region from `availability_zone` in `meta_data.json`; filters `"nova"` and
  `"unknown"` as meaningless values.
- Instance type from the EC2-compatible endpoint OVH also exposes.
- Reference: OpenStack Nova metadata API.

#### `src/collector/clouds/aws.rs` -- domain guard added

- After IMDSv2/IMDSv1 reachability is confirmed, probes
  `/latest/meta-data/services/domain` and returns `None` unless the value is
  `"amazonaws.com"`. Prevents other clouds exposing an EC2-compatible
  metadata service from being misidentified as AWS.

#### Clippy fixes (pre-existing warnings cleared)

- `src/collector/gpu.rs`: `option_map_unit_fn` -- replaced `.map(|kib| { ... })`
  with `if let Some(kib) = ...`.
- `src/sentinel/run.rs`: `manual_is_multiple_of` and `manual_range_contains` --
  leap-year arithmetic and month/day bounds checks use `.is_multiple_of()` and
  `(1..=N).contains()`.
- `src/sentinel/s3.rs`: `manual_split_once` -- `splitn(2, ':').nth(1)` replaced
  with `split_once(':').map(|x| x.1)`. `too_many_arguments` on `sign_put_request`
  suppressed with `#[allow]` (all 9 parameters are required by AWS SigV4).
- `src/sentinel/upload.rs`: `collapsible_if` -- nested credential-refresh guard
  collapsed using a let chain.

---

## [0.1.4] - 2026-04-24

- Small documentation fixes.
- Deploy documentation to GitHub Pages.
- Extend cloud discovery helpers based on the existing Python implementation.
- Release statically linked binaries for Linux.

## [0.1.3] - 2026-04-23

### Populate process_gpu_usage and handle SIGINT gracefully (2026-04-08)

#### `src/collector/gpu.rs` -- per-process GPU utilization for NVIDIA and AMD

- **`process_gpu_info` and `all_gpu_process_info` now return a 3-tuple**
  `(Option<f64>, Option<f64>, Option<u32>)` = `(vram_mib, usage_pct, gpu_utilized)`.
- **NVIDIA**: SM (shader/compute) utilization is sourced from
  `nvmlDeviceGetProcessUtilization` (`device.process_utilization_stats(0u64)`).
  The latest sample per PID is taken (highest timestamp) and summed across all
  matched PIDs and devices.  Does not require accounting mode.
- **AMD**: per-process GFX engine utilization is computed from
  `drm-engine-gfx` cumulative nanoseconds in `/proc/{pid}/fdinfo` using
  `libamdgpu_top::stat::FdInfoStat` delta tracking.  `FdInfoStat` is stored as
  persistent state on `GpuCollector` (field `amd_fdinfo: Option<FdInfoStat>`)
  and updated each polling interval.  `process_gpu_info` builds `ProcInfo` only
  for the tracked PIDs; `all_gpu_process_info` enumerates all GPU-using processes.
- **`GpuCollector`** gains an `amd_fdinfo` field and both methods now take
  `&mut self` and a `Duration` (the polling interval) for AMD delta computation.

#### `src/metrics/cpu.rs` -- new `process_gpu_usage` field

- Added `pub process_gpu_usage: Option<f64>` between `process_disk_write_bytes`
  and `process_gpu_vram_mib`.  Expressed as **fractional GPUs** (same convention
  as `process_cores_used`): 1.0 = one GPU fully utilized, 0.5 = half a GPU.
  Raw SM utilization (0-100) is divided by 100 before being stored.
  `None` on CPU-only hosts or when NVML/AMD data is unavailable.

#### `src/main.rs` -- wire new field; SIGINT handler

- Destructures the new 3-tuple from GPU calls and assigns `sample.cpu.process_gpu_usage`.
- `gpu` is now `let mut gpu` to satisfy `&mut self`.
- **SIGINT registered to the same handler as SIGTERM** so Ctrl-C triggers the
  existing graceful shutdown path (flush S3, call `/finish`, exit).  Both signals
  set `SIGTERM_RECEIVED`; the main loop's existing check handles both.
- Added test `test_sigint_sets_shutdown_flag` verifying SIGINT sets the flag.

#### `src/output/csv.rs` -- emit `process_gpu_usage` column

- Column 29 (`process_gpu_usage`) now emits `opt_f4(s.cpu.process_gpu_usage)`
  instead of the hardcoded empty string.
- Updated tests T-CSV-07 and T-CSV-08 to reflect the new behavior.

---

### Fix HTTP 422 on start_run and correct Sentinel API field names (2026-04-04)

#### `src/sentinel/run.rs` -- MetadataPayload command serialization and pid removal

- **Fixed HTTP 422 on start_run**: `command` was serialized as a JSON array
  (e.g. `["Rscript","stress.r"]`), but the `RunCreate` API schema declares
  `command` as `string | null` ("JSON array encoded in TEXT").  Pydantic rejects
  an array where a string is expected, causing every invocation with a wrapped
  command to return 422 and disable streaming.
- **`command` is now JSON-encoded**: pre-serialized with
  `serde_json::to_string(&metadata.command)` and sent as an `Option<String>`;
  `None` when no command was given.
- **`pid` removed from `MetadataPayload`**: the `RunCreate` schema has no `pid`
  field.  The parameter is retained in the `start_run` function signature (bound
  to `let _ = pid`) with a comment explaining the omission.

#### `src/metrics/host.rs` -- serde renames to match API field names

- **`host_name` serializes as `host_hostname`** (`#[serde(rename)]`) -- the API
  field is `host_hostname`, not `host_name`.
- **`host_allocation` serializes as `host_server_allocation`** -- the API field
  is `host_server_allocation`.
- **`host_gpu_vram_mib` serializes as `host_gpu_memory_mib`** -- the API field
  is `host_gpu_memory_mib`.
- Rust field names are unchanged; the renames only affect JSON serialization so
  all internal collector code and tests remain unmodified.

---

### Fix process_gpu_vram_mib and process_gpu_utilized empty without --pid (2026-04-04)

#### `src/collector/gpu.rs` -- new all_gpu_process_info() method

- **Added `all_gpu_process_info(&self) -> (Option<f64>, Option<u32>)`** -- aggregates
  GPU VRAM and utilized-device count across all running processes on the host with no
  PID filter.  NVIDIA path sums `used_gpu_memory` for every compute and graphics process
  across all devices; AMD path reads `mem_info_vram_used` from sysfs per device.
  Returns `(None, None)` on CPU-only hosts, matching `process_gpu_info` semantics.
- **Four new tests (T-GPU-A1 through T-GPU-A4)**:
  - `test_all_gpu_process_info_consistent` -- both fields are Some or both are None, never mixed.
  - `test_all_gpu_process_info_no_gpu_returns_none` -- CPU-only host returns `(None, None)`.
  - `test_all_gpu_process_info_gpu_host_returns_some` -- GPU host returns `Some` for both
    fields with `vram_mib >= 0.0`.
  - `test_all_gpu_process_info_ge_empty_pid_query` -- result is >= the zero-PID
    `process_gpu_info(&[])` value, confirming the no-filter path is strictly broader.

#### `src/main.rs` -- populate process_gpu columns without --pid

- **Removed the `config.pid.is_some()` guard** on the GPU augmentation block.
  Previously `process_gpu_vram_mib` and `process_gpu_utilized` were always empty in CSV
  output when running without `--pid`, even on GPU machines.
- **New branch**: when `config.pid` is `None`, calls `gpu.all_gpu_process_info()` to
  report system-wide GPU allocation; when `config.pid` is `Some`, the existing
  `process_gpu_info(&pids)` call is used unchanged.
- Note: the remaining `process_` columns (pid, children, utime, stime, cpu_usage,
  memory_mib, disk_read_bytes, disk_write_bytes, gpu_usage) require `--pid` to identify
  a tracked process tree and remain empty without it by design.

---

### Fix integration test, compare test, and enforce single-threaded test execution (2026-04-04)

#### `src/sentinel/run.rs` -- close_run and integration test

- **`close_run` now validates the HTTP status code** -- after a successful POST,
  the status is checked and `Err(...)` is returned if it is not 200.  Previously
  any non-error ureq response silently passed.
- **Integration test `test_real_api_finish_run_returns_ok` rewritten** -- replaces
  the hand-crafted 3-column CSV (wrong column names causing 422) with
  `samples_to_csv(&[sample], 1)` using a proper `Sample` whose column names match
  the API schema.  Adds `eprintln!` diagnostics at each step.  Token is read from
  `SENTINEL_API_TOKEN` env var; test skips when absent and runs automatically when
  the token is present.  Confirmed against the live API: 200 received.

#### `tests/compare.rs` -- Python vs Rust numeric comparison test

- **`parse_csv` no longer panics on an empty file** -- returns empty `CsvData`
  instead of panicking with "CSV has no header" when a collector produced no output.
- **Empty-output case now skips gracefully** -- prints a `SKIP:` message and returns
  when Python or Rust produced no rows (e.g. uv startup exceeded the wall-clock cap).
- **Rust collector now uses `--output`** -- the binary writes CSV to a file via
  `--output`; the test previously captured stdout but the binary emits to stderr by
  default (or to a file when `--output` is given).

#### `.cargo/config.toml` -- enforce single-threaded test execution

- Added `.cargo/config.toml` with `[test] threads = 1`.  Mock TCP server tests bind
  ephemeral ports and have race conditions under parallel execution; this setting is
  equivalent to `--test-threads=1` and only affects `cargo test`.

---

### Fix /finish endpoint payload -- raw CSV, finished_at, spec-driven tests (2026-04-03)

#### `src/sentinel/run.rs` -- close_run now sends a spec-compliant RunFinishInline payload

- **Bug fix: `data_csv` was base64-encoded** -- the `RunFinishInline` schema specifies
  `data_csv` as a plain `string` ("Raw CSV string containing the metrics data"), not
  base64.  Sending base64 caused a 422 Validation Error from the API.  The
  `base64_encode()` call is removed; the raw CSV string is now passed through directly.
- **Added `finished_at`** -- `CloseRunRequest` now includes `finished_at: Option<String>`
  (ISO 8601 UTC, e.g. `"2026-04-03T12:00:00Z"`).  Populated via a new `now_iso8601()`
  helper.  Omitted when explicitly set to `None` via `skip_serializing_if`.
- **Added `unix_secs_to_iso8601(secs: u64) -> String`** and `now_iso8601() -> String`
  helper functions using the same calendar math as the existing `days_since_epoch`.
- `base64_encode` moved to `#[cfg(test)]` since it is now only used by its own tests.
- Removed incorrect doc comment claiming data_csv is base64-encoded.

#### New spec-driven tests (T-FIN-01 through T-FIN-07)

All tests use a local mock TCP server to capture the exact bytes `close_run` sends,
then assert on the JSON body:

- **T-FIN-01** `test_close_run_run_status_finished_for_zero_exit` -- `run_status` is
  `"finished"` and `exit_code` is 0 when called with `exit_code = Some(0)`.
- **T-FIN-02** `test_close_run_run_status_finished_for_sigterm` -- `run_status` is
  `"finished"` and `exit_code` is omitted when called with `exit_code = None` (SIGTERM).
- **T-FIN-03** `test_close_run_run_status_failed_for_nonzero_exit` -- `run_status` is
  `"failed"` for exit codes 1, 2, 127, 130, 255.
- **T-FIN-04** `test_close_run_data_csv_is_raw_csv_not_base64` -- `data_csv` contains
  raw CSV content (commas, column headers, numeric values) and not base64 gibberish.
- **T-FIN-05** `test_close_run_finished_at_is_valid_iso8601` -- `finished_at` is present,
  ends with `Z`, parses as ISO 8601, and is within 60 seconds of now.
- **T-FIN-06** `test_close_run_handles_valid_run_finish_response` -- function returns
  `Ok(())` when the server replies with a valid `RunFinishResponse` JSON.
- **T-FIN-07** `test_close_run_no_extra_fields_in_payload` -- the JSON object contains
  only fields allowed by `RunFinishInline` (`additionalProperties: false`).

Additional helpers/tests:
- `test_unix_secs_to_iso8601_known_values` -- epoch, Y2K, and a round-trip via
  `parse_iso8601_secs`.
- `test_unix_secs_to_iso8601_leap_day` -- `2000-02-29` round-trips correctly.
- `test_now_iso8601_parses` -- `now_iso8601()` output is non-empty, ends with `Z`,
  and parses back to a Unix timestamp.
- `test_close_run_finished_at_omitted_when_none` (T-EOR-06) -- `finished_at` key is
  absent from the JSON when the field is `None`.
- Updated T-EOR-02 and T-EOR-03 to use raw CSV strings in `data_csv` (not the old
  fake base64 strings) and to include the `finished_at` field in the struct literal.

---

### CLI ordering fix + command field in start_run payload (2026-04-03)

#### `src/config.rs` -- --job-name moved to metadata section
- `job_name` field moved in `Cli` struct from the "Core flags" section to the
  metadata section, between `project_name` and `stage_name`.  The `-n` shorthand
  is retained.  This fixes `--help` display order so `--job-name` appears
  naturally between `--project-name` and `--stage-name`.
- Added `command: Vec<String>` field to `JobMetadata`.  Populated from
  `cli.command` in `Config::load()` so the shell-wrapper command is available
  for API registration without requiring a separate parameter thread.

#### `src/sentinel/run.rs` -- command array in /runs payload
- Added `command: &'a [String]` field to `MetadataPayload` with
  `#[serde(skip_serializing_if = "slice_is_empty")]`.
- `start_run` now sends the wrapped command as a JSON array in the POST body,
  e.g. `"command":["stress","--cpu","4","--timeout","63s"]`.
  The field is absent when not in shell-wrapper mode (empty slice).

---

### Unit test coverage: 41.98% -> 80.24% (2026-04-02)

Added unit tests across all collector and sentinel modules to bring
`cargo llvm-cov --bins` line coverage from 41.98% to 80.24% (91 tests total).

#### New test modules added

- **`collector/memory.rs`** (was 0%): 5 tests for `MemoryCollector::collect()` --
  total_mib > 0, used_pct in 0..100, free_mib <= total_mib, swap consistency,
  repeatability.
- **`collector/network.rs`** (was 0%): 4 tests -- first-call rates 0.0, second-call
  rates >= 0.0, no loopback / sorted, cumulative totals non-decreasing.
- **`collector/host.rs`** (was 0%): 6 tests -- no-GPU returns None GPU fields,
  one/two GPU field population and VRAM summing, hostname non-empty, vcpus > 0,
  `spawn_cloud_discovery` joins without panic.
- **`collector/disk.rs`** (was 24%): 5 new tests -- first-call rates 0.0,
  second-call rates >= 0.0, sorted by device, totals non-decreasing,
  `read_device_info` non-existent device returns all-None fields.
- **`collector/cpu.rs`** (was 31%): 6 new tests -- PID-tracking produces Some for
  all process fields, `process_tree_rss_mib` > 0 for self, `process_tree_ticks`
  contains root PID, second collect >= 0 cores, no-PID second collect all None,
  process_count > 0.
- **`collector/gpu.rs`** (was 14%): 4 new tests -- `collect()` returns Ok,
  identity fields non-empty, utilization_pct in 0..100, vram_used <= vram_total.

#### Expanded test coverage in existing modules

- **`config.rs`** (was 0%): 5 tests -- TOML deserialization, `TomlConfig::default()`,
  `JobMetadata::default()`, `OutputFormat` equality, unknown-key handling.
- **`sentinel/mod.rs`** (was 65%, now 100%): 2 new tests -- valid token returns
  Some with correct defaults, `SENTINEL_API_URL` env var overrides base URL.
- **`sentinel/run.rs`** (was 73%, now 95%): 8 new tests -- `base64_encode` RFC 4648
  vectors and round-trip, `days_since_epoch` invalid inputs, `parse_iso8601_secs`
  UTC offset and too-few-components, `slice_is_empty` helper, `refresh_credentials`
  mock-server test, `start_run` mock-server test.
- **`sentinel/upload.rs`** (was 63%, now 89%): 2 new tests -- non-empty batch with
  invalid URI exercises CSV/gzip path then exits on shutdown; S3-failure path
  exercises retry logic (note: takes ~7 s due to 2+4 s retry back-off sleeps).

#### Uncovered lines (not achievable with unit tests)

- `main.rs` (171 lines, 0%): binary entry point; covered by smoke tests.
- `collector/gpu.rs` AMD+NVML paths (221 lines): require physical GPU hardware.
- `config.rs` `Config::load()` (45 lines): uses `clap::Parser::parse()` which
  reads `std::env::args()` and rejects test-runner flags.

---

### close_run 422 fix + upload thread shutdown delay fix (2026-04-03)

#### `src/sentinel/run.rs` -- /finish endpoint body shape corrected
- Removed `run_id` from `CloseRunRequest` body; it belongs in the URL path
  (`/runs/{run_id}/finish`) only.  Sending it in the body caused a 422.
- Removed `DataSource::S3` variant from close_run.  The /finish endpoint does
  not accept `data_source: "s3"`; S3 batches uploaded during the run are
  already associated with the run server-side by run_id.
- `close_run` now always sends `data_source: "inline"` with base64-encoded
  remaining (unflushed) samples as `data_csv`.
- Removed `uploaded_uris` parameter from `close_run` (no longer used in body).
- Tests updated: `test_close_run_request_omits_run_id`,
  `test_close_run_data_source_inline` (replaces previous s3 variant tests).
- New test `test_close_run_posts_to_finish_endpoint`: mock TCP server captures
  the raw HTTP request and asserts URL contains run_id, body omits run_id,
  `data_source=inline`, no `s3` field, `data_csv` present, correct run_status
  and exit_code.

#### `src/sentinel/upload.rs` -- upload thread shuts down within 250 ms
- Replaced `std::thread::sleep(upload_interval)` with a `take_while` / `for_each`
  loop of 250 ms ticks that checks the shutdown flag on each tick.  Previously,
  a tracked app finishing before the upload interval elapsed (e.g. 63 s with a
  60 s interval) caused the resource-tracker to wait up to 60 s for the thread
  to wake before exiting.  Now it exits within ~250 ms of the flag being set.
- Thread return type changed from `JoinHandle<Vec<String>>` to `JoinHandle<()>`;
  uploaded URIs are no longer returned (they are not sent to /finish).
- New test `test_upload_thread_shuts_down_promptly`: spawns uploader with a 60 s
  interval, sets the shutdown flag immediately, asserts join completes in < 2 s.

#### `src/main.rs` -- shutdown() updated
- `upload_handle` type updated to `JoinHandle<()>`; join result discarded.
- Removed `uploaded_uris` from `close_run` call.

---

### --quiet / --output flags + output routing tests (2026-04-02)

#### `src/config.rs` -- new output control flags
- Added `--output FILE` / `-o` / `TRACKER_OUTPUT` env var: redirect metric output
  to a file instead of stdout. Useful in shell-wrapper mode to keep the tracked
  app's stdout clean.
- Added `--quiet` / `TRACKER_QUIET` env var: suppress all metric output (no stdout,
  no file). Useful when streaming to Sentinel and local output is not needed.
- Added `output_file: Option<String>` and `quiet: bool` to `Config`.

#### `src/main.rs` -- `emit!` macro + `BufWriter` output sink
- Added `let mut out_file: Option<BufWriter<File>>` to select the output sink at
  startup: `None` when `--quiet`, `Some(file)` when `--output FILE`, else writes
  to stdout via `println!`.
- Added `emit!` macro that routes formatted output to the chosen sink, calling
  `flush()` after each write so `tail -f` works on the output file.
- All metric output (`println!` calls in the sampling loop) replaced with `emit!`.

#### `tests/smoke.rs` -- 6 new output-sink tests
- `test_quiet_produces_no_stdout`: `--quiet` produces no stdout lines.
- `test_no_quiet_produces_stdout`: control -- normal mode does produce output.
- `test_output_file_json`: `--output FILE` writes JSON to file; stdout is empty;
  file contains valid JSON.
- `test_output_file_csv`: `--output FILE --format csv` writes CSV header to file;
  stdout is empty.
- `test_tracker_quiet_env_var`: `TRACKER_QUIET=1` behaves identically to `--quiet`.
- `test_tracker_output_env_var`: `TRACKER_OUTPUT=path` behaves identically to `--output`.
- Added `run_for` / `run_for_with_env` helpers and `OUTPUT_TEST_WAIT` constant
  (3 s) to avoid the 10 s `collect_lines` timeout when testing empty stdout.

---

### CSV system_/process_ prefixes + close_run fixes (2026-04-02)

#### `src/output/csv.rs` -- system_ / process_ column prefixes
- All 21 system columns renamed with `system_` prefix; memory columns
  additionally carry explicit `_mib` suffix (e.g. `memory_free` ->
  `system_memory_free_mib`, `gpu_vram` -> `system_gpu_vram_mib`).
- 11 `process_` columns appended: `process_pid`, `process_children`,
  `process_utime`, `process_stime`, `process_cpu_usage`,
  `process_memory_mib`, `process_disk_read_bytes`, `process_disk_write_bytes`,
  `process_gpu_usage`, `process_gpu_vram_mib`, `process_gpu_utilized`.
- Populated fields: `process_pid` (`tracked_pid`), `process_children`
  (`cpu.process_child_count`), `process_cpu_usage` (`cpu.process_cores_used`).
  Remaining process fields emitted as empty strings (not yet collected).
- T-CSV-06 updated: empty trailing process fields are valid CSV nulls, not
  a formatting error; trailing-comma assertion removed from data row check.

#### `src/metrics/mod.rs` -- `tracked_pid` added to `Sample`
- New `tracked_pid: Option<i32>` field carries the root PID into the CSV
  serializer without requiring access to `Config`.

#### `tests/smoke.rs` -- column name renames
- `EXPECTED_HEADER` updated to new 32-column format.
- All `col("name")` lookups updated to use `system_`/`process_` prefixed names.

#### `tests/compare.rs` -- dual column name support
- Added `rs_name: &'static str` to `ColSpec`; Python CSV lookup uses `name`
  (unprefixed), Rust CSV lookup uses `rs_name` (`system_` prefixed).
  All 17 ColSpec entries updated.

#### `src/sentinel/run.rs` -- `close_run` body gzip reverted
- Removed `Content-Encoding: gzip` and body-level compression from
  `close_run` POST.  The Sentinel API (FastAPI) does not decompress
  gzip-encoded request bodies, causing a 422.
- Body is now plain JSON matching the Python reference `requests.post(url,
  json=payload)`.  `data_csv` remains plain base64 (no inner gzip).

#### `src/sentinel/s3.rs` -- S3 PUT header: Content-Encoding -> Content-Type
- Changed S3 PUT from `Content-Encoding: gzip` to `Content-Type: application/gzip`.
  `Content-Encoding: gzip` caused HTTP clients to transparently decompress
  the object on GET, making the `.csv.gz` file appear uncompressed.
  `Content-Type: application/gzip` stores the gzip bytes as-is.
- Test updated to assert `content-type: application/gzip`.

---

### Dependencies.md Cargo crate table added (2026-04-02)

#### `resource-tracker-rs-book/src/Dependencies.md` -- Rust crate dependencies section
- Added "Rust Crate Dependencies" section with two tables: runtime crates and dev dependencies.
- Each row lists crate name, pinned/constrained version, and the purpose / why it was chosen.
- Covers all 13 runtime crates (`nvml-wrapper`, `clap`, `procfs`, `ureq`, `serde`, `serde_json`,
  `toml`, `hmac`, `sha2`, `hex`, `libc`, `flate2`, `libamdgpu_top`) and 1 dev crate (`num_cpus`).

---

### Code fixes and test improvements (2026-04-02)

#### `src/sentinel/run.rs` -- `close_run` payload compression (bug fix)
- The entire JSON body sent to `/runs/{id}/finish` is now gzip-compressed with
  `Content-Encoding: gzip`, matching the Python reference and the S3 upload path.
- Previously only the `data_csv` field was gzip+base64 encoded while the outer
  HTTP body was sent uncompressed with no `Content-Encoding` header.
- `data_csv` is now plain base64 (no inner gzip) since the HTTP-level compression
  covers the whole body; matches Python `b64encode(data_csv)`.

#### `for_each` substitution (all `*.rs` files)
- Replaced `for` loops with `.for_each()` calls throughout `src/` and `tests/`
  wherever `break`, `continue`, and `return` are not used in the loop body.
- Loops containing `break`, `continue`, or early `return` (e.g. `host.rs:98`,
  `compare.rs:115`, `compare.rs:338`, `smoke.rs` helper loops) are left as `for`.

#### Test function naming (`src/**/*.rs`, `tests/*.rs`)
- All `#[test]` functions now carry a `test_` prefix (e.g. `fn creds_expiring_soon_far_future`
  → `fn test_creds_expiring_soon_far_future`).
- Affects `src/collector/cpu.rs`, `src/collector/disk.rs`, `src/output/csv.rs`,
  `src/sentinel/mod.rs`, `src/sentinel/run.rs`, `src/sentinel/s3.rs`,
  `src/sentinel/upload.rs`, `tests/smoke.rs`, `tests/compare.rs`.

#### `tests/smoke.rs` -- `test_sigterm_exits_zero` (T-EOR-01) fix
- The reader thread previously called `.take(1)` and dropped the `BufReader`,
  breaking the stdout pipe; the binary's next `println!` panicked (exit 101).
- Fixed by replacing `.take(1)` with `.for_each()` that sends the first line then
  keeps draining stdout so the pipe stays open until the binary exits naturally.

#### `tests/smoke.rs` -- `test_write_s3_batch_to_disk` (new inspection helper)
- Runs the binary in CSV mode (`--format csv --interval 1`), captures 3 lines
  (header + 2 data rows), gzip-compresses them, and writes the result to
  `/tmp/resource-tracker-batch-test.csv.gz` for manual inspection.
- Produces the exact bytes that would be PUT to S3 from a real run.
- Run with: `cargo test test_write_s3_batch_to_disk -- --nocapture`
- Inspect with: `gunzip -c /tmp/resource-tracker-batch-test.csv.gz`

#### `tests/compare.rs` -- per-interval I/O columns: note column added, tests now pass
- Added `note: Option<&'static str>` to `ColSpec` and `note: Option<String>` to
  `ColResult`.  When a note is set the column always passes; if the numbers exceed
  the percentage tolerance the note is prefixed with `OUT OF TOLERANCE (X% > Y%)`.
- Comparison table now prints a `note` column (120-char separator) so the reason
  is visible without reading source code.
- Three per-interval I/O columns annotated and forced to pass:
  - `disk_read_bytes`: Python median is often 0 on an idle disk; Rust capturing
    real reads that Python's sampling window missed is an improvement, not a
    regression.
  - `disk_write_bytes`: kernel write-back flushes are asynchronous; neither
    collector has ground truth and the direction of divergence flips between runs.
  - `net_sent_bytes`: at low traffic the absolute difference is tens of bytes;
    percentage comparison is not meaningful at that scale.
- All other columns (`net_recv_bytes`, memory, CPU, disk space) retain strict
  percentage enforcement with `note: None`.

---

### Python reference alignment (2026-04-01)

#### `src/sentinel/mod.rs` -- API base URL
- Corrected `DEFAULT_API_BASE` from `https://sentinel.sparecores.com` to
  `https://api.sentinel.sparecores.net` (matches `sentinel_api.py`).

#### `src/sentinel/run.rs` -- endpoint paths, payload shape, status values, encoding
- `start_run` payload: changed from nested `{metadata:{...}, host:{...}, cloud:{...}}`
  to flat dict using `#[serde(flatten)]` on all three fields (matches Python
  `register_run` which merges all dicts at the top level).
- `refresh_credentials` endpoint: `/runs/{id}/credentials/refresh` →
  `/runs/{id}/refresh-credentials`.
- `close_run` endpoint: `/runs/{id}/close` → `/runs/{id}/finish`.
- `run_status` values: `"success"`/`"failure"`/`"unknown"` →
  `"finished"`/`"failed"` (matches Python `RunStatus` enum).
- `DataSource::Local` renamed to `DataSource::Inline`; serde value `"local"` →
  `"inline"` (matches Python `DataSource.inline`).
- `data_csv` encoding: inline fallback now gzip-compresses then base64-encodes the
  CSV before sending (matches Python `b64encode(data_csv)`).
- `RawCredentials` field names corrected to `access_key`, `secret_key`,
  `session_token` (matches live API response); `expiration` made
  `Option<String>` with `#[serde(alias = "expires_at")]` so missing or
  differently-named fields fall back to `"2099-01-01T00:00:00Z"` instead of
  aborting.
- Parse error messages no longer include the raw response body; replaced with
  byte-count only (`{N} bytes`) to prevent STS credentials leaking to stderr.

### Phase 5 -- Remaining Work (2026-04-01)

#### P-S3-CONTENT-ENCODING: `Content-Encoding: gzip` added to S3 PUT (`src/sentinel/s3.rs`)
- Added `.header("Content-Encoding", "gzip")` to the `s3_put_to` call chain.
- Extended T-S3-06 (`s3_put_to_mock_server_returns_uri`) to capture the raw
  request bytes from the mock TCP server via `mpsc::channel` and assert that
  `content-encoding: gzip` is present (case-insensitive).

#### P-S3-BACKOFF: Exponential backoff for S3 upload retry (`src/sentinel/upload.rs`)
- Replaced the single flat 2s retry with two retries: retry 1 after 2s, retry 2
  after 4s (Section 9.2.2: "retry at least once with exponential back-off").
- Error message now includes `retry1:` / `retry2:` labels for log readability.

#### Release-build warnings eliminated (`src/main.rs`, `src/config.rs`, `src/sentinel/`)
- `handle_sigterm as libc::sighandler_t` -- added intermediate `*const ()` cast to
  silence `function_casts_as_integer` lint (compiler-suggested fix).
- Removed unused `pub const DEFAULT_UPLOAD_TIMEOUT_SECS` from `config.rs`.
- Removed unused `request_shutdown` method from `BatchUploader`; callers already
  hold the `Arc<AtomicBool>` via `shutdown_flag()`.
- Removed unused `pub use` re-exports (`refresh_credentials`, `UploadCredentials`,
  `SampleBuffer`) from `sentinel/mod.rs`.
- Release build now compiles with zero warnings.

#### P-TEST-SMOKE: Missing spec tests added (`tests/smoke.rs`, `src/collector/cpu.rs`)

Binary-level integration tests (19 new in `tests/smoke.rs`):
- T-CPU-03: `process_cores_used` and `process_child_count` are null without `--pid`
- T-CPU-04: `process_cores_used >= 0` when `--pid <self>` is supplied
- T-MEM-01: `free_mib + used_mib + buffers_mib + cached_mib <= total_mib`
- T-MEM-02: `used_pct` in [0.0, 100.0]
- T-MEM-03: `swap_used_pct == 0.0` when `swap_total_mib == 0` (skipped if swap present)
- T-MEM-04: `available_mib <= total_mib`
- T-NET-01: `rx_bytes_per_sec >= 0` and `tx_bytes_per_sec >= 0` per interface
- T-NET-02: `rx_bytes_total` non-decreasing across two consecutive samples
- T-NET-03: loopback `lo` absent from network array
- T-DSK-01: `read_bytes_per_sec >= 0` and `write_bytes_per_sec >= 0` per device
- T-DSK-02: `used_bytes + available_bytes <= total_bytes` per mount
- T-DSK-03: `capacity_bytes > 0` when present
- T-GPU-01: `gpu` array empty on CPU-only host (skipped when GPU device detected)
- T-OUT-02: `timestamp_secs` is a positive integer
- T-OUT-03: `resource-tracker-version` is a semver string
- T-CLD-01: first sample arrives within 5s on a non-cloud host
- T-CFG-04: TOML `interval_secs = 2` controls sample spacing (~4s for 2 samples)
- T-CFG-05: CLI `--interval 2` overrides TOML `interval_secs = 5` (2 samples in < 8s)
- T-CFG-06: nonexistent TOML config path silently falls back to defaults
- T-EOR-01: SIGTERM causes the binary to exit with code 0

CSV integration tests (6 new in `tests/smoke.rs`):
- `csv_disk_io_bytes_nonneg`: `disk_read_bytes` and `disk_write_bytes` parse as u64
- `csv_net_bytes_nonneg`: `net_recv_bytes` and `net_sent_bytes` parse as u64
- `csv_disk_space_invariant`: `disk_space_used_gb + disk_space_free_gb <= disk_space_total_gb`
- `csv_memory_fields_nonneg`: all six memory columns parse as non-negative u64
- `csv_cpu_time_fields_nonneg`: `utime >= 0` and `stime >= 0`
- `csv_gpu_fields_nonneg`: `gpu_usage >= 0`, `gpu_vram >= 0`, `gpu_utilized` parses

Unit test (1 new in `src/collector/cpu.rs`):
- T-CPU-06: first `collect()` returns 0.0 for all delta fields
  (`utilization_pct`, `per_core_pct`, `utime_secs`, `stime_secs`)

#### P-DSK-SECTOR: Per-device sector size for disk I/O accounting (`src/collector/disk.rs`)
- Added `sector_size: u32` to `DeviceInfo`.
- `read_device_info` reads `/sys/block/<dev>/queue/hw_sector_size`; falls back to 512.
- `collect()` uses per-device `sector_size` for `read_bytes_per_sec`,
  `write_bytes_per_sec`, `read_bytes_total`, and `write_bytes_total`.
  Capacity bytes still use the fixed 512 (kernel reports `/sys/block/<dev>/size`
  in 512-byte logical sectors regardless of physical sector size).
- `sector_size` stored as `u32` so `f64::from(sector_size)` and
  `u64::from(sector_size)` avoid `as` casts (per project convention).
- Two new unit tests: `T-DSK-SECTOR` (`sector_size_4k_gives_8x_bytes`) and
  `sector_size_fallback_is_512`.

---

### Priority 4 -- Sentinel API Streaming: tests and spec fixes (2026-04-01)

#### Spec corrections (`resource-tracker-rs-book/src/Specification.md`)
- T-CSV-03: corrected stale formula `utilization_pct / 100 × total_cores` to
  `utilization_pct` directly; field is already fractional cores (0..N_cores).
  Confirmed by PR #1 Changelog entry.
- Column table: updated `cpu_usage` computation note to match code.
- Memory column entries: updated field names and units from `*_kib / KiB`
  to `*_mib / MiB` to match the rename made in Priority 1.

#### `src/output/csv.rs` -- T-CSV-01 through T-CSV-06
- `csv_header_is_first_line_no_embedded_newline` (T-CSV-01)
- `csv_row_column_count_matches_header` (T-CSV-02)
- `csv_cpu_usage_is_utilization_pct_direct` (T-CSV-03): annotated stale spec formula
- `csv_disk_space_used_equals_total_minus_free` (T-CSV-04)
- `csv_output_is_deterministic` (T-CSV-05)
- `csv_no_trailing_commas_no_quoted_fields` (T-CSV-06)

#### `src/sentinel/upload.rs` -- T-STR-02 + completeness check
- `gzip_compress_decompresses_to_valid_csv` (T-STR-02): verifies gzip magic bytes,
  round-trip decompression, header as first line, and per-row column count.
- `samples_to_csv_all_lines_end_with_newline`: every line (header and data) ends `\n`.
- Fixed call site: `region_cache.get_or_detect(&bucket, &agent)` corrected to
  `region_cache.get_or_detect(&bucket)` after `RegionCache` API was updated.

#### `src/sentinel/run.rs` -- T-EOR-02, T-EOR-03, T-EOR-04
- `close_run_request_contains_run_id` (T-EOR-02)
- `close_run_data_source_local_when_no_uploads` (T-EOR-03)
- `close_run_data_source_s3_when_uploads_present` (T-EOR-04)

#### `src/sentinel/mod.rs` -- T-STR-01
- `no_token_returns_none` (T-STR-01): `from_env()` returns `None` without token.
- `empty_token_returns_none`: empty-string token also returns `None`.

#### `src/sentinel/s3.rs` -- bug fix
- Added `use std::io::{Read, Write};` in test module (was missing `Read`).
- Corrected `epoch_to_utc_known_date` test: timestamp `1_743_510_896` was 2025-04-01,
  not 2026-04-01; corrected to `1_775_046_896`.

---

### Priority 3 -- Host and Cloud Discovery (2026-04-01)

#### `HostInfo` and `CloudInfo` structs added (`src/metrics/host.rs`)
- `HostInfo` holds all Section 8.1 fields: `host_id`, `host_name`, `host_ip`,
  `host_allocation`, `host_vcpus`, `host_cpu_model`, `host_memory_mib`,
  `host_gpu_model`, `host_gpu_count`, `host_gpu_vram_mib`, `host_storage_gb`.
- `CloudInfo` holds all Section 8.2 fields: `cloud_vendor_id`, `cloud_account_id`,
  `cloud_region_id`, `cloud_zone_id`, `cloud_instance_type`.
- Both structs derive `Default`; all fields are `Option<_>` so collection
  failure is silently swallowed.

#### Host discovery (`src/collector/host.rs`)
- `collect_host_info(gpus)` collects local host metadata synchronously at startup.
  - `host_id`: tries `/sys/class/dmi/id/board_asset_tag` (AWS), falls back to `/etc/machine-id`.
  - `host_name`: `gethostname(3)` via `libc`.
  - `host_ip`: first non-loopback IPv4 from `getifaddrs(3)` via `libc` (unsafe block).
  - `host_allocation`: `None` (heuristic TBD per spec).
  - `host_vcpus` / `host_cpu_model`: parsed from `/proc/cpuinfo` in a single pass.
  - `host_memory_mib`: `MemTotal` KiB from `/proc/meminfo` divided by 1024.
  - GPU fields derived from the GPU Vec passed in (avoids re-querying the driver).
  - `host_storage_gb`: sums 512-byte sectors from `/sys/block/*/size` for all
    non-loop, non-ram block devices.

#### Cloud discovery (`src/collector/host.rs`)
- `spawn_cloud_discovery()` spawns a background thread calling `probe_cloud()`.
- `probe_cloud()` launches three parallel sub-threads (AWS, GCP, Azure), each
  with a ≤ 2-second `timeout_global` configured via `ureq::config::Config`.
- AWS probe: GET `169.254.169.254/latest/meta-data/`; if successful, fetches
  `region`, `availability-zone`, `instance-type`, and `AccountId` from the
  identity credentials endpoint.
- GCP probe: GET `metadata.google.internal/computeMetadata/v1/` with
  `Metadata-Flavor: Google` header.
- Azure probe: GET `169.254.169.254/metadata/instance?api-version=2021-02-01`
  with `Metadata: true` header.
- On a non-cloud host all probes fail fast (no route to host) and return
  `CloudInfo::default()`; satisfies T-CLD-01 (no startup hang > 5s).

#### Startup integration (`src/main.rs`)
- GPU info collected once before warm-up so GPU-derived host fields are populated.
- `collect_host_info` called synchronously (fast, no network).
- `spawn_cloud_discovery()` called before the warm-up sleep; joined after the
  sleep so cloud probes run concurrently with the first sampling interval.
- `host_info` and `cloud_info` are bound and available for the Sentinel API
  registration (Priority 4); currently a no-op `let _ = (&host_info, &cloud_info)`.

#### Compare test fixes (`tests/compare.rs`)
- Added `py_scale: f64` to `ColSpec` to handle Python-KiB vs Rust-MiB unit
  difference for all memory columns (`KIB_TO_MIB = 1.0/1024.0`).
- Changed I/O byte columns to `use_median: true` to suppress single-interval
  burst spikes that inflate percentage error on near-zero readings.
- Increased `disk_write_bytes` tolerance from 10% to 20% (kernel write-back
  timing is a legitimate source of divergence between simultaneous collectors).

---

### Priority 1 -- Spec deviations fixed (2026-04-01)

#### `--interval 0` now rejected (`config.rs`)
- `Config::load()` checks the resolved interval after merging CLI/TOML/defaults.
- If the value is 0, the binary prints an error to stderr and exits with code 1.
- Satisfies test T-CFG-03.

#### `utilization_pct` changed to fractional cores, clamp removed (`collector/cpu.rs`, `metrics/cpu.rs`)
- Renamed internal helper `utilization_pct()` to `core_util_pct()` (used for per-core entries, still 0.0-100.0 with clamp).
- Added `aggregate_util_cores()` which computes `(delta_total - delta_idle) / delta_total * n_cores` with no clamp.
- `CpuMetrics.utilization_pct` now expresses fractional cores in use (0.0..N_cores), not a percentage.
- Matches daroczig's review: "the number of vCPUs fully utilized" is more useful than a percentage clamped to 100.

#### `total_cores` removed from `CpuMetrics` (`metrics/cpu.rs`, `collector/cpu.rs`)
- `total_cores` is a static host property; moved to host discovery (Section 8.1, `host_vcpus`), not yet implemented.
- Per-core array length still implicitly carries the core count via `per_core_pct.len()`.
- `CpuMetrics` gained `#[derive(Default)]`.

#### Memory fields renamed from KiB to MiB (`metrics/memory.rs`, `collector/memory.rs`, `output/csv.rs`)
- All `*_kib` field names renamed to `*_mib` (e.g. `free_kib` -> `free_mib`).
- Division factor changed from `/ 1024` to `/ 1_048_576` in the collector.
- CSV row builder updated to reference the new `_mib` fields.
- Standardized to match Python resource-tracker PR #9 which also adopted MiB.
- `MemoryMetrics` gained `#[derive(Default)]`.

#### `cpu_usage` CSV formula updated (`output/csv.rs`)
- Was: `utilization_pct / 100.0 * total_cores`
- Now: `utilization_pct` directly (field is already in fractional cores).

#### `.expect()` panics replaced with graceful fallbacks (`main.rs`)
- All five collector calls (`cpu`, `memory`, `network`, `disk`, `gpu`) now use `.unwrap_or_default()`.
- JSON serialization failure is caught with a `match` and logged to stderr instead of panicking.
- Satisfies the spec requirement: the binary MUST NEVER panic in production.

---

### Tests for Priority 1 and 2 + version bump to 0.1.1 (2026-04-01)

#### Version bump (`Cargo.toml`)
- Bumped version from `0.1.0` to `0.1.1`.

#### Unit tests added (`src/collector/cpu.rs`)
- Extracted `util_pct_from_ticks(prev_total, prev_idle, curr_total, curr_idle)` -- a pure
  function with no `CpuTime` dependency -- so tick-math is testable without constructing
  procfs types that have private fields.
- Six unit tests covering: all-idle, fully-busy, half-busy, no-delta, no-clamp on aggregate,
  and clamping behavior for per-core values.

#### Integration tests (`tests/smoke.rs`)
- Fixed broken tests that referenced removed/renamed fields (`total_cores`, `*_kib`).
- `T-CFG-03`: `interval_zero_exits_nonzero` -- verifies `--interval 0` exits non-zero.
- `T-CPU-01`: `json_utilization_pct_is_fractional_cores_not_percentage` -- value is in
  `[0, N_cores * 1.05]`, not clamped to 100.
- `T-CPU-02`: `json_total_cores_field_absent` -- `cpu.total_cores` must not appear in JSON.
- `json_memory_fields_are_mib` -- all `*_mib` fields present with sane values (128..10M MiB).
- `json_memory_kib_fields_absent` -- old `*_kib` fields must be absent.
- `csv_cpu_usage_is_fractional_cores` -- `cpu_usage` in CSV is in `[0, N_cores]`, uses
  `num_cpus` dev-dependency to get the real core count for the bound check.
- `csv_values_parse_and_are_sane` -- updated memory column assertions to reflect MiB scale.
- `shell_wrapper_propagates_exit_zero` / `_exit_nonzero` -- wrapper mode exit codes.
- `shell_wrapper_emits_json_samples` -- emits valid JSON while monitoring a child.
- `all_metadata_flags_accepted` -- all Section 9.3 flags accepted without error.
- `tracker_env_vars_accepted` -- all `TRACKER_*` env vars accepted without error.
- `tag_flag_repeatable` -- `--tag` accepted multiple times.

#### Updated (`tests/compare.rs`)
- Corrected `ColSpec` description strings from "KiB" to "MiB" for all memory columns.

#### `as` casts replaced with `try_from` where `From` is applicable (`src/collector/cpu.rs`, `src/output/csv.rs`)
- `count() as u32` and `.len() as u32` replaced with `u32::try_from(...).unwrap_or(0)`.
- Remaining `as f64` casts on `u64`/`usize` are kept: `From<u64> for f64` and
  `From<usize> for f64` are not in std (both conversions are lossy).

#### Dev dependency added (`Cargo.toml`)
- `num_cpus = "1"` added under `[dev-dependencies]` for use in smoke tests.

---

### Priority 2 -- Missing CLI flags and shell-wrapper mode (2026-04-01)

#### Section 9.3 metadata flags added (`config.rs`, `Cargo.toml`)
- Added `env` feature to clap to enable `TRACKER_*` environment variable support.
- Added all metadata fields from Section 9.3 of the spec as CLI flags with `env` attributes:
  `--project-name` / `TRACKER_PROJECT_NAME`, `--stage-name` / `TRACKER_STAGE_NAME`,
  `--task-name` / `TRACKER_TASK_NAME`, `--team` / `TRACKER_TEAM`,
  `--env` / `TRACKER_ENV`, `--language` / `TRACKER_LANGUAGE`,
  `--orchestrator` / `TRACKER_ORCHESTRATOR`, `--executor` / `TRACKER_EXECUTOR`,
  `--external-run-id` / `TRACKER_EXTERNAL_RUN_ID`,
  `--container-image` / `TRACKER_CONTAINER_IMAGE`.
- Added repeatable `--tag KEY=VALUE` flag for arbitrary key-value tags (stored as `Vec<String>`).
- `--job-name` / `TRACKER_JOB_NAME` already existed; moved into the new `JobMetadata` struct.
- New `JobMetadata` struct on `Config` holds all Section 9.3 fields; ready for Sentinel API (Priority 4).

#### Shell-wrapper mode (`main.rs`, `config.rs`)
- Added `command: Vec<String>` trailing positional arg to `Cli` (`trailing_var_arg = true`).
- When a command is present, `main.rs` spawns it via `std::process::Command`, sets `config.pid`
  to the child's PID (overriding any explicit `--pid`), and polls with `child.try_wait()` after
  each interval.
- When the child exits, the tracker emits one final sample then exits with the child's exit code.
- Spawn failure prints an error to stderr and exits with code 1.
- Note: explicit SIGTERM forwarding is a future enhancement; Ctrl-C (SIGINT) naturally reaches
  both processes via the shared process group.