wavepeek 0.4.0

Command-line tool for RTL waveform inspection with deterministic machine-friendly output.
# Design Document

## 1. Overview

### 1.1 Vision
wavepeek is a command-line tool for RTL waveform inspection. It provides deterministic, machine-friendly output and a minimal set of primitives that compose into repeatable debug recipes.

### 1.2 Problem Statement

**The waveform access gap for LLM agents**

In RTL design and verification, waveforms are the primary debugging artifact. Unlike software where logs, stack traces, and debugger output are mostly textual and sequential, RTL debugging requires analyzing dense temporal data across hierarchical module structures. Waveforms are a high-bandwidth information channel — humans process them visually, leveraging the eye's capacity for pattern recognition.

**Why waveforms matter:**
- Bug reports come with attached dumps (reproducing from scratch can take days)
- Engineers ask questions about waveforms and want to delegate analysis
- Waveforms are the de facto artifact for RTL debugging; without them, little can be done

**The gap:** There is no tool to "grep" waveforms — to extract signal information, values, and temporal relationships via short, composable commands. GUI viewers (GTKWave) work for humans but cannot be automated. This leaves LLM agents blind to simulation results.

VCD is text and therefore natively readable by LLM agents, but real-world dumps are typically very large. Directly exploring raw VCD content is difficult and inefficient: it quickly consumes context window budget and makes iterative analysis expensive. Agents need a structured, deterministic, LLM-friendly derivative view of waveform data rather than raw dump text.

### 1.3 Target Users

**Primary: LLM agents**
- Agentic systems for RTL debugging and verification
- Need structured, deterministic output for reasoning
- Require composable primitives for tool chaining

**Secondary: CI/CD and automation**
- Automated regression pipelines
- Post-simulation analysis scripts
- Waveform-based assertions in CI

**Non-target: Interactive human debugging**
- Humans are not the primary audience
- The tool can be used for scripting and quick queries
- Interactive waveform exploration will still be done in GUI viewers

### 1.4 Design Principles

1. **LLM-first** — Output formats, command structure, and error messages are designed for machine consumption
2. **Self-documenting I/O** — Commands read as unambiguous descriptions of what they do. Human-readable output is the default UX, while strict machine output is explicit via `--json`.
3. **Composable commands** — Unix philosophy: do one thing well, combine via pipes. Command names are unambiguous first, short second
4. **Deterministic output** — Same input always produces same output (no timestamps, random IDs, etc.)
5. **Stable formats** — JSON output uses an explicit versioned schema URL (`$schema`) when `--json` is requested. Human-readable output remains intentionally flexible and is the default for waveform commands.
6. **Minimal footprint** — Fast startup, low memory, no background processes

---

## 2. Scope

### 2.1 In Scope
- VCD/FST dump file support
- Signal discovery: list, search, hierarchy navigation
- Value extraction over time ranges
- Property checks over event triggers
- Stateless CLI (no sessions, no caching, no background processes)

### 2.2 Out of Scope
- GUI/TUI waveform viewer
- Real-time waveform streaming
- Live simulator connections
- Waveform diffing/comparison

### 2.3 Future Considerations
- MCP server for LLM agent integration
- Other formats (FSDB, VPD, WLF)

---

## 3. Functional Requirements

### 3.1 Supported File Formats

- VCD (Value Change Dump)
- FST (Fast Signal Trace)

### 3.2 CLI Commands

#### General conventions

- **No positional arguments.** All arguments are named flags for self-documenting commands.
  All commands that operate on a waveform dump require `--waves <file>`. Commands that do not
  read a dump (e.g., `schema`) do not accept `--waves`.
- **Standalone help contract.** Help text is a first-class CLI contract. `wavepeek` (no args), `wavepeek -h`, and `wavepeek --help` are byte-identical top-level entry points, and each shipped subcommand makes `-h` byte-identical to `--help` while documenting command semantics, defaults/requiredness, boundary rules, error-category guidance, and output shape.
- **Bounded output.** All commands have bounded output by default (to avoid flooding LLM context).
  Boundedness is achieved via one or more of: `--max`, `--first`/`--last`, input size (e.g., number
  of `--signals`), or inherently finite output (e.g., `schema`). When list output is truncated due
  to `--max`, a warning is emitted.
- **Bounded recursion.** Recursive commands have `--max-depth` with a default of 5.
- **Output format.** Default output is human-readable for waveform commands.
  Strict machine output is enabled explicitly with `--json`. The `schema` command is a special case and always emits one JSON Schema document to stdout.
- **JSON envelope (`--json` mode).** On success, JSON output is a single object:

  ```json
  {
    "$schema": "https://raw.githubusercontent.com/kleverhq/wavepeek/v<version>/schema/wavepeek.json",
    "command": "<command>",
    "data": {},
    "warnings": []
  }
  ```

  Notes:
  - `$schema` is always serialized literally as `$schema` and points to the canonical schema artifact for the current tool version.
  - `command` is the subcommand name and can be used to discriminate the shape of `data`.
  - `data` is an object for scalar outputs (e.g., `info`) and an array for list-like outputs.
  - `warnings` is an array of free-form strings. In human mode, warnings are printed to stderr.
  - On error, stdout is empty; stderr contains `error: <category>: <message>` (see §5.6).
- **Time format.** All time values require explicit units: `zs`, `as`, `fs`, `ps`, `ns`, `us`, `ms`, `s`.
  The numeric part must be an integer (e.g., `2000ps`, `15ns`).
  Bare numbers without units are rejected.
- **Time normalization.** All parsed times are converted to the dump's time unit.
  Output timestamps are printed as normalized integer counts in dump `time_unit` units (e.g., `2000ps`).
  If a provided time cannot be represented exactly in dump precision, it is an error.
- **Time ranges.** Commands that operate on time ranges use `--from` and `--to`.
  Both are optional: `--from` + `--to` defines a window, only `--from` means
  from that point to end of dump, only `--to` means from start to that point,
  neither means the entire dump.
  Range boundaries are inclusive.
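
The time-format and normalization rules above can be modeled in a few lines. The following Python sketch is illustrative only and is not the tool's implementation: `normalize_time` and `in_window` are hypothetical names, and the sketch assumes the dump time unit is a plain unit such as `ps` with no multiplier.

```python
import re

# Power-of-ten exponent (relative to one second) for each accepted unit.
UNIT_EXPONENT = {"zs": -21, "as": -18, "fs": -15, "ps": -12,
                 "ns": -9, "us": -6, "ms": -3, "s": 0}
TIME_TOKEN = re.compile(r"([0-9]+)(zs|as|fs|ps|ns|us|ms|s)")


def normalize_time(token: str, dump_unit: str) -> int:
    """Convert a token such as '15ns' into an integer count of dump_unit.

    Assumption: dump_unit is one of the plain units above (no multiplier).
    """
    match = TIME_TOKEN.fullmatch(token)
    if match is None:
        # Bare numbers, fractions, and unknown units are all rejected.
        raise ValueError(f"time must be an integer with a unit: {token!r}")
    count, unit = int(match.group(1)), match.group(2)
    shift = UNIT_EXPONENT[unit] - UNIT_EXPONENT[dump_unit]
    if shift >= 0:
        return count * 10 ** shift
    divisor = 10 ** -shift
    if count % divisor:
        # The value cannot be represented exactly in dump precision.
        raise ValueError(f"{token} is not representable in {dump_unit}")
    return count // divisor


def in_window(t: int, start=None, end=None) -> bool:
    """Inclusive range check; a missing bound means 'open on that side'."""
    return (start is None or t >= start) and (end is None or t <= end)


print(normalize_time("15ns", "ps"))    # 15000
print(normalize_time("2000ps", "ns"))  # 2
```

In this model, `1500ps` against a dump unit of `ns` raises an error instead of rounding, which mirrors the exact-representation rule.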

#### 3.2.0 `schema` — Canonical JSON schema export

Outputs the canonical JSON schema document for wavepeek machine output contracts.

```
wavepeek schema
```

**Behavior:**
- Accepts no command-specific flags or positional arguments.
- Writes exactly one JSON Schema document to stdout (no envelope wrapping).
- Output is deterministic and byte-stable.
- Output bytes match the canonical repository artifact at `schema/wavepeek.json`.

#### 3.2.1 `info` — Dump metadata

Outputs basic metadata about the waveform dump.

```
wavepeek info --waves <file> [--json]
```

**Parameters:**
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--waves <file>` | required | Path to VCD/FST file |
| `--json` | off | Strict JSON envelope output |

**Behavior:**
- Default output: human-readable metadata summary.
- Summary includes the time unit and the start and end times of the dump.
- `--json` prints strict JSON envelope output.

**Examples:**
```bash
# Default human-readable output
wavepeek info --waves dump.vcd

# Strict JSON envelope
wavepeek info --waves dump.vcd --json
```
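
A machine consumer can check the envelope shape before touching `data`. The Python sketch below is a hedged illustration, not an official client: `parse_envelope` is a hypothetical helper, and the `data` fields in the sample payload (`time_unit`, `time_start`, `time_end`) are placeholders, since this document does not define the JSON field names for `info`.

```python
import json

REQUIRED_KEYS = ("$schema", "command", "data", "warnings")


def parse_envelope(stdout_text: str, expected_command: str):
    """Validate the --json envelope and return (data, warnings)."""
    envelope = json.loads(stdout_text)
    if not isinstance(envelope, dict):
        raise ValueError("envelope must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in envelope]
    if missing:
        raise ValueError(f"envelope is missing keys: {missing}")
    if envelope["command"] != expected_command:
        raise ValueError(f"unexpected command: {envelope['command']!r}")
    if not isinstance(envelope["warnings"], list):
        raise ValueError("warnings must be an array of strings")
    return envelope["data"], envelope["warnings"]


# Hypothetical sample output; the data field names are illustrative only.
sample = json.dumps({
    "$schema": "https://raw.githubusercontent.com/kleverhq/wavepeek/v<version>/schema/wavepeek.json",
    "command": "info",
    "data": {"time_unit": "1ns", "time_start": "0ns", "time_end": "1000ns"},
    "warnings": [],
})

data, warnings = parse_envelope(sample, "info")
print(sorted(data), warnings)
```

Because `command` discriminates the shape of `data`, checking it first lets a caller fail early when output from the wrong subcommand is piped in.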

#### 3.2.2 `scope` — Hierarchy exploration

Outputs a flat list of hierarchy scopes, recursively traversing the hierarchy.

```
wavepeek scope --waves <file> [--max <n|unlimited>] [--max-depth <n|unlimited>] [--filter <regex>] [--tree] [--json]
```

**Parameters:**
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--waves <file>` | required | Path to VCD/FST file |
| `--max <n\|unlimited>` | 50 | Maximum number of entries in output; `unlimited` disables truncation |
| `--max-depth <n\|unlimited>` | 5 | Maximum traversal depth; `unlimited` disables depth truncation |
| `--filter <regex>` | `.*` | Filter by full path (regex) |
| `--tree` | off | Render hierarchy as an indented tree in human mode |
| `--json` | off | Strict JSON envelope output |

**Behavior:**
- Outputs flat list of scope paths with metadata
- Includes all scope kinds available in hierarchy data (not only modules)
- Ordering: pre-order depth-first traversal; children at each scope are visited in lexicographic order
- Default output: human-readable list mode.
- `--tree` switches human output to visual hierarchy rendering.
- `--json` returns strict JSON envelope with a flat `data` array.
- Invalid regex is an `args` error.
- `--max 0` is an `args` error.
- `--max unlimited` disables count truncation and emits warning `limit disabled: --max=unlimited`.
- `--max-depth unlimited` disables depth truncation and emits warning `limit disabled: --max-depth=unlimited`.
- If both limits are `unlimited`, warnings are emitted in deterministic order: `--max` then `--max-depth`.
- Each item has:
  - `path`: full scope path (string)
  - `depth`: integer depth (root = 0)
  - `kind`: parser-native scope kind alias (string)
- If results exceed `--max`, output is truncated and a warning is emitted.
- Legacy `tree` command name is not supported.

**Examples:**
```bash
# List all scopes (default human output)
wavepeek scope --waves dump.vcd

# Find ALU-related scopes
wavepeek scope --waves dump.vcd --filter ".*alu.*"

# Explore 2 levels deep with visual tree rendering
wavepeek scope --waves dump.vcd --max-depth 2 --tree

# Get all scopes
wavepeek scope --waves dump.vcd --max 1000

# Disable both count and depth limits explicitly
wavepeek scope --waves dump.vcd --max unlimited --max-depth unlimited
```
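
The ordering and bounding rules for `scope` can be pictured with a small model. This Python sketch is an assumption-laden illustration rather than the actual engine: the hierarchy is a hypothetical nested dict, `kind` is omitted, the filter is applied to emitted rows only (traversal still descends into non-matching scopes), and a scope at depth equal to `--max-depth` is assumed to be included.

```python
import re


def list_scopes(tree, max_entries=50, max_depth=5, pattern=".*"):
    """Pre-order depth-first listing with lexicographic child order.

    tree maps a scope name to a dict of its child scopes.
    Passing None for a limit models `unlimited`.
    """
    regex = re.compile(pattern)
    rows = []

    def visit(children, prefix, depth):
        if max_depth is not None and depth > max_depth:
            return
        for name in sorted(children):
            path = f"{prefix}.{name}" if prefix else name
            if regex.search(path):
                rows.append({"path": path, "depth": depth})
            visit(children[name], path, depth + 1)

    visit(tree, "", 0)
    truncated = max_entries is not None and len(rows) > max_entries
    return (rows[:max_entries] if truncated else rows), truncated


hierarchy = {"top": {"cpu": {"alu": {}}, "bus": {}}}
rows, truncated = list_scopes(hierarchy)
print([row["path"] for row in rows])
# ['top', 'top.bus', 'top.cpu', 'top.cpu.alu']
```

The `truncated` flag stands in for the truncation warning; the exact warning text is not modeled here.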

#### 3.2.3 `signal` — Signal listing

Lists signals within a specific scope with their metadata.

```
wavepeek signal --waves <file> --scope <path> [--recursive] [--max-depth <n|unlimited>] [--max <n|unlimited>] [--filter <regex>] [--abs] [--json]
```

**Parameters:**
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--waves <file>` | required | Path to VCD/FST file |
| `--scope <path>` | required | Exact scope path (e.g., `top.cpu`) |
| `--recursive` | off | Include nested child scopes under `--scope` |
| `--max-depth <n\|unlimited>` | 5 (when recursive) | Maximum recursion depth below `--scope` (requires `--recursive`); `unlimited` disables depth truncation |
| `--max <n\|unlimited>` | 50 | Maximum number of entries in output; `unlimited` disables truncation |
| `--filter <regex>` | `.*` | Filter by signal name (regex) |
| `--abs` | off | Show full signal paths in human mode |
| `--json` | off | Strict JSON envelope output |

**Behavior:**
- Without `--recursive`, lists only signals directly in the specified scope.
- With `--recursive`, traverses child scopes depth-first in deterministic lexicographic scope order.
- `--max-depth` is valid only with `--recursive`; depth `0` means only the selected scope.
- Human mode defaults to short signal names in non-recursive mode.
- Human mode in recursive mode prints paths relative to `--scope` (for example, `cpu.valid`); `--abs` always shows canonical full paths.
- Output fields (JSON): `name`, `path`, `kind`, `width` (`path` stays full/canonical in JSON mode)
- Signal ordering is deterministic: per visited scope sort by `(name, path)`.
- Default output: human-readable listing.
- `--json` returns strict JSON envelope with `data` as an array of objects.
- Invalid regex is an `args` error.
- `--max-depth` without `--recursive` is an `args` error.
- `--max 0` is an `args` error.
- `--max unlimited` disables count truncation and emits warning `limit disabled: --max=unlimited`.
- `--max-depth unlimited` (with `--recursive`) disables depth truncation and emits warning `limit disabled: --max-depth=unlimited`.
- If unlimited-limit warnings and legacy truncation warnings both apply, unlimited warnings are emitted first.
- If results exceed `--max`, output is truncated and a warning is emitted.
- Legacy `signals` command name is not supported.

**Examples:**
```bash
# List signals in top.cpu (default human, short names)
wavepeek signal --waves dump.vcd --scope top.cpu

# Find clock signals
wavepeek signal --waves dump.vcd --scope top.cpu --filter ".*clk.*"

# Human output with full paths
wavepeek signal --waves dump.vcd --scope top.cpu --abs

# Recursively list nested signals under top (relative display paths in human mode)
wavepeek signal --waves dump.vcd --scope top --recursive --max-depth 2

# Disable recursion-depth limit while keeping count bound
wavepeek signal --waves dump.vcd --scope top --recursive --max 100 --max-depth unlimited
```
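
The human-mode display rules and the per-scope ordering described above can be sketched as follows. This is a minimal Python model with hypothetical helper names (`display_name`, `sort_scope_signals`); it assumes `.` is the path separator, as in the examples.

```python
def display_name(path, scope, recursive=False, absolute=False):
    """Pick the human-mode display name for a canonical signal path."""
    if absolute:
        return path  # --abs always shows the canonical full path
    if recursive:
        prefix = scope + "."
        if not path.startswith(prefix):
            raise ValueError(f"{path} is not under scope {scope}")
        return path[len(prefix):]  # path relative to --scope
    return path.rsplit(".", 1)[-1]  # short name in non-recursive mode


def sort_scope_signals(signals):
    """Deterministic order within one visited scope: (name, path)."""
    return sorted(signals, key=lambda s: (s["name"], s["path"]))


print(display_name("top.cpu.valid", "top", recursive=True))  # cpu.valid
print(display_name("top.cpu.clk", "top.cpu"))                # clk
```

JSON mode is unaffected by these rules, since `path` stays canonical there.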

#### 3.2.4 `value` — Value extraction at time point

Gets signal values at a specific time point.

```
wavepeek value --waves <file> --at <time> [--scope <path>] --signals <names> [--abs] [--json]
```

**Parameters:**
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--waves <file>` | required | Path to VCD/FST file |
| `--at <time>` | required | Time point with units (e.g., `1337ns`, `10us`) |
| `--scope <path>` | none | Scope for short signal names |
| `--signals <names>` | required | Comma-separated signal names |
| `--abs` | off | Show canonical paths in human mode |
| `--json` | off | Strict JSON envelope output |

**Modes:**
- `--signals top.cpu.clk,top.cpu.data` (no `--scope`): names are canonical full paths
- `--scope top.cpu --signals clk,data`: names are short and resolved relative to the scope

**Behavior:**
- Outputs signal values at specified time
- Default output: human-readable value summary.
- Human mode shows compact lines: `@<time>` and `<display> <value>` per signal.
- Human mode uses exact `--signals` tokens as display names by default; `--abs` switches display names to canonical paths.
- `--json` outputs JSON envelope with `data` as an object:
  - `time`: normalized time string in dump `time_unit`
  - `signals`: array of `{ path, value }` in the same order as `--signals` (`path` is always canonical absolute path)
- Values are emitted as Verilog literals: `<width>'h<digits>` (including `x`/`z`).
- Fail fast: the command errors if any requested signal is not found.

**Examples:**
```bash
# Get values by full paths
wavepeek value --waves dump.vcd --at 100ns --signals top.cpu.clk,top.cpu.data

# Get values relative to scope (shorter)
wavepeek value --waves dump.vcd --at 100ns --scope top.cpu --signals clk,data,valid

# Strict JSON envelope
wavepeek value --waves dump.vcd --at 100ns --scope top.cpu --signals clk --json
```
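
To make the `<width>'h<digits>` literal form concrete, here is a hedged Python sketch of one way to format a 4-state bit string. It is not the tool's formatter: `to_verilog_hex` is a hypothetical name, and the handling of mixed nibbles (any nibble that is not all-known or all-`z` renders as `x`) is an assumption, since the document only states that `x`/`z` appear in the digits.

```python
def to_verilog_hex(bits: str) -> str:
    """Format an MSB-first string over 0/1/x/z as <width>'h<digits>."""
    width = len(bits)
    if width == 0 or any(c not in "01xz" for c in bits):
        raise ValueError(f"expected a non-empty 0/1/x/z string: {bits!r}")

    # Split into 4-bit groups from the LSB side; the top group may be short.
    chunks = []
    end = width
    while end > 0:
        start = max(0, end - 4)
        chunks.append(bits[start:end])
        end = start
    chunks.reverse()

    digits = []
    for chunk in chunks:
        if all(c in "01" for c in chunk):
            digits.append(format(int(chunk, 2), "x"))  # lowercase hex
        elif set(chunk) == {"z"}:
            digits.append("z")
        else:
            digits.append("x")  # assumption: mixed or unknown nibble is x
    return f"{width}'h{''.join(digits)}"


print(to_verilog_hex("11111111"))  # 8'hff
print(to_verilog_hex("xxxx0001"))  # 8'hx1
```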

#### 3.2.5 `change` — Value changes over time range

Outputs delta snapshots of signal values over a time range using one event-trigger model.

```
wavepeek change --waves <file> [--from <time>] [--to <time>] [--scope <path>] --signals <names> [--on <event_expr>] [--max <n|unlimited>] [--abs] [--json]
```

**Parameters:**
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--waves <file>` | required | Path to VCD/FST file |
| `--from <time>` | start of dump | Start time with units (e.g., `1us`) |
| `--to <time>` | end of dump | End time with units (e.g., `2us`) |
| `--scope <path>` | none | Scope for short signal/trigger names |
| `--signals <names>` | required | Comma-separated signal names |
| `--on <event_expr>` | `*` | Event expression (`*`, named, edge, union) |
| `--max <n\|unlimited>` | 50 | Maximum number of snapshot rows; `unlimited` disables truncation |
| `--abs` | off | Canonical paths in human output |
| `--json` | off | Strict JSON envelope output |

**Supported `--on` forms:**
- Non-edge: `*` (any change in resolved `--signals`) and `<name>` (any change of that signal)
- Edge: `posedge <name>`, `negedge <name>`, `edge <name>`
- Union: `event or event`, `event, event` (exact synonyms)
- Optional `iff` clause: `event iff logical_expr`

Formal event-expression grammar, precedence, and semantics are defined in
`docs/expression_lang.md` (Event Expressions).
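
As an informal illustration of the edge forms, the Python sketch below classifies a transition using the LSB only, following the usual SystemVerilog convention (`posedge` is `0` to `1/x/z` or `x/z` to `1`; `negedge` is `1` to `0/x/z` or `x/z` to `0`). `classify_edge` is a hypothetical helper, and `docs/expression_lang.md` remains the authoritative definition.

```python
UNKNOWN = ("x", "z")


def normalize_lsb(bits: str) -> str:
    """Return the LSB of an MSB-first value, folding h/u/w/l/- into x."""
    bit = bits[-1].lower()
    if bit in "huwl-":
        return "x"
    if bit not in "01xz":
        raise ValueError(f"unexpected bit value: {bit!r}")
    return bit


def classify_edge(prev_bits: str, cur_bits: str):
    """Return 'posedge', 'negedge', or None for a prev -> cur transition."""
    prev, cur = normalize_lsb(prev_bits), normalize_lsb(cur_bits)
    if prev == cur:
        return None
    if (prev == "0" and cur != "0") or (prev in UNKNOWN and cur == "1"):
        return "posedge"
    if (prev == "1" and cur != "1") or (prev in UNKNOWN and cur == "0"):
        return "negedge"
    return None  # x <-> z carries no edge


def matches(event: str, prev_bits: str, cur_bits: str) -> bool:
    """`edge` fires on either direction; the others fire on their own."""
    kind = classify_edge(prev_bits, cur_bits)
    return kind is not None and (event == "edge" or event == kind)


print(classify_edge("10", "11"))  # posedge (LSB goes 0 -> 1)
print(classify_edge("1", "x"))    # negedge
```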

**Behavior:**
- Name resolution follows `value`: without `--scope`, tokens are canonical paths; with `--scope`, tokens are short names relative to that scope.
- Scoped mode rejects canonical full-path tokens for both `--signals` and `--on` names.
- Time tokens require explicit units and exact alignment to dump precision.
- `--from` defines a baseline checkpoint: values are sampled/initialized at `--from` (or dump start when omitted), and no row is emitted at that exact timestamp.
- Candidate timestamps come from `--on` inside inclusive `[--from, --to]`; rows can only be emitted for timestamps strictly greater than the baseline checkpoint.
- Candidate evaluation is internally reduced to event-relevant timestamps (tracked/event signal change points) instead of scanning every dump timestamp, while preserving observable behavior.
- Each emitted row is delta-only: emit only when sampled `--signals` changed relative to the last known sampled state strictly before that timestamp.
- Delta comparison always uses sampled state strictly before the candidate timestamp in the underlying dump order (not merely before the previous emitted candidate).
- If an event fires but sampled `--signals` did not change, no row is emitted.
- Per-signal baseline initialization is lazy: missing prior sampled state is initialized at the timestamp and excluded from delta decision for that timestamp.
- Edge detection uses previous value strictly before timestamp and SystemVerilog-style classification on LSB only, with nine-state values (`h/u/w/l/-`) normalized to `x`.
- For FST dumps, implementation may route candidate-time discovery through `wellen::stream` filter pushdown (`start`, `end`, `signals`) for heavy windows while preserving stateless single-process execution and fallback behavior.
- Default output: human-readable snapshot list.
- Human rows are one line per snapshot: `@<time> <display_1>=<value_1> <display_2>=<value_2> ...`.
  `--abs` switches `<display_i>` from requested token to canonical path.
- JSON rows are `{ "time": <normalized>, "signals": [{"path": ..., "value": ...}, ...] }`.
- Values are Verilog literals: `<width>'h<digits>` with lowercase hex and `x`/`z` support.
- If no rows are emitted, `data` is empty and warning is exactly `no signal changes found in selected time range`.
- `--max 0` is an `args` error.
- `--max unlimited` disables row truncation and emits warning `limit disabled: --max=unlimited`.
- If `--max unlimited` is used together with any existing warning (for example empty-result warning), the unlimited warning is emitted first.
- If results exceed `--max`, output is truncated and a warning is emitted.
- Legacy `changes` command name is not supported.

**Examples:**
```bash
# Default trigger (`*`): any sampled signal change
wavepeek change --waves dump.vcd --from 1us --to 2us --signals top.cpu.clk,top.cpu.data

# Named non-edge trigger with scope-relative names
wavepeek change --waves dump.vcd --from 1us --to 2us --scope top.cpu --signals data,valid --on "data"

# Union trigger with comma/or synonyms
wavepeek change --waves dump.vcd --from 1us --to 2us --scope top.cpu --signals clk1 --on "posedge clk1, posedge clk2"

# Strict JSON envelope
wavepeek change --waves dump.vcd --from 1us --to 2us --scope top.cpu --signals clk --on "edge clk" --json

# Disable row truncation explicitly
wavepeek change --waves dump.vcd --signals top.sig --max unlimited
```
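
The delta-snapshot rules can be modeled on an in-memory value history. The following Python sketch is a simplified reading of the behavior list, not the real engine: `value_at`, `star_candidates`, and `change_rows` are hypothetical names, times are plain integers in dump units, and candidate timestamps are supplied by the caller instead of being derived from an `--on` expression.

```python
from bisect import bisect_left, bisect_right


def value_at(history, t, inclusive=True):
    """Last recorded value at (inclusive) or strictly before time t."""
    times = [time for time, _ in history]
    idx = bisect_right(times, t) if inclusive else bisect_left(times, t)
    return history[idx - 1][1] if idx > 0 else None


def star_candidates(histories):
    """Candidates for the `*` trigger: any change of a sampled signal."""
    return sorted({t for history in histories.values() for t, _ in history})


def change_rows(histories, candidates, start, end, max_rows=50):
    """Emit (time, snapshot) rows where sampled signals actually changed."""
    rows = []
    for t in sorted(set(candidates)):
        if not (start < t <= end):
            continue  # the baseline timestamp itself never emits a row
        before = {s: value_at(h, t, inclusive=False) for s, h in histories.items()}
        now = {s: value_at(h, t) for s, h in histories.items()}
        # A signal with no prior state is initialized, not counted as a delta.
        if any(before[s] is not None and before[s] != now[s] for s in histories):
            rows.append((t, now))
    truncated = len(rows) > max_rows
    return rows[:max_rows], truncated


sampled = {
    "clk": [(0, "1'h0"), (5, "1'h1"), (10, "1'h0"), (15, "1'h1")],
    "data": [(0, "8'h00"), (10, "8'hff")],
}
rows, _ = change_rows(sampled, star_candidates(sampled), start=0, end=15)
print([t for t, _ in rows])  # [5, 10, 15]
```

In this model, sampling only `data` at candidate times 5 and 15 yields no rows, because `data` changed at time 10 and holds the same value strictly before and at each candidate.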

#### 3.2.6 `property` — Property checks over event triggers

Checks a logical property over event-selected timestamps.

```
wavepeek property --waves <file> [--from <time>] [--to <time>] [--scope <path>] [--on <event_expr>] --eval <logical_expr> [--capture <match|switch|assert|deassert>] [--json]
```

**Parameters:**
| Parameter | Default | Description |
|-----------|---------|-------------|
| `--waves <file>` | required | Path to VCD/FST file |
| `--from <time>` | start of dump | Start of time range |
| `--to <time>` | end of dump | End of time range |
| `--scope <path>` | none | Scope for short signal names in `--on`/`--eval` |
| `--on <event_expr>` | `*` | Event expression (`*`, named, edge, union) |
| `--eval <logical_expr>` | required | Logical expression to evaluate at candidate timestamps |
| `--capture <mode>` | `switch` | Capture mode: `match`, `switch`, `assert`, `deassert` |
| `--json` | off | Strict JSON envelope output |

**Behavior:**
- Name/scope resolution follows `value` and `change`: without `--scope`, tokens are canonical paths; with `--scope`, tokens are short names relative to that scope.
- Time boundaries are inclusive (`--from`, `--to`), and time tokens require explicit units aligned to dump precision.
- Candidate timestamps come from `--on`.
- Default `--on` is `*` and is interpreted as any change among signals referenced by `--eval`, including raw-event handles referenced through `.triggered()`, to avoid per-time-unit output spam. If `--eval` references no signals or raw events, users must pass `--on` explicitly.
- Supported `--on` forms match `change`: `*`, `<name>`, `posedge <name>`, `negedge <name>`, `edge <name>`, and union forms via `or`/`,`.
- `--eval` is evaluated on each candidate timestamp with 4-state semantics. Final capture decisions are command-level: integral results count as true when any bit is `1` and false otherwise, real results count as true when non-zero and false otherwise, and string results count as false.
- `match`: emit every event timestamp where `--eval` is true.
- `switch`: emit only transitions (`assert` on `0->1`, `deassert` on `1->0`) after initializing baseline state from the inclusive range start.
- `assert`: emit only `0->1` transitions.
- `deassert`: emit only `1->0` transitions.
- Human output target is compact and action-oriented: `@123ns assert`, `@1234ns deassert`, or `@1223ps match`.
- JSON output uses deterministic ordering and a strict envelope contract under `--json`.

**Expression language contract:**
- `--on` and `--eval` syntax/semantics are defined in `docs/expression_lang.md`.
- This document keeps command-level behavior only; operator sets, precedence,
  casts, and detailed type semantics are centralized in the expression language
  contract.

**Examples:**

```bash
# Basic property check on clock edges
wavepeek property --waves dump.vcd --scope top.cpu --on "posedge clk" --eval "data == 0xff"

# Emit only assertion edges (0->1)
wavepeek property --waves dump.vcd --scope top.cpu --on "edge clk" --eval "valid && ready" --capture assert

# Switch mode (default) can be passed explicitly
wavepeek property --waves dump.vcd --from 1us --to 2us --eval "top.cpu.data == 0xff" --capture switch
```
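
The four capture modes differ only in which truth-value events they report. The Python sketch below models that difference over a precomputed list of `(time, truth)` samples. It is illustrative only: `is_true` and `capture` are hypothetical names, and it assumes the first candidate initializes the baseline for transition modes without emitting a row.

```python
def is_true(bits: str) -> bool:
    """Integral truthiness as stated above: true when any bit is 1."""
    return "1" in bits


def capture(samples, mode="switch"):
    """samples: (time, bool) pairs at candidate timestamps, in time order."""
    if mode not in ("match", "switch", "assert", "deassert"):
        raise ValueError(f"unknown capture mode: {mode!r}")
    if mode == "match":
        return [(t, "match") for t, truth in samples if truth]

    rows = []
    previous = None  # assumption: the first candidate only sets the baseline
    for t, truth in samples:
        if previous is not None and truth != previous:
            kind = "assert" if truth else "deassert"
            if mode == "switch" or mode == kind:
                rows.append((t, kind))
        previous = truth
    return rows


samples = [(10, False), (20, True), (30, True), (40, False)]
print(capture(samples, "match"))   # [(20, 'match'), (30, 'match')]
print(capture(samples, "switch"))  # [(20, 'assert'), (40, 'deassert')]
```

With 4-state inputs, a value such as `xxxx` counts as false in this model because no bit is `1`.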

---

## 4. Non-Functional Requirements

### 4.1 Performance
- Performance is the highest priority
- Rust is chosen specifically for this reason
- Benchmarks are maintained through `bench/e2e/perf.py` for CLI scenarios and `bench/expr/perf.py` for expression microbenchmarks, with room for future expansion as the project matures

### 4.2 Compatibility
- OS agnostic: Linux, macOS, Windows

### 4.3 Output Stability
- Deterministic output for identical input data

### 4.4 LLM Agent Integration
- Ready-made skill definition for LLM CLI agents (OpenCode, Codex CLI, Claude Code)
- Skill shipped in repo with setup instructions
- Deterministic `--json` output with stable `$schema` URL contracts for machine consumers

---

## 5. Technical Architecture

### 5.1 Technology Stack

| Component | Choice | Rationale |
|-----------|--------|-----------|
| **Language** | Rust stable (MSRV 1.93) | Performance, memory safety, zero-cost abstractions. Ideal for parsing large binary/text dump files without GC pauses. |
| **CLI framework** | `clap` (derive API) | De facto standard for Rust CLIs. Derive API provides self-documenting argument definitions with compile-time validation. |
| **Waveform parsing** | `wellen` | Unified interface for VCD and FST formats. Battle-tested in the [Surfer](https://surfer-project.org/) waveform viewer. Multi-threaded VCD parsing via `rayon`. Optimized for on-demand signal access (loads hierarchy first, signal data lazily). |
| **Serialization** | `serde` + `serde_json` | Standard Rust serialization. Used for strict `--json` envelope output and JSON schema export. |
| **Pattern matching** | `regex` | For `--filter` flag in `scope` and `signal` commands. |
| **Error handling** | `thiserror` | Typed error enums with `#[derive(Error)]`. All error variants are known at compile time. No runtime error boxing (no `anyhow`). |
| **Build automation** | Cargo + Make | Cargo for compilation. Makefile provides shorthand targets: `make format`, `make format-check`, `make lint`, `make test`, `make check`. |

### 5.2 High-Level Architecture

The system is organized in three execution layers plus a shared output module. Data
flows top-down for execution: CLI parses arguments, delegates to the engine, which
reads waveform data through the waveform layer and returns structured results. The
shared output module renders those results for stdout.

**Execution layers (top to bottom):**

1. **CLI Layer** (`clap`) — Argument parsing, validation, and command dispatch.
   Defines help text and typed command structs, then passes those structs to the
   engine.

2. **Engine Layer** — Business logic per command: `info`, `scope`, `signal`,
   `value`, `change`, `property`, `schema`. Operates on waveform abstractions, returns structured
   results. It is clap-free in behavior, but currently receives CLI-owned argument structs from
   `src/cli/`. Contains shared time validation/normalization utilities, shared value-formatting
   utilities, shared expression-runtime helpers, and the `change` multi-engine
   dispatcher described in [5.7 Change Command Execution Architecture](#57-change-command-execution-architecture).

3. **Waveform Layer** (`wellen`) — Thin adapter over wellen. Handles file opening,
   format detection (VCD/FST), hierarchy traversal, and signal value queries.
   Exposes adapter methods plus normalized metadata, scope, signal, and sampled-value structs.

**Shared output module:**

- `output.rs` — Human-readable default output and strict JSON rendering for engine
  results. Owns stdout serialization; errors still go to stderr.

**Key architectural decisions:**

- **Stable output schema.** JSON output in `--json` mode uses a versioned `$schema` URL.
  The canonical schema artifact is tracked at `schema/wavepeek.json`; implementation may
  introduce internal data structures to stabilize the output contract independently of
  upstream dependency APIs.

- **Stateless execution.** Each CLI invocation opens the file, executes one command,
  and exits. No caching, no sessions, no background processes. This simplifies the
  architecture and matches the design principle of minimal footprint.

- **Format-agnostic engine.** The engine layer does not know whether it is working
  with VCD or FST. Format detection and loading is handled entirely by
  the waveform layer (wellen auto-detects format from file content).

### 5.3 Module Structure

Single crate with internal module boundaries and a thin library entrypoint used
by the CLI binary and integration-style expression tests/benchmarks.

```
src/
├── lib.rs               # Crate entrypoint (`run_cli`) + module ownership
├── main.rs              # Thin binary wrapper around `wavepeek::run_cli()`
├── cli/                 # CLI layer: argument definitions, help text, dispatch
│   ├── mod.rs           # Top-level CLI struct, parse-error normalization, output handoff
│   ├── limits.rs        # Shared bounded-output flag parsing (`--max`, `--max-depth`)
│   ├── info.rs          # `info` command args + clap help
│   ├── scope.rs         # `scope` command args + clap help
│   ├── signal.rs        # `signal` command args + clap help
│   ├── value.rs         # `value` command args + clap help
│   ├── change.rs        # `change` command args + clap help
│   ├── property.rs      # `property` command args + clap help
│   └── schema.rs        # `schema` command args + clap help
├── engine/              # Business logic per command
│   ├── mod.rs           # Command dispatch + shared result types
│   ├── info.rs          # Dump metadata extraction
│   ├── scope.rs         # Hierarchy traversal with depth/filter
│   ├── signal.rs        # Signal listing within scope
│   ├── value.rs         # Value extraction at time point
│   ├── change.rs        # Value change tracking (`--on` event triggers, see 5.7)
│   ├── expr_runtime.rs  # Shared typed-expression binding/evaluation helpers for command runtimes
│   ├── time.rs          # Shared time token parsing/validation/alignment helpers
│   ├── value_format.rs  # Shared Verilog literal formatting helpers
│   ├── property.rs      # Property runtime entrypoint and capture-mode execution
│   └── schema.rs        # JSON schema export
├── schema_contract.rs   # Canonical schema URL and embedded schema artifact
├── expr/                # Expression engine foundation (shared by `change`/`property`)
│   ├── mod.rs           # Public typed facade for parsing/binding/evaluation
│   ├── ast.rs           # Spanned expression AST types
│   ├── diagnostic.rs    # Parse/semantic/runtime diagnostic contract
│   ├── lexer.rs         # Spanned tokenizer for event-expression parsing
│   ├── parser.rs        # Strict typed parser (`parse_event_expr_ast`, `parse_logical_expr_ast`)
│   ├── host.rs          # Host trait + signal/type/value bridge types
│   ├── sema.rs          # Typed event/logical binder (`bind_event_expr_ast`, `bind_logical_expr_ast`)
│   └── eval.rs          # Typed event matcher + standalone logical evaluator (`event_matches_at`, `eval_logical_expr_at`)
├── waveform/            # Thin adapter over wellen
│   ├── mod.rs           # File loading, format detection, query helpers
│   └── expr_host.rs     # Crate-private waveform-to-expression adapter for standalone expression tests/benchmarks
├── output.rs            # Shared output formatting (JSON + human)
└── error.rs             # Error enum (WavepeekError)
```

**Separation of concerns:**

| Module | Knows about | Does not know about |
|--------|-------------|---------------------|
| `cli/` | clap, command dispatch, help text | wellen, file I/O, result rendering details |
| `engine/` | domain logic, waveform layer API, CLI-owned command arg structs | clap parsing flow, output formats |
| `expr/` | expression AST, signal values | wellen, CLI, output |
| `waveform/` | wellen API | CLI, engine logic |
| `output` | serde_json, JSON/human formatting | clap, waveform access, domain execution |
| `error` | all error variants | nothing else |

### 5.4 Key Dependencies

| Crate | Version | Purpose | Notes |
|-------|---------|---------|-------|
| `wellen` | ~0.20 | VCD, FST parsing | Core dependency. BSD-3-Clause. Brings in `rayon`, `fst-reader`, `memmap2`, `lz4_flex` transitively. |
| `clap` | ~4 | CLI argument parsing | With `derive` feature for declarative argument definitions. |
| `serde` | ~1 | Serialization framework | With `derive` feature. |
| `serde_json` | ~1 | JSON output | Default output serialization and schema export. |
| `regex` | ~1 | Pattern matching | For `--filter` in `scope` and `signal`. |
| `thiserror` | ~2 | Error type derivation | `#[derive(Error)]` for typed error enums. |

**Dev dependencies:**

| Crate | Purpose |
|-------|---------|
| `assert_cmd` | CLI integration testing (run binary, assert stdout/stderr/exit code) |
| `predicates` | Assertion helpers for `assert_cmd` |
| `tempfile` | Temporary file creation for tests |
| `insta` | Snapshot assertions for deterministic diagnostics |
| `criterion` | Expression microbenchmarks (`cargo bench --bench expr_syntax`, `cargo bench --bench expr_logical`, `cargo bench --bench expr_event`, `cargo bench --bench expr_waveform_host`) captured through `bench/expr/perf.py` |

### 5.5 Expression Engine

The `property` command evaluates typed logical expressions against signal values
at event-selected timestamps and then reduces the result to command-level
capture truth.

Language syntax and semantics are specified in `docs/expression_lang.md`; this
section describes implementation architecture only.

**Pipeline:** Input string → Lexer → Token stream → Parser → AST → Binder → Typed evaluator (+ signal values at current time point)
→ typed result (`integral`/`real`/`string`) → command-level truth reduction (`integral`: any `1`, `real`: non-zero, `string`: false)
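
The final truth-reduction step of the pipeline can be illustrated with a minimal sketch. The `ExprValue` enum and `capture_truth` function below are hypothetical names introduced for illustration only; the actual result types live in `src/expr/` and may be shaped differently. The sketch assumes an integral value is carried as a list of four-state bit characters.

```rust
/// Hypothetical stand-in for the evaluator's typed result.
/// The real types in `src/expr/` may differ.
#[derive(Debug, Clone, PartialEq)]
enum ExprValue {
    /// Four-state bits, MSB first: '0', '1', 'x', or 'z' (assumed encoding).
    Integral(Vec<char>),
    Real(f64),
    Str(String),
}

/// Reduce a typed result to command-level capture truth:
/// - integral: true if any bit is `1`
/// - real: true if non-zero
/// - string: always false
fn capture_truth(value: &ExprValue) -> bool {
    match value {
        ExprValue::Integral(bits) => bits.iter().any(|&b| b == '1'),
        ExprValue::Real(r) => *r != 0.0,
        ExprValue::Str(_) => false,
    }
}
```

Under these assumptions, an integral value such as `0x1` reduces to true, `0xz` reduces to false, and any string reduces to false regardless of its content.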

**Components:**

- **Lexer/Parser** — Parse event and logical expression inputs into shared
  expression structures used by command runtimes.
- **AST/types** — Provide stable internal representation in `src/expr/` for
  reuse by `change` (`--on`) and `property` (`--eval`).
- **Evaluator** — Resolves signal names against sampled waveform values and
  computes expression outcomes for runtime decision points.

**Implementation status:**

- Typed standalone event and logical runtime with rich type support is implemented in
  `src/expr/sema.rs`, `src/expr/eval.rs`, and `src/expr/host.rs` through
  public `wavepeek::expr::parse_logical_expr_ast(...)`,
  `wavepeek::expr::bind_logical_expr_ast(...)`,
  `wavepeek::expr::eval_logical_expr_at(...)`, plus event
  `bind_event_expr_ast(...)` and `event_matches_at(...)` with `iff` payload
  reuse. That standalone surface now includes `real`, `string`, operand-type
  casts, enum-label references, and raw-event `.triggered()`.
- Dump-backed rich metadata is bridged through the crate-private
  `src/waveform/expr_host.rs` adapter plus crate-private lookup/decode helpers
  in `src/waveform/mod.rs`, so standalone tests and benchmarks can bind/evaluate
  against VCD/FST metadata without widening the public `wavepeek::expr` facade.
- The production `change` and `property` runtimes now bind expressions through
  the shared typed parser/binder/evaluator path and use a shared command-owned
  waveform handle bridged through `src/engine/expr_runtime.rs` plus
  `src/waveform/expr_host.rs`.
- The transitional compatibility parser has been retired; `src/expr/` now
  exposes only the typed parser, binder, and evaluator surface used by
  standalone tests, benchmarks, and production commands.

### 5.6 Error Handling Strategy

**Principles:**
- **Fail fast.** On the first error, print a message to stderr and exit with a
  non-zero code. Stdout is empty on errors (no JSON envelope).
- **LLM-parseable errors.** Error messages follow a fixed format so agents can
  detect and interpret them programmatically.
- **No panics in production paths.** All recoverable errors use `Result<T, WavepeekError>`.
  `unwrap()` / `expect()` are used only in cases that indicate programmer bugs.

**Error format (stderr):**

```
error: <category>: <message>
```

Examples:
```
error: file: cannot open 'dump.vcd': No such file or directory
error: args: --at requires units (e.g., '100ns'), got '100'
error: signal: signal 'top.cpu.foo' not found in dump
error: expr: parse error in condition: unexpected token ')' at position 12
```

**Exit codes:**

| Code | Meaning |
|------|---------|
| `0` | Success |
| `1` | User error (bad arguments, signal not found, invalid expression) |
| `2` | File error (cannot open, cannot parse, unsupported format) |

**Warnings:**

- Warnings (e.g., output truncation due to `--max`, no matches in a query) do not change exit code (still `0`).
- In `--json` mode, warnings are appended to the `warnings` array in the JSON envelope.
- In human mode, warnings are printed to stderr as free-form text.

**Error enum:**

A single `WavepeekError` enum in `src/error.rs` with variants covering all failure
modes. Each variant maps to an exit code and a formatted message.
The CLI layer converts `WavepeekError` into stderr output and exit code.
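
As a rough sketch of how one enum can carry both the stderr format and the exit-code mapping described above, consider the following. The variant names, the `category` and `exit_code` helpers, and the hand-written `Display` impl are assumptions for illustration; the actual enum in `src/error.rs` derives its messages via `thiserror` and may use different variants.

```rust
use std::fmt;

/// Illustrative error enum; variant names are hypothetical.
#[derive(Debug)]
enum WavepeekError {
    File(String),
    Args(String),
    Signal(String),
    Expr(String),
}

impl WavepeekError {
    /// Category token printed between `error:` and the message.
    fn category(&self) -> &'static str {
        match self {
            WavepeekError::File(_) => "file",
            WavepeekError::Args(_) => "args",
            WavepeekError::Signal(_) => "signal",
            WavepeekError::Expr(_) => "expr",
        }
    }

    /// Exit code per the table above: 2 for file errors, 1 for user errors.
    fn exit_code(&self) -> i32 {
        match self {
            WavepeekError::File(_) => 2,
            _ => 1,
        }
    }
}

impl fmt::Display for WavepeekError {
    /// Renders `error: <category>: <message>`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let msg = match self {
            WavepeekError::File(m)
            | WavepeekError::Args(m)
            | WavepeekError::Signal(m)
            | WavepeekError::Expr(m) => m,
        };
        write!(f, "error: {}: {}", self.category(), msg)
    }
}
```

With this shape, the CLI layer would only need to print the error with `eprintln!("{err}")` and pass `err.exit_code()` to `std::process::exit`.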

### 5.7 Change Command Execution Architecture

The `change` command uses a multi-engine execution model that keeps one user
contract while allowing different internal execution strategies for different
workloads.

Execution engines:

- **Baseline engine** — Conservative baseline path used for narrow or low-work
  windows. It prioritizes predictable behavior and low fixed overhead.
- **Fused engine** — Optimized for broad candidate sets and any-tracked windows.
  It reduces repeated per-signal work by combining more work in shared passes.
- **Edge-fast engine** — Specialized path for dense edge-trigger workloads. It
  applies edge-first gating and lightweight prefiltering before full delta
  materialization.

Dispatcher heuristics:

- Default mode routes requests automatically from simple workload estimates
  (window size, candidate density, requested signal count, and trigger shape).
- Edge-only trigger profiles can be routed to fused or edge-fast depending on
  estimated work, while sparse profiles remain on baseline.
- This policy is intentionally internal and may evolve without changing user
  payload semantics.
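
The routing idea can be sketched as follows. Everything here is hypothetical: the `Workload` fields, the work estimate formula, and both threshold constants are invented for illustration and do not reflect the actual dispatcher policy, which is internal and may change.

```rust
#[derive(Debug, PartialEq)]
enum Engine {
    Baseline,
    Fused,
    EdgeFast,
}

/// Hypothetical workload estimate inputs.
struct Workload {
    window_timestamps: u64,
    /// Assumed to be the fraction of timestamps that are candidates (0.0..=1.0).
    candidate_density: f64,
    signal_count: usize,
    edge_only_trigger: bool,
}

// Illustrative thresholds only; not taken from the real implementation.
const SPARSE_LIMIT: f64 = 1_000.0;
const DENSE_EDGE_LIMIT: f64 = 100_000.0;

fn choose_engine(w: &Workload) -> Engine {
    // Crude work estimate: candidate timestamps times requested signals.
    let estimated_work =
        w.window_timestamps as f64 * w.candidate_density * w.signal_count as f64;

    if estimated_work < SPARSE_LIMIT {
        // Sparse or narrow windows stay on the low-overhead path.
        Engine::Baseline
    } else if w.edge_only_trigger && estimated_work >= DENSE_EDGE_LIMIT {
        // Dense edge-trigger workloads use the specialized path.
        Engine::EdgeFast
    } else {
        // Broad candidate sets, including moderate edge-only profiles.
        Engine::Fused
    }
}
```

Whichever engine is chosen, the output must stay identical, so a function like this only affects latency and never the payload.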

Why this complexity exists:

- A single engine could not deliver consistent low latency for both tiny and
  large-window scenarios.
- The dispatcher keeps externally visible behavior contract-equivalent while
  choosing faster internals for high-work inputs.
- The design target is stable output parity with practical latency improvements
  on large dumps/windows where optimized paths materially reduce runtime.

### 5.8 Testing Strategy

**Levels:**

| Level | What | How | Fixtures |
|-------|------|-----|----------|
| **Unit tests** | Individual functions in `engine/`, `expr/`, `waveform/` | `#[cfg(test)]` modules, `cargo test` | Hand-crafted VCD strings (inline or small `.vcd` files) |
| **Integration tests** | Full CLI invocations end-to-end | `assert_cmd` in `tests/` directory | Hand fixtures + container-provisioned fixtures at `/opt/rtl-artifacts` |
| **Expression tests** | Lexer, parser, evaluator independently | Unit tests in `expr/` submodules | None (pure logic, string inputs) |

**Test fixture strategy (two sources):**

1. **Hand-crafted VCD files** — For unit tests and edge cases. VCD is a simple
   text format, easy to write manually. Covers: empty files, single-bit signals,
   multi-bit signals, X/Z values, multiple scopes, deep hierarchy, time edge cases
   (zero duration, single timestamp). Stored in `tests/fixtures/hand/`.

2. **Container-provisioned representative fixtures** — For integration tests.
   Required large fixtures are downloaded during devcontainer/CI image build from
   a pinned release version and installed under `/opt/rtl-artifacts`.
   Runtime test execution does not download fixtures.
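
For a sense of scale, a hand-crafted fixture of the first kind can be as small as the inline string below. The fixture content and the `timestamps` helper are illustrative assumptions, not files or functions from the repository; the helper only scans for `#<time>` lines and is not a VCD parser.

```rust
/// A tiny hand-written VCD dump: one scope, a 1-bit clock, and a 4-bit bus
/// that starts as all-X. Illustrative only.
const TINY_VCD: &str = "\
$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var wire 4 % data [3:0] $end
$upscope $end
$enddefinitions $end
#0
0!
bxxxx %
#5
1!
b1010 %
#10
0!
";

/// Collect the timestamps that appear as `#<time>` lines in a VCD string.
fn timestamps(vcd: &str) -> Vec<u64> {
    vcd.lines()
        .filter_map(|line| line.strip_prefix('#'))
        .filter_map(|t| t.trim().parse().ok())
        .collect()
}
```

A unit test could feed a string like this to the waveform layer and check edge cases such as X values at time zero without touching the filesystem.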

**What to assert in integration tests:**

- Exact stdout output (deterministic output is a design principle)
- Exit code
- Stderr content for error cases
- `--json` output validates against expected JSON structure and `$schema` URL contract
- Human output is generally not asserted for exact formatting unless a command-level contract explicitly fixes it (currently `value`, including `--abs` display behavior)
- Consistency: same query on VCD and FST of the same design produces identical output

---

## 7. Open Questions

1. **Scope/path canonicalization.** What is the canonical path syntax and escaping rules for VCD escaped identifiers and unusual names across formats?
2. **Warnings (codes vs free text).** Do we add stable warning codes so callers can promote or suppress specific warnings, or keep warnings as free-form strings only?
3. **Value radix options.** Do we add `--radix` (hex/bin/dec/auto) and what is the default policy beyond the Verilog-literal representation?
4. **Schema evolution policy.** Do we keep a single canonical schema file indefinitely, or split into per-command schemas in future milestones?
5. **Signal metadata schema.** Exact JSON fields for `signal` output (`kind`, `width`, and other metadata) and how they map across formats.
6. **GHW support scope.** Should GHW be added after MVP, and if yes, what acceptance criteria and priority should gate its introduction?

---

## 8. References
- `docs/expression_lang.md` — canonical event/logical expression contract
- [GTKWave](http://gtkwave.sourceforge.net/) — reference waveform viewer
- [Surfer](https://surfer-project.org/) — modern waveform viewer; reference implementation context for the parsing stack
- [VCD Format Specification](https://en.wikipedia.org/wiki/Value_change_dump)
- [FST Format](https://gtkwave.sourceforge.net/gtkwave.pdf)
- [wellen (GitHub)](https://github.com/ekiwi/wellen) — Rust waveform parsing library used by wavepeek
- [wellen (docs.rs)](https://docs.rs/wellen/0.20.2) — API documentation