hen 0.16.0

Run protocol-aware API request collections from the command line or through MCP.
# Detailed Roadmap: Debugging and Failure Forensics

This document holds the detailed planning for Phase 4 from [ROADMAP.md](ROADMAP.md).

## Current Phase 4 State

Hen already ships the first layer of debugging and failure-forensics support:

- Per-request `startedAtUnixMs` and `durationMs` fields in structured output.
- CLI body previews plus elapsed and slowest-request summaries.
- Interruption-aware partial reporting for incomplete runs.
- Structured assertion mismatch payloads with exact paths, kinds, and expected and actual detail.
- A normalized execution artifact that already has transcript and retained-artifact slots internally.
- Structured JSON and NDJSON record output now includes per-record `transcripts` and `retainedArtifacts`, and the existing body-report controls apply to those artifact bodies as well.
- Structured JSON and NDJSON output also include a stable per-request `timing` object: `totalMs` plus named phase timings wherever the executor can currently measure them. The measured phases are `dns` / `responseStart` / `bodyRead` for hostname-based HTTP requests, `handshake` / `send` / `wait` for WebSocket, and `dns` / `streamOpen` / `wait` for hostname-based SSE opens.
- Text CLI failures plus structured JSON and NDJSON assertion results now include additive unified diffs for comparison, match, and schema failures, with unchanged-context and changed-block truncation so larger payload diffs stay readable.
- Multi-step run output now includes a structured execution trace that captures dependency waits, start order, skips, completion or failure, interruption state, and session-backed protocol context in JSON, NDJSON, and verbose text output.
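
As a rough illustration of how the per-request `startedAtUnixMs` and `durationMs` fields can feed the elapsed and slowest-request summaries, here is a minimal sketch. The `RequestTiming` type and both helper functions are hypothetical, and computing elapsed time as "earliest start to latest finish" is an assumption rather than Hen's confirmed behavior.

```rust
/// Hypothetical per-request timing record. Only the two field meanings
/// (`startedAtUnixMs`, `durationMs`) come from the roadmap.
#[derive(Debug, Clone, PartialEq)]
pub struct RequestTiming {
    pub name: String,
    pub started_at_unix_ms: u64,
    pub duration_ms: u64,
}

/// Return the request with the largest duration, or `None` for an empty run.
pub fn slowest(records: &[RequestTiming]) -> Option<&RequestTiming> {
    records.iter().max_by_key(|r| r.duration_ms)
}

/// Wall-clock span from the earliest start to the latest finish.
/// Assumption: this is one plausible definition of "elapsed".
pub fn elapsed_ms(records: &[RequestTiming]) -> Option<u64> {
    let first = records.iter().map(|r| r.started_at_unix_ms).min()?;
    let last = records
        .iter()
        .map(|r| r.started_at_unix_ms + r.duration_ms)
        .max()?;
    Some(last - first)
}
```

Both helpers return `Option` so that an interrupted run with no completed requests yields no summary instead of a misleading zero.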

That means Phase 4 is complete at the current product scope. The previously proposed follow-on work around file-backed artifact persistence, deeper transport timing breakdowns, and graph or visualization surfaces is intentionally out of scope and will not be pursued.

## Forensics Design Principles

- Reuse the normalized execution artifact instead of creating a separate debugging-only data path.
- Keep forensics transport-neutral even when some timing or trace fields are protocol-specific.
- Make retention, export, and redaction explicit so CI artifacts stay reviewable and safe.
- Prefer machine-readable outputs that the CLI, MCP surface, and future editor tooling can all consume.
- Treat human-readable text output as a projection of structured trace data rather than a one-off rendering path.
- Keep report-schema evolution additive where possible so existing JSON and NDJSON consumers can ignore new forensics fields safely.

## Milestone 4.1: Artifact and Transcript Export

Milestone 4.1 is complete for the current Phase 4 scope: per-record structured output exports the existing in-memory transcripts and retained-artifact slots, and execution failures preserve those artifact-backed transcripts when the executor already has them. File-backed persistence and retention policy work were considered, but they are intentionally out of scope.

- Keep inline artifact and transcript reporting transport-neutral so HTTP, MCP, SSE, and WebSocket all use the same reporting shape.
- Preserve artifact-backed context in structured outputs without requiring custom shell wrappers or ad hoc debug logging.
- Leave file-backed retention and export out of scope for the core CLI roadmap.
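
A transport-neutral reporting shape might look roughly like the sketch below: every protocol fills the same `transcripts` and `retainedArtifacts` slots, and one body limit applies to both. All Rust type names, the `with_body_limit` helper, and the `...` truncation marker are assumptions for illustration; they are not taken from `src/request/artifact.rs` or `src/report.rs`.

```rust
/// Protocols named in the roadmap. The enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Protocol { Http, Mcp, Sse, WebSocket }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Direction { Sent, Received }

/// One transcript entry, identical in shape for every protocol.
#[derive(Debug, Clone, PartialEq)]
pub struct Transcript {
    pub direction: Direction,
    pub body: String,
}

/// Hypothetical per-record report entry with the two slots from the roadmap.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub protocol: Protocol,
    pub transcripts: Vec<Transcript>,
    pub retained_artifacts: Vec<String>,
}

/// Cap a body at `max_chars` characters and mark the cut with `...`.
pub fn preview(body: &str, max_chars: usize) -> String {
    if body.chars().count() <= max_chars {
        body.to_string()
    } else {
        let head: String = body.chars().take(max_chars).collect();
        format!("{}...", head)
    }
}

impl Record {
    /// Copy the record with every transcript and artifact body capped.
    pub fn with_body_limit(&self, max_chars: usize) -> Record {
        Record {
            protocol: self.protocol,
            transcripts: self
                .transcripts
                .iter()
                .map(|t| Transcript {
                    direction: t.direction,
                    body: preview(&t.body, max_chars),
                })
                .collect(),
            retained_artifacts: self
                .retained_artifacts
                .iter()
                .map(|a| preview(a, max_chars))
                .collect(),
        }
    }
}
```

Because the limit is applied to the shared shape and not per protocol, a WebSocket transcript and an HTTP response body are trimmed by the same rule, which keeps the reporting path from forking.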

### Milestone 4.1 Implementation Anchors

- Execution artifact model in `src/request/artifact.rs`.
- Request execution integration in `src/request/executor.rs` and `src/request/mod.rs`.
- Structured reporting in `src/report.rs`.
- CLI flag wiring and output behavior in `src/main.rs`.

### Milestone 4.1 Dependencies and Risks

- Inline artifact reporting can still bloat machine-readable output if callers ask for large bodies.
- Protocol-specific transcript attributes still need to fit the shared report shape instead of forcing a report-schema fork.
- The explicit tradeoff is that Hen will not add a second file-retention subsystem for Phase 4.

### Milestone 4.1 Exit Check

Met for the current scope: a failed run can emit inline request and response artifacts plus transcripts without requiring custom shell wrappers or ad hoc debug logging.

## Milestone 4.2: Timing Breakdown and Performance Context

Milestone 4.2 is complete for the current Phase 4 scope: Hen records a stable machine-readable timing shape with `totalMs` and executor-level phase timings where they are directly measurable. Deeper transport timing decomposition is intentionally out of scope.

- Preserve the current stable fallback shape when a protocol cannot provide detailed phase timing.
- Keep the current verbose-text timing projection aligned with the shared structured timing data.
- Keep session-backed wait behavior understandable for streaming and multi-step protocols.
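
The stable fallback shape described above can be sketched as a timing value that always carries `totalMs` and only adds phases when they were measured. The `Timing` type is hypothetical, and the flat JSON layout (phases as sibling keys of `totalMs`) is an assumption; the actual report may nest them differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical timing value. Only `totalMs` and the phase names
/// (`dns`, `responseStart`, `bodyRead`, ...) come from the roadmap.
#[derive(Debug, Clone, PartialEq)]
pub struct Timing {
    pub total_ms: u64,
    /// Named phases in milliseconds; empty when nothing could be measured.
    pub phases: BTreeMap<String, u64>,
}

impl Timing {
    /// Fallback shape for protocols without phase detail.
    pub fn total_only(total_ms: u64) -> Self {
        Timing { total_ms, phases: BTreeMap::new() }
    }

    /// Attach one measured phase.
    pub fn with_phase(mut self, name: &str, ms: u64) -> Self {
        self.phases.insert(name.to_string(), ms);
        self
    }

    /// Render a JSON object. Phase names are assumed to be plain ASCII
    /// identifiers, so no string escaping is performed here.
    pub fn to_json(&self) -> String {
        let mut out = format!("{{\"totalMs\":{}", self.total_ms);
        for (name, ms) in &self.phases {
            out.push_str(&format!(",\"{}\":{}", name, ms));
        }
        out.push('}');
        out
    }
}
```

A consumer that reads only `totalMs` keeps working whether or not phases are present, which is the point of keeping the fallback shape stable.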

### Milestone 4.2 Implementation Anchors

- Request execution lifecycle in `src/request/executor.rs`.
- Run timing capture in `src/request/runner.rs`.
- Structured reporting in `src/report.rs`.
- Text summary rendering in `src/main.rs`.

### Milestone 4.2 Dependencies and Risks

- Different transports still expose timing detail with different fidelity, and the current executor-level phases intentionally do not separate connect or TLS.
- Phase timings can still be misleading if future queueing, retries, or session waits are mixed into transport timings without clear labels.

### Milestone 4.2 Exit Check

Met for the current scope: users can see more than a single duration number when diagnosing slow requests, and the timing shape remains understandable across protocols without adding transport-specific subphase complexity.

## Milestone 4.3: Dependency-Aware Execution Trace

Milestone 4.3 is complete for the current Phase 4 scope: Hen now emits an additive execution trace surface for multi-step runs, built directly from runner events and reused across text, JSON, and NDJSON reporting.

- Structured JSON output now includes a `trace` array, and NDJSON output now emits typed `trace` lines alongside the existing summary, record, and failure entries.
- The trace records causal ordering for dependency waits, request start, completion, failure, dependency-driven skips, map aborts, and interruption state.
- Verbose text runs now project the same trace data into a readable breadcrumb section without adding noise to the default non-verbose output.
- Session-backed protocol context is preserved in trace entries so streaming and multi-step flows remain understandable without raw logs.
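
To make the "one trace model, several projections" idea concrete, here is a minimal sketch of trace entries ordered by a sequence number and projected into text breadcrumbs. The `TraceEvent` and `TraceKind` types, their field names, and the breadcrumb wording are all hypothetical; only the event categories come from the list above.

```rust
use std::fmt;

/// Event categories mentioned in the roadmap. The enum is illustrative.
#[derive(Debug, Clone, PartialEq)]
pub enum TraceKind {
    DependencyWait { on: String },
    Started,
    Completed,
    Failed { reason: String },
    Skipped { because: String },
    MapAborted,
    Interrupted,
}

/// One trace entry. `sequence` gives a stable order under parallel execution.
#[derive(Debug, Clone, PartialEq)]
pub struct TraceEvent {
    pub sequence: u64,
    pub request: String,
    pub kind: TraceKind,
}

/// Text output is a projection of the structured entry, not a separate path.
impl fmt::Display for TraceEvent {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "#{} {}: ", self.sequence, self.request)?;
        match &self.kind {
            TraceKind::DependencyWait { on } => write!(f, "waiting on {}", on),
            TraceKind::Started => write!(f, "started"),
            TraceKind::Completed => write!(f, "completed"),
            TraceKind::Failed { reason } => write!(f, "failed ({})", reason),
            TraceKind::Skipped { because } => {
                write!(f, "skipped because {} did not complete", because)
            }
            TraceKind::MapAborted => write!(f, "map aborted"),
            TraceKind::Interrupted => write!(f, "interrupted"),
        }
    }
}

/// Sort by sequence and render one breadcrumb line per event.
pub fn breadcrumbs(events: &[TraceEvent]) -> Vec<String> {
    let mut ordered: Vec<&TraceEvent> = events.iter().collect();
    ordered.sort_by_key(|e| e.sequence);
    ordered.iter().map(|e| e.to_string()).collect()
}
```

Sorting on a single sequence number means events collected from parallel workers still render in one deterministic order, regardless of the order in which they were gathered.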

### Milestone 4.3 Implementation Anchors

- Dependency graph and ordering in `src/request/planner.rs`.
- Execution events and failure aggregation in `src/request/runner.rs`.
- Automation-facing run summaries in `src/automation.rs`.
- Structured and text reporting in `src/report.rs` and `src/main.rs`.

### Milestone 4.3 Dependencies and Risks

- The main remaining risk is keeping future retry or polling work additive to the current trace model instead of forking it into a second orchestration view.
- Parallel execution now uses a stable sequence-based trace ordering, so follow-on work should extend the same model rather than inventing protocol-specific trace formats.

### Milestone 4.3 Exit Check

Met: multi-step runs now expose a clear execution breadcrumb trail that explains dependency skips, ordering, interruption state, and session-backed flows without reading raw logs.

## Milestone 4.4: Diff-Friendly Failure Presentation

Milestone 4.4 is complete for the current Phase 4 scope: Hen preserves the existing path-aware mismatch details and now also emits readable unified diffs for comparison, match, and schema failures in text, JSON, and NDJSON output.

- Human-readable diffs now cover larger text and JSON mismatches, including schema-backed failures that previously only surfaced mismatch metadata.
- The existing mismatch model remains the primary structured signal, and the additive diff fields complement rather than replace it.
- The current diff renderer already truncates unchanged context and oversized changed blocks so large payload output stays readable.
- The only remaining follow-up in this area is snapshot reuse in Phase 5; it is not an unfinished Phase 4 feature.
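
The sketch below shows one way unchanged-context truncation can work in a line diff. It is a simplified stand-in and not Hen's renderer: it uses a plain longest-common-subsequence table, emits `-` / `+` / space prefixes without `@@` hunk headers, and does not truncate oversized changed blocks. The function names and the `... N unchanged line(s) ...` marker are assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum DiffLine {
    Same(String),
    Removed(String),
    Added(String),
}

/// Line diff via a longest-common-subsequence table (fine for small payloads).
pub fn diff_lines(expected: &str, actual: &str) -> Vec<DiffLine> {
    let a: Vec<&str> = expected.lines().collect();
    let b: Vec<&str> = actual.lines().collect();
    // lcs[i][j] = LCS length of a[i..] and b[j..]
    let mut lcs = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            out.push(DiffLine::Same(a[i].to_string()));
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            out.push(DiffLine::Removed(a[i].to_string()));
            i += 1;
        } else {
            out.push(DiffLine::Added(b[j].to_string()));
            j += 1;
        }
    }
    // Whatever is left on either side is a pure removal or addition.
    out.extend(a[i..].iter().map(|l| DiffLine::Removed(l.to_string())));
    out.extend(b[j..].iter().map(|l| DiffLine::Added(l.to_string())));
    out
}

/// Render the diff, keeping at most `context` unchanged lines around each
/// change and collapsing longer unchanged runs into a marker line.
pub fn render(diff: &[DiffLine], context: usize) -> Vec<String> {
    let changed: Vec<usize> = diff
        .iter()
        .enumerate()
        .filter(|(_, l)| !matches!(l, DiffLine::Same(_)))
        .map(|(k, _)| k)
        .collect();
    let mut out = Vec::new();
    let mut skipped = 0usize;
    for (k, line) in diff.iter().enumerate() {
        let near = changed.iter().any(|&c| k.abs_diff(c) <= context);
        if !near {
            skipped += 1;
            continue;
        }
        if skipped > 0 {
            out.push(format!("... {} unchanged line(s) ...", skipped));
            skipped = 0;
        }
        out.push(match line {
            DiffLine::Same(s) => format!(" {}", s),
            DiffLine::Removed(s) => format!("-{}", s),
            DiffLine::Added(s) => format!("+{}", s),
        });
    }
    if skipped > 0 {
        out.push(format!("... {} unchanged line(s) ...", skipped));
    }
    out
}
```

Keeping the diff as data (`Vec<DiffLine>`) and rendering it separately mirrors the additive approach described above: structured output can carry the diff while text output projects it.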

### Milestone 4.4 Implementation Anchors

- Assertion mismatch generation in `src/request/assertion.rs`.
- Response-path resolution in `src/request/response_capture.rs`.
- Structured and text reporting in `src/report.rs` and `src/main.rs`.

### Milestone 4.4 Dependencies and Risks

- The main remaining risk is divergence between the shipped Phase 4 diff renderer and the snapshot comparison work planned in Phase 5.
- If Phase 5 grows a second comparison path instead of reusing the current primitives, the output surface will drift and become harder to maintain.

### Milestone 4.4 Exit Check

Met: large payload failures now explain what changed in a readable diff without losing the current structured mismatch details needed by automation.

## Remaining Delivery Order Within Phase 4

None. Phase 4 is complete at the current scope, and the previously proposed artifact-persistence, deeper transport timing, and graph or visualization follow-ons are out of scope.

## Phase 4 Non-Goals

- File-backed artifact persistence or MCP resource export beyond the current inline report surface.
- Transport-specific deep timing breakdowns such as connect, TLS, or first-byte subphases.
- Request-graph or other visualization surfaces.
- A second debugging subsystem that bypasses the shared execution artifact and report pipeline.
- Always-on full artifact retention for every run.
- Protocol-specific trace formats that fragment CLI and MCP behavior.
- A heavy graphical UI as part of the core CLI.