# Detailed Roadmap: Debugging and Failure Forensics

This document holds the detailed planning for Phase 4 from [ROADMAP.md](ROADMAP.md).

## Current Phase 4 State

Hen already ships the first layer of debugging and failure-forensics support:

- Per-request `startedAtUnixMs` and `durationMs` fields in structured output.
- CLI body previews plus elapsed and slowest-request summaries.
- Interruption-aware partial reporting for incomplete runs.
- Structured assertion mismatch payloads with exact paths, kinds, and expected-versus-actual detail.
- A normalized execution artifact that already has transcript and retained-artifact slots internally.
- Structured JSON and NDJSON record output now includes per-record `transcripts` and `retainedArtifacts`, and the existing body-report controls apply to those artifact bodies as well.
- Structured JSON and NDJSON output also includes a stable per-request `timing` object: `totalMs` plus named phase timings wherever the executor can currently measure them. Today that covers HTTP `dns` / `responseStart` / `bodyRead` for hostname-based requests, WebSocket `handshake` / `send` / `wait`, and SSE `dns` / `streamOpen` / `wait` for hostname-based opens.

That means Phase 4 is no longer about inventing basic failure reporting. The remaining work is to surface deeper traces, retained artifacts, timing breakdowns, diffs, and multi-step execution views in a way that stays transport-neutral.

## Forensics Design Principles

- Reuse the normalized execution artifact instead of creating a separate debugging-only data path.
- Keep forensics transport-neutral even when some timing or trace fields are protocol-specific.
- Make retention, export, and redaction explicit so CI artifacts stay reviewable and safe.
- Prefer machine-readable outputs that the CLI, MCP surface, and future editor tooling can all consume.
- Treat human-readable text output as a projection of structured trace data rather than a one-off rendering path.
- Keep report-schema evolution additive where possible so existing JSON and NDJSON consumers can ignore new forensics fields safely.

## Milestone 4.1: Artifact and Transcript Export

The first Phase 4 slice is in place: per-record structured output exports the existing in-memory transcripts and retained-artifact slots, and execution failures preserve those artifact-backed transcripts when the executor already has them. The remaining gaps are persistence, redaction, and optional file-backed retention.

- Define how retained artifacts move from inline report fields into optional file exports and MCP resource handles.
- Add explicit truncation, redaction, and opt-in retention controls so large payloads and secrets do not leak by default.
- Keep artifact export transport-neutral so HTTP, MCP, SSE, and WebSocket all use the same reporting shape.
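
The retention controls above could look roughly like the following. This is only a sketch: `RetentionPolicy`, `RetainedArtifact`, and `retain_artifact` are hypothetical names for illustration, not Hen's existing types, and the `[REDACTED]` placeholder is an assumption.

```rust
/// Hypothetical retention policy; none of these names come from Hen itself.
#[derive(Debug, Clone)]
pub struct RetentionPolicy {
    /// Retention is opt-in: nothing is kept unless this is true.
    pub retain: bool,
    /// Maximum number of body bytes kept in the report.
    pub max_body_bytes: usize,
    /// Header names whose values are replaced before export (case-insensitive).
    pub redacted_headers: Vec<String>,
}

/// Hypothetical shape of an artifact after redaction and truncation.
#[derive(Debug, Clone, PartialEq)]
pub struct RetainedArtifact {
    pub headers: Vec<(String, String)>,
    pub body: String,
    pub truncated: bool,
}

/// Apply opt-in retention, header redaction, and body truncation in one place
/// so every transport goes through the same rules.
pub fn retain_artifact(
    policy: &RetentionPolicy,
    headers: &[(String, String)],
    body: &str,
) -> Option<RetainedArtifact> {
    if !policy.retain {
        return None;
    }
    let headers = headers
        .iter()
        .map(|(name, value)| {
            let sensitive = policy
                .redacted_headers
                .iter()
                .any(|h| h.eq_ignore_ascii_case(name));
            let value = if sensitive {
                "[REDACTED]".to_string()
            } else {
                value.clone()
            };
            (name.clone(), value)
        })
        .collect();
    // Back up to a char boundary so the truncated body stays valid UTF-8.
    let mut end = body.len().min(policy.max_body_bytes);
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    Some(RetainedArtifact {
        headers,
        body: body[..end].to_string(),
        truncated: end < body.len(),
    })
}
```

Returning `None` when retention is off keeps the default safe. The `truncated` flag lets report consumers tell a short body from a clipped one.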

### Milestone 4.1 Implementation Anchors

- Execution artifact model in `src/request/artifact.rs`.
- Request execution integration in `src/request/executor.rs` and `src/request/mod.rs`.
- Structured reporting in `src/report.rs`.
- CLI flag wiring and output behavior in `src/main.rs`.

### Milestone 4.1 Dependencies and Risks

- Artifact retention needs redaction before it is safe to encourage in CI.
- Transcript export can bloat machine-readable output if retention and redaction policies are not defined beyond the current shared truncation controls.
- Protocol-specific transcript attributes should not force a report-schema fork.

### Milestone 4.1 Exit Check

A failed run can emit retained request and response artifacts plus transcripts without requiring custom shell wrappers or ad hoc debug logging.

## Milestone 4.2: Timing Breakdown and Performance Context

Hen now records a stable machine-readable timing shape with `totalMs` and executor-level phase timings where they are directly measurable. The remaining work is deeper transport timing where the underlying client stack can expose more detail.

- Extend timing breakdowns beyond the current DNS and executor-level phases toward connect, TLS, first byte, and other lower-level transport phases where available.
- Preserve the current stable fallback shape when a protocol cannot provide the full timing breakdown.
- Decide what subset of the structured timing data should also appear in text summaries.
- Keep session-backed wait behavior understandable for streaming and multi-step protocols.
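
One way to picture the stable fallback shape is a total plus an optional map of named phases, with any unmeasured time reported under its own label. The sketch below is an assumption about how that might be modeled: `Timing`, `with_phase`, and the `unattributed` label are hypothetical. Only `totalMs` and the phase names come from the current output.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory form of the per-request `timing` object.
#[derive(Debug, Clone, PartialEq)]
pub struct Timing {
    pub total_ms: u64,
    /// Named phases; left empty when the transport cannot measure any.
    pub phases: BTreeMap<String, u64>,
}

impl Timing {
    pub fn new(total_ms: u64) -> Self {
        Timing { total_ms, phases: BTreeMap::new() }
    }

    /// Record one measured phase, such as `dns` or `responseStart`.
    pub fn with_phase(mut self, name: &str, ms: u64) -> Self {
        self.phases.insert(name.to_string(), ms);
        self
    }

    /// Time not covered by any named phase (queueing, retries, session waits).
    /// Reporting it separately avoids folding it into a transport phase.
    pub fn unattributed_ms(&self) -> u64 {
        let measured = self.phases.values().sum::<u64>();
        self.total_ms.saturating_sub(measured)
    }

    /// Text projection of the structured data; falls back to the total alone.
    pub fn summary(&self) -> String {
        if self.phases.is_empty() {
            return format!("total {}ms", self.total_ms);
        }
        let mut parts: Vec<String> = self
            .phases
            .iter()
            .map(|(name, ms)| format!("{} {}ms", name, ms))
            .collect();
        parts.push(format!("unattributed {}ms", self.unattributed_ms()));
        format!("total {}ms ({})", self.total_ms, parts.join(", "))
    }
}
```

Because the phases live in a map, a protocol that can only report `totalMs` still produces the same shape, and new lower-level phases can be added without changing existing keys.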

### Milestone 4.2 Implementation Anchors

- Request execution lifecycle in `src/request/executor.rs`.
- Run timing capture in `src/request/runner.rs`.
- Structured reporting in `src/report.rs`.
- Text summary rendering in `src/main.rs`.

### Milestone 4.2 Dependencies and Risks

- Different transports expose timing detail with different fidelity, and the current executor-level phase timings still do not separate connect or TLS.
- Phase timings can be misleading if queueing, retries, or session waits are mixed into transport timings without clear labels.

### Milestone 4.2 Exit Check

Users can see more than a single duration number when diagnosing slow requests, and the timing shape remains understandable across protocols even before transport-level subphases are fully available.

## Milestone 4.3: Dependency-Aware Execution Trace

Per-request summaries are useful, but multi-step collections still need a clearer breadcrumb trail through dependencies, skips, sessions, and interruptions.

- Add dependency-aware execution trace output for multi-step runs.
- Show why a request ran, skipped, or waited, including dependency and session relationships.
- Preserve interruption and partial-run state in the same trace surface.
- Reuse runner events and planner metadata rather than building a second orchestration model.
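
As a rough illustration of the trace surface described above, each planned request can carry a reason for what happened to it. Everything here is hypothetical (`Planned`, `TraceReason`, `build_trace`, and the rendered wording); the real trace would be derived from the existing planner metadata and runner events rather than recomputed like this.

```rust
use std::collections::HashSet;

/// Why a request ended up in its final state (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
pub enum TraceReason {
    Ran,
    Skipped { blocked_by: String },
    /// The run was interrupted before this request started.
    NotStarted,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TraceEntry {
    pub request: String,
    pub reason: TraceReason,
}

/// One request in planned (dependency-respecting) order.
pub struct Planned {
    pub name: String,
    pub depends_on: Vec<String>,
}

pub fn planned(name: &str, deps: &[&str]) -> Planned {
    Planned {
        name: name.to_string(),
        depends_on: deps.iter().map(|d| d.to_string()).collect(),
    }
}

/// Walk the plan in order. `failed` names the requests that ran but failed;
/// `interrupted_at` is the index of the first request that never started.
pub fn build_trace(
    plan: &[Planned],
    failed: &HashSet<String>,
    interrupted_at: Option<usize>,
) -> Vec<TraceEntry> {
    // Requests that did not succeed, so their dependents must be skipped.
    let mut blocked: HashSet<&str> = HashSet::new();
    let mut trace = Vec::with_capacity(plan.len());
    for (index, step) in plan.iter().enumerate() {
        let reason = if interrupted_at.map_or(false, |cut| index >= cut) {
            TraceReason::NotStarted
        } else if let Some(dep) = step
            .depends_on
            .iter()
            .find(|d| blocked.contains(d.as_str()))
        {
            TraceReason::Skipped { blocked_by: dep.clone() }
        } else {
            TraceReason::Ran
        };
        let succeeded = reason == TraceReason::Ran && !failed.contains(&step.name);
        if !succeeded {
            blocked.insert(step.name.as_str());
        }
        trace.push(TraceEntry { request: step.name.clone(), reason });
    }
    trace
}

/// Text projection of one structured trace entry.
pub fn render(entry: &TraceEntry) -> String {
    match &entry.reason {
        TraceReason::Ran => format!("{}: ran", entry.request),
        TraceReason::Skipped { blocked_by } => format!(
            "{}: skipped (dependency {} did not succeed)",
            entry.request, blocked_by
        ),
        TraceReason::NotStarted => {
            format!("{}: not started (run interrupted)", entry.request)
        }
    }
}
```

Recording the blocking dependency by name explains transitive skips one hop at a time, without repeating the per-request records.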

### Milestone 4.3 Implementation Anchors

- Dependency graph and ordering in `src/request/planner.rs`.
- Execution events and failure aggregation in `src/request/runner.rs`.
- Automation-facing run summaries in `src/automation.rs`.
- Structured and text reporting in `src/report.rs` and `src/main.rs`.

### Milestone 4.3 Dependencies and Risks

- Trace output can become noisy if it repeats data already visible in per-request records.
- Parallel execution needs a stable trace model that still preserves causal ordering.

### Milestone 4.3 Exit Check

Multi-step runs expose a clear execution breadcrumb trail that explains dependency skips, ordering, interruption state, and session-backed flows without reading raw logs.

## Milestone 4.4: Diff-Friendly Failure Presentation

Hen already reports path-aware mismatch details. The next step is readable diffs for larger payloads and broad comparisons.

- Add human-readable diffs for larger text and JSON mismatches.
- Reuse the existing mismatch model so diff output complements, rather than replaces, structured failure data.
- Keep the diff engine compatible with the snapshot work planned in Phase 5.
- Avoid turning ordinary assertions into unreadable fixture dumps in text output.
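
For a sense of what a readable diff involves, here is a minimal line-based diff built on a longest-common-subsequence table. It is a plain textbook LCS diff, not necessarily the engine Hen will adopt. `DiffLine`, `diff_lines`, and `render_diff` are hypothetical names.

```rust
/// One line of diff output (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
pub enum DiffLine {
    Same(String),
    Removed(String),
    Added(String),
}

/// Line-based diff of `expected` against `actual` using an LCS table.
pub fn diff_lines(expected: &str, actual: &str) -> Vec<DiffLine> {
    let a: Vec<&str> = expected.lines().collect();
    let b: Vec<&str> = actual.lines().collect();

    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
    let mut lcs = vec![vec![0usize; b.len() + 1]; a.len() + 1];
    for i in (0..a.len()).rev() {
        for j in (0..b.len()).rev() {
            lcs[i][j] = if a[i] == b[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }

    // Walk the table from the start, emitting removals before additions.
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            out.push(DiffLine::Same(a[i].to_string()));
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            out.push(DiffLine::Removed(a[i].to_string()));
            i += 1;
        } else {
            out.push(DiffLine::Added(b[j].to_string()));
            j += 1;
        }
    }
    out.extend(a[i..].iter().map(|l| DiffLine::Removed(l.to_string())));
    out.extend(b[j..].iter().map(|l| DiffLine::Added(l.to_string())));
    out
}

/// Render with the usual two-character prefixes for unchanged, removed, added.
pub fn render_diff(diff: &[DiffLine]) -> String {
    diff.iter()
        .map(|line| match line {
            DiffLine::Same(s) => format!("  {}", s),
            DiffLine::Removed(s) => format!("- {}", s),
            DiffLine::Added(s) => format!("+ {}", s),
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

The table costs memory proportional to the product of the two line counts, so large payloads would need truncation or path-focused slicing before diffing. Keeping `DiffLine` as data rather than text lets the same result feed both structured and text output.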

### Milestone 4.4 Implementation Anchors

- Assertion mismatch generation in `src/request/assertion.rs`.
- Response-path resolution in `src/request/response_capture.rs`.
- Structured and text reporting in `src/report.rs` and `src/main.rs`.

### Milestone 4.4 Dependencies and Risks

- Diff output can become noisy without truncation, normalization, or path-focused context.
- Phase 4 diff work and Phase 5 snapshot work should share the same comparison and rendering primitives.

### Milestone 4.4 Exit Check

Large payload failures explain what changed in a readable diff without losing the current structured mismatch details needed by automation.

## Milestone 4.5: Graph and Visualization Surfaces

The roadmap already calls out request-graph visualization. The remaining question is how much of that belongs in the CLI versus machine-readable exports consumed by other tools.

- Export the request dependency graph in a stable machine-readable form.
- Add a lightweight visualization path such as DOT, Mermaid, or another reviewable text format.
- Support overlays for execution outcome, timing, and dependency state when practical.
- Keep editor-focused rendering separate from the core graph export so Phase 7 can build on the same data.
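
A DOT rendering of the dependency graph, with an optional outcome overlay, might be sketched as follows. The `Node`, `Outcome`, and `to_dot` names and the color mapping are assumptions for illustration. The point is that the renderer is a thin projection over a graph export that stays independent of it.

```rust
/// Optional execution outcome used as an overlay (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Passed,
    Failed,
    Skipped,
}

/// A request node; `outcome` is `None` when rendering the plan alone.
pub struct Node<'a> {
    pub name: &'a str,
    pub outcome: Option<Outcome>,
}

/// Quote a name as a DOT string, escaping embedded double quotes.
fn dot_quote(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\\\""))
}

/// Render nodes and `(dependency, dependent)` edges as a DOT digraph.
pub fn to_dot(nodes: &[Node], edges: &[(&str, &str)]) -> String {
    let mut out = String::from("digraph requests {\n");
    for node in nodes {
        let color = match node.outcome {
            Some(Outcome::Passed) => Some("green"),
            Some(Outcome::Failed) => Some("red"),
            Some(Outcome::Skipped) => Some("gray"),
            None => None,
        };
        match color {
            Some(c) => out.push_str(&format!("  {} [color={}];\n", dot_quote(node.name), c)),
            None => out.push_str(&format!("  {};\n", dot_quote(node.name))),
        }
    }
    for (dependency, dependent) in edges {
        out.push_str(&format!(
            "  {} -> {};\n",
            dot_quote(dependency),
            dot_quote(dependent)
        ));
    }
    out.push_str("}\n");
    out
}
```

The output is plain text, so it can be reviewed in a pull request or piped to Graphviz without the CLI embedding any UI.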

### Milestone 4.5 Implementation Anchors

- Graph construction in `src/request/planner.rs`.
- Run and verification summaries in `src/automation.rs`.
- CLI and MCP delivery surfaces in `src/main.rs` and `src/mcp.rs`.

### Milestone 4.5 Dependencies and Risks

- A rendered graph format can become a maintenance burden if there is no stable underlying graph export.
- Visualization should not require the CLI to take on a large embedded UI surface.

### Milestone 4.5 Exit Check

Users can inspect a collection's dependency graph and understand request relationships without reverse-engineering the plan from plain text output.

## Suggested Delivery Order Within Phase 4

1. Artifact and transcript export.
2. Dependency-aware execution trace.
3. Timing breakdown and performance context.
4. Diff-friendly failure presentation.
5. Graph and visualization surfaces.

This order keeps the early work focused on exporting existing internal data before adding new rendering and visualization layers.

## Phase 4 Non-Goals

- A second debugging subsystem that bypasses the shared execution artifact and report pipeline.
- Always-on full artifact retention for every run.
- Protocol-specific trace formats that fragment CLI and MCP behavior.
- A heavy graphical UI as part of the core CLI.