# Modularization Plan
This document is a concrete plan to break up Hen's oversized source files into smaller submodules without changing the public DSL or execution behavior.
## Progress Update
Completed so far:
- Wave 1 is complete. The former `src/report.rs` has been split into `src/report/`, the CLI flow has been split out of `src/main.rs` into focused `cli_*` modules, and `tests/output_cli.rs` is now a thin root with grouped modules under `tests/output_cli/`.
- The parser facade work is far enough along that both `src/parser/mod.rs` and `src/parser/request.rs` are thin entry points again, at 31 and 35 lines respectively. `parse_collection` now lives in `src/parser/collection.rs`, syntax inspection and `SyntaxSummary` types live in `src/parser/syntax.rs`, parser error-span adapters live in `src/parser/spans.rs`, and parser unit tests live in `src/parser/tests.rs`.
- `src/parser/collection.rs` is no longer a parser hotspot: request-template construction moved out, leaving collection-level orchestration at 148 lines.
- Parser helper extraction is already well underway: declaration handling, legacy header normalization, variable helpers, protocol helpers, and request authoring, scan, validation, and template helpers now live in dedicated sibling modules.
Still open:
- The original Wave 2 request-leaf work remains largely untouched: `src/request/mod.rs`, `src/request/template.rs`, and `src/request/response_capture.rs` are still major hotspots.
- Wave 3 and Wave 4 remain open: `src/schema/mod.rs`, `src/request/assertion.rs`, and `src/request/executor.rs` are still the three largest core implementation files, and `src/request/runner.rs` has not been split either.
- Parser follow-up cleanup is smaller and more local now, but not fully done: `src/parser/request/template.rs`, `src/parser/request/validate.rs`, `src/parser/request/scan.rs`, and `src/parser/tests.rs` are the remaining parser-adjacent files most likely to need another split.
## Why This Exists
The current size problem is concentrated in a small set of core modules.
As of May 2026, after Wave 1 and the parser/request facade breakup, the notable remaining Rust hotspots are:
| File | Lines | Primary responsibilities mixed together today |
| --- | ---: | --- |
| `src/request/assertion.rs` | 2824 | assertion AST, parsing, typed evaluation, structural matching, schema matching, mismatch formatting |
| `src/request/executor.rs` | 2573 | HTTP execution, GraphQL, MCP, SSE, WebSocket, multipart encoding, curl rendering, timing |
| `src/schema/mod.rs` | 2181 | schema model, registry, cycle detection, scalar validation, object validation, error rendering |
| `src/request/response_capture.rs` | 1930 | capture model, path parsing, JSON traversal, filtered selectors, decoded JSON, metadata extraction |
| `src/request/template.rs` | 1305 | template model, expansion, fragment includes, map iteration, dependency resolution |
| `src/request/mod.rs` | 1233 | request model types, protocol models, callback helpers, map helpers, protocol-context rendering |
| `src/parser/tests.rs` | 988 | parser grammar, collection-parse, syntax-inspection, and declaration-registry regression coverage |
| `src/request/runner.rs` | 970 | orchestration, events, failure aggregation, dependency context |
| `src/parser/declarations.rs` | 579 | declaration parsing, declaration-span indexing, and schema-registry validation wiring |
| `src/parser/protocol.rs` | 552 | protocol inference, session-target compatibility, and protocol-context JSON assembly |
| `src/parser/request/template.rs` | 530 | request-template construction and protocol-specific template operation building |
| `src/parser/request/validate.rs` | 426 | GraphQL, MCP, SSE, and WebSocket directive or action validation |
| `src/parser/syntax.rs` | 422 | syntax inspection and syntax-summary assembly |
| `src/parser/request/scan.rs` | 403 | raw request-block scanning and scanned-request state construction |
The hotspot list is more concentrated than when this plan started. `src/report.rs` no longer exists, `src/main.rs` is down to 34 lines, `tests/output_cli.rs` is down to a 15-line root, `src/parser/mod.rs` is down to 31 lines, `src/parser/request.rs` is down to 35 lines, and `src/parser/collection.rs` is down to 148 lines. The biggest remaining problems sit in the request, assertion, schema, executor, and response-capture layers. The smaller parser follow-up cleanup now lives inside the extracted request submodules rather than in the root parser files.
## Goals
- Keep behavior stable while reducing review and maintenance cost.
- Replace giant single-file modules with directory modules and thin facades.
- Split by responsibility, not by arbitrary line count.
- Make `mod.rs` files mostly re-exports, shared types, and top-level entry points.
- Create narrower test surfaces so changes do not require understanding a 2k to 4k line file.
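As a minimal sketch of the thin-facade goal, the block below uses inline `mod` blocks so that it compiles as a single file; in the repository each inner module would be its own file. The names `render_json` and `escape` are hypothetical and are not taken from Hen's code.

```rust
// Hypothetical stand-in for a directory module such as `src/report/`.
mod report {
    // The facade itself: a re-export of the entry point and nothing else.
    pub use self::json::render_json;

    mod common {
        // Shared helper kept at `pub(crate)` so the move does not widen the public API.
        pub(crate) fn escape(text: &str) -> String {
            text.replace('\\', "\\\\").replace('"', "\\\"")
        }
    }

    mod json {
        use super::common::escape;

        /// Hypothetical entry point: renders one result as a JSON object.
        pub fn render_json(name: &str, passed: bool) -> String {
            format!("{{\"name\":\"{}\",\"passed\":{}}}", escape(name), passed)
        }
    }
}
```

Callers only ever write `report::render_json(...)`, so the internal files can be rearranged later without touching call sites.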
## Guardrails
- Do not mix feature work into the breakup.
- Do not change the public `.hen` authoring surface as part of these refactors.
- Do not create a generic `util.rs` dumping ground.
- Prefer `pub(crate)` helpers over widening visibility just to make file moves easier.
- Move leaf helpers first, then move public types and entry points once the new structure is stable.
Recommended size targets:
- `mod.rs`: under 250 lines where practical.
- Ordinary implementation modules: under 400 to 600 lines where practical.
- Integration test files: under 500 to 700 lines where practical.
These are guardrails, not hard laws. The real objective is coherent ownership.
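The size targets could be checked mechanically with a small helper like the one sketched below. It is not part of Hen: the function names are hypothetical, and the path rules (any `mod.rs` gets the facade budget, anything under `tests` gets the integration-test budget) are an assumption about how the targets would be applied, using the upper end of each range.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Line budget for a file, using the upper bounds of the targets above.
fn line_budget(path: &Path) -> usize {
    let is_mod = path.file_name().map_or(false, |name| name == "mod.rs");
    if is_mod {
        250
    } else if path.starts_with("tests") {
        700
    } else {
        600
    }
}

/// Recursively collects `.rs` files under `dir` whose line count exceeds their budget.
fn over_budget(dir: &Path, found: &mut Vec<(PathBuf, usize)>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            over_budget(&path, found)?;
        } else if path.extension().map_or(false, |ext| ext == "rs") {
            let lines = fs::read_to_string(&path)?.lines().count();
            if lines > line_budget(&path) {
                found.push((path, lines));
            }
        }
    }
    Ok(())
}
```

Calling `over_budget(Path::new("src"), &mut found)` from the repository root would list candidates for the next split. Because the targets are guardrails, the output is a prompt for review rather than a failing check.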
## Target Breakdown
### 1. Parser
Current status: partially complete. `src/parser/mod.rs` and `src/parser/request.rs` are now thin facades again, and the parser already has dedicated `syntax.rs`, `collection.rs`, `spans.rs`, `protocol.rs`, `declarations.rs`, `variables.rs`, `legacy_header.rs`, and `request/{authoring,scan,template,validate}.rs` modules. Remaining parser work is now concentrated in `src/parser/request/template.rs`, `src/parser/request/validate.rs`, `src/parser/request/scan.rs`, and `src/parser/tests.rs` rather than in the root modules.
Target shape:
```text
src/parser/
  mod.rs
  syntax.rs
  collection.rs
  request.rs
  request/
    authoring.rs
    scan.rs
    template.rs
    validate.rs
  protocol.rs
  declarations.rs
  variables.rs
  spans.rs
  legacy_header.rs
  context.rs
  preprocessor.rs
```
Suggested ownership:
- `syntax.rs`: `inspect_collection_syntax`, `inspect_request_syntax`, `SyntaxSummary` types.
- `collection.rs`: `parse_collection`, top-level preamble and request orchestration.
- `request.rs`: thin facade and re-export surface for request parsing helpers.
- `request/authoring.rs`: response-capture parsing, assertion parsing, and fragment-guard validation.
- `request/scan.rs`: `ScannedRequest`, raw request-block scanning, and fragment or dependency parsing.
- `request/template.rs`: `parse_request_template`, request-target resolution wiring, and protocol-specific template construction.
- `request/validate.rs`: directive compatibility plus GraphQL, MCP, SSE, and WebSocket validation enums and helpers.
- `protocol.rs`: protocol inference, session target compatibility, protocol-context JSON helpers, `within` parsing hooks.
- `declarations.rs`: `scalar` and `schema` parsing plus schema-registry validation wiring.
- `variables.rs`: prompt placeholders, shell variables, array literal handling, validation-time assignment helpers.
- `spans.rs`: `SourceSpan`, declaration span tracking, parse-to-source error mapping.
- `legacy_header.rs`: legacy collection header normalization.
`src/parser/mod.rs` and `src/parser/request.rs` now act as thin facades. Remaining parser work is follow-up breakup inside the extracted siblings, not another large root-module reduction.
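To illustrate the parse-to-source error mapping that `spans.rs` owns, here is a minimal sketch that turns a byte offset into a 1-based line and column. The fields of `SourceSpan` and the helper name `span_at` are assumptions; the real type may carry different data.

```rust
/// Hypothetical shape; the real `SourceSpan` may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SourceSpan {
    line: usize,   // 1-based
    column: usize, // 1-based, counted in characters
    len: usize,
}

/// Maps a byte offset in `source` to a span, or `None` if the offset is
/// out of range or falls inside a multi-byte character.
fn span_at(source: &str, offset: usize, len: usize) -> Option<SourceSpan> {
    if !source.is_char_boundary(offset) {
        return None;
    }
    let before = &source[..offset];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |index| index + 1);
    let column = before[line_start..].chars().count() + 1;
    Some(SourceSpan { line, column, len })
}
```

Keeping this mapping in one module means parser errors can be produced with raw offsets and converted to user-facing positions in a single place.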
### 2. Request Assertions
Current problem: `src/request/assertion.rs` combines syntax parsing, AST types, typed comparison logic, schema-backed comparison, structural JSON matching, and mismatch rendering.
Target shape:
```text
src/request/assertion/
  mod.rs
  ast.rs
  parse.rs
  evaluate.rs
  structural.rs
  schema.rs
  mismatch.rs
  guard.rs
```
Suggested ownership:
- `ast.rs`: `Assertion`, `Guard`, operand and comparison enums.
- `parse.rs`: operator splitting, literal normalization, schema-target parsing.
- `guard.rs`: guard prefix extraction and guard normalization rules.
- `evaluate.rs`: typed equality and ordering evaluation.
- `structural.rs`: `~=` structural JSON matching.
- `schema.rs`: `===` handling and schema-specific mismatch mapping.
- `mismatch.rs`: mismatch data types, builders, and human-readable formatting.
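The layering above can be sketched in one file as data (`ast.rs`), formatting (`mismatch.rs`), and logic (`evaluate.rs`). Every type and variant here is a simplified stand-in, and the operator symbols and mixed-type rules are assumptions rather than Hen's actual semantics.

```rust
use std::cmp::Ordering;
use std::fmt;

// ast.rs: plain data, no behavior.
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    Number(f64),
    Text(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Comparison {
    Equal,
    NotEqual,
    Less,
    Greater,
}

#[derive(Debug, Clone, PartialEq)]
struct Assertion {
    left: Operand,
    comparison: Comparison,
    right: Operand,
}

// mismatch.rs: failure data plus human-readable formatting.
#[derive(Debug, Clone, PartialEq)]
struct Mismatch {
    assertion: Assertion,
}

impl fmt::Display for Operand {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Operand::Number(value) => write!(f, "{}", value),
            Operand::Text(value) => write!(f, "{:?}", value),
        }
    }
}

impl fmt::Display for Mismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let symbol = match self.assertion.comparison {
            Comparison::Equal => "==",
            Comparison::NotEqual => "!=",
            Comparison::Less => "<",
            Comparison::Greater => ">",
        };
        write!(f, "expected {} {} {}", self.assertion.left, symbol, self.assertion.right)
    }
}

// evaluate.rs: typed equality and ordering, with no parsing or formatting.
fn evaluate(assertion: &Assertion) -> Result<(), Mismatch> {
    let ordering = match (&assertion.left, &assertion.right) {
        (Operand::Number(left), Operand::Number(right)) => left.partial_cmp(right),
        (Operand::Text(left), Operand::Text(right)) => Some(left.cmp(right)),
        _ => None, // assumption: mixed types are never equal or ordered
    };
    let passed = match assertion.comparison {
        Comparison::Equal => ordering == Some(Ordering::Equal),
        Comparison::NotEqual => ordering != Some(Ordering::Equal),
        Comparison::Less => ordering == Some(Ordering::Less),
        Comparison::Greater => ordering == Some(Ordering::Greater),
    };
    if passed {
        Ok(())
    } else {
        Err(Mismatch { assertion: assertion.clone() })
    }
}
```

The point of the split is that `evaluate` never builds strings and the `Display` impls never compare values, so each file can be reviewed and tested on its own.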
### 3. Request Executor
Current problem: `src/request/executor.rs` is trying to be a transport-neutral executor, HTTP client implementation, GraphQL materializer, MCP materializer, SSE runtime, WebSocket runtime, curl exporter, multipart encoder, and timing system all at once.
Target shape:
```text
src/request/executor/
  mod.rs
  http.rs
  graphql.rs
  mcp.rs
  sse.rs
  ws.rs
  multipart.rs
  curl.rs
  timing.rs
  headers.rs
```
Suggested ownership:
- `http.rs`: base `reqwest` execution and normalized response construction.
- `timing.rs`: DNS timing recorder and transport timing helpers.
- `graphql.rs`: GraphQL request resolution and HTTP materialization.
- `mcp.rs`: MCP request materialization, session hydration, JSON-RPC response normalization.
- `sse.rs`: SSE parser, open and receive execution, session metadata.
- `ws.rs`: handshake, send, exchange, receive, and message normalization.
- `multipart.rs`: local multipart body generation and boundary rules.
- `curl.rs`: curl export rendering.
- `headers.rs`: JSON and header-map conversion helpers.
`mod.rs` should dispatch by request operation and own only the thin shared execution contract.
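A minimal sketch of that dispatch rule follows. The variants of `RequestOperation`, the `Outcome` type, and the per-transport function names are all hypothetical, and the inline modules stand in for `http.rs`, `graphql.rs`, and `ws.rs`; no network calls are made.

```rust
/// Hypothetical operation type; the real `RequestOperation` variants may differ.
#[derive(Debug, Clone, PartialEq)]
enum RequestOperation {
    Http { method: String, url: String },
    GraphQl { url: String, query: String },
    WebSocketSend { url: String, message: String },
}

/// Hypothetical normalized result shared by every transport.
#[derive(Debug, PartialEq)]
struct Outcome {
    transport: &'static str,
    summary: String,
}

mod http {
    use super::Outcome;
    pub(crate) fn execute(method: &str, url: &str) -> Outcome {
        Outcome { transport: "http", summary: format!("{} {}", method, url) }
    }
}

mod graphql {
    use super::Outcome;
    pub(crate) fn execute(url: &str, query: &str) -> Outcome {
        Outcome { transport: "graphql", summary: format!("POST {} ({} byte query)", url, query.len()) }
    }
}

mod ws {
    use super::Outcome;
    pub(crate) fn send(url: &str, message: &str) -> Outcome {
        Outcome { transport: "ws", summary: format!("send {} bytes to {}", message.len(), url) }
    }
}

/// What `mod.rs` would own: dispatch by operation, nothing transport-specific.
fn execute(operation: &RequestOperation) -> Outcome {
    match operation {
        RequestOperation::Http { method, url } => http::execute(method, url),
        RequestOperation::GraphQl { url, query } => graphql::execute(url, query),
        RequestOperation::WebSocketSend { url, message } => ws::send(url, message),
    }
}
```

Because the `match` is exhaustive, adding a new operation variant forces a compile error in the dispatcher until a transport module handles it.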
### 4. Schema System
Current problem: `src/schema/mod.rs` mixes schema model types, registry ownership, dependency validation, scalar evaluation, object validation, and error rendering.
Target shape:
```text
src/schema/
  mod.rs
  model.rs
  registry.rs
  scalar.rs
  object.rs
  errors.rs
  render.rs
  deps.rs
```
Suggested ownership:
- `model.rs`: scalar and schema declaration types.
- `registry.rs`: `SchemaRegistry`, reserved-name rules, entry lookup.
- `deps.rs`: reference validation and cycle detection.
- `scalar.rs`: built-in and declared-scalar validation.
- `object.rs`: object, root-array, field-presence, and recursive validation.
- `errors.rs`: validation error types.
- `render.rs`: human-readable labels and failure-text helpers.
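Cycle detection is a good example of logic that can live in `deps.rs` with no knowledge of validation or rendering. The sketch below is a generic depth-first search over a name-to-references map; it is not Hen's implementation, and the function names and the map representation are assumptions.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the first reference cycle found, as the chain of names that leads
/// back to its own start, or `None` when the graph is acyclic.
/// `references` maps each declared schema to the schemas it refers to.
fn find_cycle(references: &BTreeMap<String, Vec<String>>) -> Option<Vec<String>> {
    let mut finished = BTreeSet::new();
    for start in references.keys() {
        let mut stack = Vec::new();
        if let Some(cycle) = visit(start, references, &mut stack, &mut finished) {
            return Some(cycle);
        }
    }
    None
}

fn visit(
    name: &str,
    references: &BTreeMap<String, Vec<String>>,
    stack: &mut Vec<String>,
    finished: &mut BTreeSet<String>,
) -> Option<Vec<String>> {
    if finished.contains(name) {
        return None; // already proven cycle-free
    }
    if let Some(position) = stack.iter().position(|entry| entry == name) {
        // `name` is already on the current path: report the loop it closes.
        let mut cycle = stack[position..].to_vec();
        cycle.push(name.to_string());
        return Some(cycle);
    }
    stack.push(name.to_string());
    // Undeclared targets have no entry and are skipped; reporting them is
    // the job of reference validation, not cycle detection.
    for target in references.get(name).into_iter().flatten() {
        if let Some(cycle) = visit(target, references, stack, finished) {
            return Some(cycle);
        }
    }
    stack.pop();
    finished.insert(name.to_string());
    None
}
```

Using ordered maps keeps the reported cycle deterministic, which matters if the cycle text ends up in error output that tests compare against.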
### 5. Response Capture System
Current problem: `src/request/response_capture.rs` mixes capture AST, response snapshots, path parsing, JSON traversal, filtered selectors, decoded JSON wrappers, and SSE or WebSocket metadata extraction.
Target shape:
```text
src/request/response_capture/
  mod.rs
  model.rs
  snapshot.rs
  parse.rs
  resolve.rs
  filter.rs
  decoded_json.rs
  metadata.rs
```
Suggested ownership:
- `model.rs`: capture source and target enums plus accessor types.
- `snapshot.rs`: `ResponseSnapshot` and snapshot construction.
- `parse.rs`: operand parsing and path parsing.
- `resolve.rs`: capture resolution and JSON traversal.
- `filter.rs`: selector parsing and selector evaluation.
- `decoded_json.rs`: `json(...)` wrapper handling.
- `metadata.rs`: status, header, SSE, and WebSocket metadata accessors.
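As an illustration of what a standalone `parse.rs` might contain, the sketch below parses a dotted path with bracketed indexes into segments. The path syntax shown (`body.items[0].name`) and the `PathSegment` type are assumptions made for the example and may not match Hen's capture grammar.

```rust
#[derive(Debug, Clone, PartialEq)]
enum PathSegment {
    Key(String),
    Index(usize),
}

/// Parses a path such as `body.items[0].name` into key and index segments.
fn parse_path(input: &str) -> Result<Vec<PathSegment>, String> {
    let mut segments = Vec::new();
    for part in input.split('.') {
        // Split "items[0][1]" into the key "items" and the bracketed suffix.
        let (key, mut rest) = match part.find('[') {
            Some(open) => part.split_at(open),
            None => (part, ""),
        };
        if key.is_empty() && rest.is_empty() {
            return Err(format!("empty segment in `{}`", input));
        }
        if !key.is_empty() {
            segments.push(PathSegment::Key(key.to_string()));
        }
        // Consume each `[n]` group in turn.
        while !rest.is_empty() {
            if !rest.starts_with('[') {
                return Err(format!("unexpected text after `]` in `{}`", part));
            }
            let close = rest
                .find(']')
                .ok_or_else(|| format!("missing `]` in `{}`", part))?;
            let index = rest[1..close]
                .parse::<usize>()
                .map_err(|_| format!("invalid index in `{}`", part))?;
            segments.push(PathSegment::Index(index));
            rest = &rest[close + 1..];
        }
    }
    Ok(segments)
}
```

With parsing isolated like this, `resolve.rs` can take a `&[PathSegment]` and never see raw text, which keeps traversal tests independent of syntax tests.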
### 6. Reporting
Status: completed. The former `src/report.rs` has already been split into `src/report/{mod,json,ndjson,junit,common,artifacts,text,tests}.rs`, and `src/report/mod.rs` is now a small facade.
Target shape:
```text
src/report/
  mod.rs
  json.rs
  ndjson.rs
  junit.rs
  common.rs
  artifacts.rs
  text.rs
```
Suggested ownership:
- `json.rs`: top-level JSON report entry points.
- `ndjson.rs`: NDJSON entry points and event shaping.
- `junit.rs`: JUnit suite and case rendering.
- `artifacts.rs`: transcript, retained-artifact, and timing serialization.
- `common.rs`: shared assertion and failure JSON helpers.
- `text.rs`: only shared text-description helpers that do not belong in `src/main.rs`.
### 7. CLI Entry Point
Status: completed. `src/main.rs` is now a thin wiring layer, and the former giant CLI flow has already been split into `src/{cli,cli_run,cli_verify,cli_load,cli_output,cli_interrupt}.rs`.
Target shape:
```text
src/
  main.rs
  cli.rs
  cli_run.rs
  cli_verify.rs
  cli_load.rs
  cli_output.rs
  cli_interrupt.rs
```
Suggested ownership:
- `cli.rs`: clap argument structs and invocation parsing.
- `cli_run.rs`: run command flow.
- `cli_verify.rs`: verify command flow.
- `cli_load.rs`: collection and request selection helpers.
- `cli_output.rs`: text and machine-readable printing.
- `cli_interrupt.rs`: signal handling and tracking observer setup.
`main.rs` is now thin program wiring and should stay that way.
### 8. Request Model and Template Layers
Current problem: `src/request/mod.rs` still owns too many core model and helper responsibilities, while `src/request/template.rs` mixes template data types with expansion and dependency resolution.
Target shape:
```text
src/request/
  mod.rs
  model.rs
  map.rs
  callbacks.rs
  protocol_context.rs
  template/
    mod.rs
    model.rs
    expand.rs
    fragments.rs
    map.rs
    resolve.rs
    dependencies.rs
```
Suggested ownership:
- `model.rs`: `Request`, `RequestOperation`, protocol operation types.
- `map.rs`: map iteration structures and export suffixing.
- `callbacks.rs`: callback assignment parsing and sanitization.
- `protocol_context.rs`: GraphQL, MCP, SSE, and WebSocket protocol-context rendering.
- `template/model.rs`: template-layer operation types.
- `template/expand.rs`: request expansion pipeline.
- `template/fragments.rs`: fragment include handling.
- `template/map.rs`: array variable detection and iteration context building.
- `template/resolve.rs`: HTTP or protocol template materialization.
- `template/dependencies.rs`: dependency linking.
The important rule here is that `src/request/mod.rs` should stop being a second giant implementation file and become the public facade for request-domain modules.
### 9. Runner
Current problem: `src/request/runner.rs` is smaller than the top-tier hotspots, but it is already carrying events, execution options, failure types, and orchestration logic in one place.
Target shape:
```text
src/request/runner/
  mod.rs
  events.rs
  failures.rs
  execute.rs
  context.rs
```
Suggested ownership:
- `events.rs`: `ExecutionEvent`, observer types.
- `failures.rs`: failure structs and failure classification.
- `context.rs`: dependency-context construction and shared execution helpers.
- `execute.rs`: sequential and parallel plan execution.
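The separation of events, failures, and execution can be sketched as follows. Only the name `ExecutionEvent` comes from the list above; its variants, the observer trait, `RequestFailure`, and `run_sequential` are hypothetical, and the closure stands in for real request execution.

```rust
// events.rs: event data and the observer contract.
#[derive(Debug, Clone, PartialEq)]
enum ExecutionEvent {
    Started { request: String },
    Finished { request: String, passed: bool },
}

trait ExecutionObserver {
    fn on_event(&mut self, event: &ExecutionEvent);
}

/// Simple observer that records every event it sees.
#[derive(Default)]
struct RecordingObserver {
    events: Vec<ExecutionEvent>,
}

impl ExecutionObserver for RecordingObserver {
    fn on_event(&mut self, event: &ExecutionEvent) {
        self.events.push(event.clone());
    }
}

// failures.rs: failure data only.
#[derive(Debug, PartialEq)]
struct RequestFailure {
    request: String,
    reason: String,
}

// execute.rs: orchestration that emits events and aggregates failures.
fn run_sequential<F>(
    requests: &[&str],
    mut run_one: F,
    observer: &mut dyn ExecutionObserver,
) -> Vec<RequestFailure>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut failures = Vec::new();
    for &name in requests {
        observer.on_event(&ExecutionEvent::Started { request: name.to_string() });
        let result = run_one(name);
        observer.on_event(&ExecutionEvent::Finished {
            request: name.to_string(),
            passed: result.is_ok(),
        });
        if let Err(reason) = result {
            failures.push(RequestFailure { request: name.to_string(), reason });
        }
    }
    failures
}
```

Passing the observer as a trait object keeps `execute.rs` unaware of how events are consumed, so CLI output and tests can each supply their own observer.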
## Integration Test Breakup
`tests/output_cli.rs` was an obvious mega-file earlier in the project. That breakup is now effectively complete: `tests/output_cli.rs` is a thin root and the real coverage lives under `tests/output_cli/`.
Recommended split:
```text
tests/output_cli/
  mod.rs
  json.rs
  ndjson.rs
  junit.rs
  text.rs
  protocols.rs
  fixtures.rs
```
`tests/verify_cli.rs` is the remaining parser-adjacent integration test file and the next candidate for a split if parser work keeps expanding it.
## Delivery Status
The original recommendation was to avoid starting with `src/parser/mod.rs`. In practice, Wave 1 completed first, and clear parser-local seams then made it reasonable to pull the parser facade work forward before the request, schema, and executor waves were finished. The status below reflects the current repository state rather than a pristine greenfield sequence.
### Wave 1: Lowest-risk output surfaces — completed
1. Split `src/report.rs` into `src/report/`.
2. Split `src/main.rs` into CLI helper modules.
3. Split `tests/output_cli.rs` into a directory test crate.
Why first: these changes are easier to validate, do not change the DSL, and establish the repo pattern for directory modules.
### Wave 2: Request leaf systems — not started yet
1. Split `src/request/mod.rs` into model and helper files.
2. Split `src/request/template.rs` into `src/request/template/`.
3. Split `src/request/response_capture.rs` into `src/request/response_capture/`.
Why second: these modules already have natural seams and are upstream of parser and assertions.
### Wave 3: Validation core — not started yet
1. Split `src/schema/mod.rs`.
2. Split `src/request/assertion.rs`.
Why third: assertions depend on schema and response-capture behavior. Splitting schema first reduces the amount of cross-file churn inside the assertion refactor.
### Wave 4: Runtime execution core — not started yet
1. Split `src/request/executor.rs`.
2. Split `src/request/runner.rs`.
Why fourth: executor is complex but becomes much easier to refactor once request models and template resolution are already separated.
### Wave 5: Parser — partially complete
1. Split `src/parser/mod.rs`. Completed.
2. Shrink `src/parser/collection.rs` by separating collection orchestration from request-template construction. Completed.
3. Split `src/parser/request.rs` into a thin facade over request authoring, scan, template, and validate helpers. Completed.
4. Split `tests/verify_cli.rs` if it grows during parser refactors. Not started.
Remaining parser-local follow-up after the facade work:
- decide whether `src/parser/request/template.rs` should split further by protocol family
- decide whether `src/parser/request/validate.rs` or `src/parser/request/scan.rs` should split further by concern
- decide whether `src/parser/tests.rs` and `tests/verify_cli.rs` need their own grouped submodules
Why last: the parser touches request templates, schema declarations, syntax inspection, prompt validation, and source-span rewrites. The risky root-level parser breakup is now mostly done; the remaining work is smaller follow-up extraction inside parser leaf modules.
## Per-Wave Validation
Every wave should be behavior-preserving. The default validation remains:
```bash
cargo test
```
Additional focused validation by area:
- report or CLI splits: `cargo test output_cli` and `cargo test format_timing_line --bin hen`
- parser splits: `cargo test parser::tests:: --lib`, `cargo test verify_cli`, and full `cargo test`
- request or executor splits: `cargo test request::` plus the relevant protocol-focused tests
- schema or assertion splits: focused schema and assertion tests before the full suite
No wave should include feature changes unless a refactor uncovers a concrete bug that blocks the move.
## End State
This plan is complete when:
- core directories such as `parser`, `request`, `schema`, and `report` are organized around submodules instead of giant files
- `mod.rs` files act as facades instead of implementation dumps
- new work naturally lands in focused modules rather than growing another 2k to 4k line file
- integration tests are grouped by behavior instead of accumulating in single giant crates
The goal is not to make every file tiny. The goal is to make ownership obvious.