# HRML Rework Plan
Pickup notes for whoever continues this. Read end-to-end before touching code.
## North star
The framework manipulates a **single labelled, ranked tree** (the IR). Every
operation — parsing, rendering, conversion between syntaxes, variable
resolution, layout block injection — is a tree algorithm. No raw HTML strings
inside tree nodes. No regex-on-attribute-values for `$var` resolution. With the
tree as the single source of truth, correctness is statable as MSO-style
invariants over node labels and structure, and is property-testable.
Two surface syntaxes serialise the same tree:
- **TRML** (`.trml`) — indentation-based, `[name attrs]` per node. Canonical /
default authoring format going forward.
- **HRML** (`.hrml`) — HTML-like, `<?directive?>` for HRML nodes, `<tag>` for
HTML nodes. Supported as an alternative input format.
Both parsers produce the same `Node` tree. The renderer walks that tree and
emits HTML.
## Where we are (state at start of this work)
Codebase: `/home/pkd/code/hrml`, branch `master`.
- TRML parser (`src/template/parser/tparser.rs`) already produces a fully
structured tree.
- HRML parser (`src/template/parser/hparser.rs`) produces tree nodes **only
for HRML directives**. Raw HTML in source ends up as `Node::Text` blobs.
Example: in `<a href="x"><?get id="y"?></a>`, the parser emits
`[Text("<a href=\"x\">"), Element(get, ...), Text("</a>")]`.
- Renderer (`src/template.rs`) already has fallthrough cases that emit
`Node::Element { name: "div", ... }` as `<div>...</div>` — so the moment the
HRML parser starts producing HTML tree nodes, rendering still works.
- Variable references `$x.y` are resolved by `Engine::resolve()` via byte
scanning on attribute-value strings (see `src/template.rs:1680`). Just fixed
a bug there (dot-paths now work); the whole function should eventually be
replaced with a tree pass.
- Converter (`src/convert.rs`) only handles HRML directives correctly; HTML
inside text nodes degrades to escaped raw strings. Output is unusable for USI.
- USI project (`usi/templates/*.hrml`) is dense mixed HRML+HTML — the main
real-world target.
- 5 compiler warnings fixed; failing test `test_markdown_frontmatter_and_meta_tag_library`
fixed (dot-path resolution bug).
- Stage 1 prep done:
- Nested-quote `<?get id="x"?>` inside `"…"` attribute values lifted to
`$x` form across USI templates and test fixtures.
- `Node::Text` is now resolved against the context — `$var` substitution
applies inside text fragments, not just directive attrs (`src/template.rs:782`).
- `src/generate.rs` now sorts HTML attrs alphabetically when serialising
`Node::Element` / `Node::VoidElement`. Currently dormant (no HTML element
nodes reach the renderer yet); turns deterministic the moment Stage 1
starts producing HTML element nodes.
- `tests/var_resolution_tests.rs` added with four invariants: $-substitution
in text; dotted-path resolution; literal-suffix preservation; *no template
file may contain `<?…?>` inside an attribute value* (regression guard).
Test baseline after current fixes: 155 tests pass.
**Open issue for Stage 1 body.** `tests/template_tests.rs:172` asserts
`<input type="text" name="name">` (attribute order: type, name). After HTML
tokenisation, this becomes `Node::VoidElement` and `generate::void` emits
sorted form (`name="name" type="text"`) plus ` />` (void marker). The fix is
either to update the test to the sorted form, or to preserve insertion order
on the attrs map (swap `HashMap` for `indexmap::IndexMap` or `Vec<(String,String)>`).
Recommend updating the test — sorted output is what we want long-term for
canonical serialisation.
## Plan (three stages, B-order from chat: ship IR unification + USI migration
first, automaton rewrite second).
### Stage 1 — unify the tree IR
**Goal.** HParser produces `Node::Element` / `Node::VoidElement` for HTML tags,
not `Node::Text`. After this stage, no `Node::Text` instance contains a `<` that
opens an HTML tag — text is real text only.
**Why first.** Every downstream operation (converter, renderer, future
automaton) gets simpler once the tree is real. Conversion HRML→TRML becomes a
serialiser, not a string-recovery hack.
**Scope of change.**
1. Extend `hparser::InnerParser` to recognise HTML tags as elements:
- At `parse_node`, dispatch on lookahead:
- `<?` but not `</?` or `<?/` → HRML directive (existing `parse_element`)
- `</?` or `<?/` → HRML closing (existing)
- `<` followed by ASCII alpha → **new** HTML open tag
- `</` followed by ASCII alpha → **new** HTML close (siblings stop here)
- `<!--` → skip comment
- anything else → text
- Add `parse_html_element()` mirroring `parse_element()` but for `<tag>`
syntax: name, attrs (quoted/unquoted, may contain `<?...?>`), then either
`/>` (self-close) or `>` + children + `</name>`.
- Add a void-tag table (`area, base, br, col, embed, hr, img, input, link,
meta, source, track, wbr`). Anything else is a content element with an
explicit closing tag.
2. Attribute-value parser **rejects** embedded `<?…?>` directives — putting
code in quotes is illegal. Any `<?` encountered while reading a `"…"` or
`'…'` attribute value is a parse error pointing at the lift to `$var` form.
(The lift across the repo is done; the parser change makes regressions
loud.) `$var` substitution happens at render time via `Engine::resolve`.
3. Renderer should already work — fallthrough at `template.rs:904` and
`template.rs:1026` emits unknown names as raw HTML. Audit that this still
round-trips correctly (especially self-closing void tags like
`<img src="…"/>`).
4. Update `convert.rs` to walk the now-fully-structured tree and emit TRML
directly. `try_html_to_trml` and its helpers (`parse_html_open_tag`,
`find_closing`) get deleted — they were workarounds for the missing tree
structure.
**Invariants to assert (property tests).**
- *Parser totality.* For every `.hrml` template in the repo, `HParser::parse`
returns `Ok`.
- *No-raw-HTML.* For the same templates, no `Node::Text` value contains `<x`
for any ASCII alpha `x` at byte boundary (i.e., text never holds a tag-open
pattern).
- *Render stability.* For every existing template test, the rendered HTML
(after whitespace normalisation) is byte-identical before and after the
parser change. This is the regression gate.
**Verification.**
```
cargo test # all green
**Files touched (estimate).**
- `src/template/parser/hparser.rs` — ~150 lines added (HTML branch + void
table + attribute-value HRML skip).
- `src/template.rs` — possibly minor (audit render fallthrough).
- `src/convert.rs` — replace string-recovery logic with tree serialisation.
- `tests/` — add `parser_no_raw_html.rs` (the invariants above).
**Risk and rollback.** All changes are additive; if a USI template breaks,
the symptom will be a render-stability test failing. Bisect on the new test.
### Stage 2 — ship TRML default + convert USI
**Goal.** TRML is the canonical/default extension. USI is fully ported to
`.trml`. The `xrml` CLI still reads `.hrml` (back-compat) but new projects
emit `.trml`.
**Depends on.** Stage 1 (otherwise the converter produces garbage).
**Scope of change.**
1. `src/template.rs:177` `parse_with_extension` — default to TRML parser.
`.hrml` keeps the HRML parser. Anything else → TRML.
2. `src/router.rs:29` — accept both extensions. If both `foo.trml` and
`foo.hrml` exist, prefer `.trml` (or warn). Same for `template_to_url` which
strips the extension.
3. `src/ssg.rs:125,157,170` — same: trim either extension; look up `404.trml`
first then `404.hrml`.
4. `src/project.rs:364` — file glob accepts both.
5. `src/assets/mod.rs` — the bundled templates used by `xrml new` (currently
`.hrml`). Convert them to `.trml` and update `include_str!` paths. Keep the
`.hrml` versions out of the binary.
6. **Converter sweep on USI:**
```
for f in usi/templates/**/*.hrml; do
xrml convert "$f" > "${f%.hrml}.trml"
done
```
Then a second pass rewrites `file="…/foo.hrml"` to `file="…/foo.trml"` inside
the produced `.trml` files (load directives, component refs).
The converter itself should do step 2 in-process when it serialises a `load`
or `use file=…` attribute — saves a separate pass.
7. Delete the old `.hrml` files in USI **only after** `cargo test` is still
green and `xrml build usi` produces the same output as before
(diff `usi/dist` before vs. after).
**Verification.**
```
diff -r usi/dist.before/ usi/dist.after/ # empty
cargo test # green
```
**Files touched.**
- `src/template.rs`, `src/router.rs`, `src/ssg.rs`, `src/project.rs`,
`src/assets/mod.rs`, `src/convert.rs`.
- `usi/templates/**/*.hrml` → `*.trml` (mass rename).
- `usi/xrml.toml` — no change expected (paths are extension-agnostic).
**Risk.** Mass rename. Keep the `.hrml` originals around in git history (don't
`git rm` them in the same commit as the conversion — separate commits so a
diff-only review is possible).
### Stage 3 — get/set/resolve as tree automaton
**Goal.** Kill `Engine::resolve()` (string-level `$x` scanning) and the
attribute-value-as-string discipline. Variable references become typed
tree decorations, resolved by a single deterministic pass.
**Depends on.** Stage 1 — needs HRML directives inside attribute values to
be tree nodes, not literal strings.
**Scope of change.**
1. Promote attribute values from `String` to a small enum:
```rust
enum AttrValue {
Literal(String),
Ref(Vec<String>), Mixed(Vec<AttrPart>), }
```
(Or a single `Vec<AttrPart>` always — simpler. Decide once the shape is
driven by test cases.)
2. Both parsers populate this richer type at parse time. No more `$` scanning
later.
3. Replace `Engine::resolve()` with `Context::eval_attr(&AttrValue) -> String`
— a fold over the decorated leaves. Same with `Context::eval_text` for
`$var` inside `Node::Text`.
4. `<?get id="…"?>` becomes equivalent to an `AttrPart::Ref` in a text
position. Same tree, same automaton evaluating it.
**Invariants (property tests).**
- *Resolution monotonicity.* Adding a binding only ever turns `Ref`s into
`Literal`s. No binding mutation ever shortens output (modulo the empty-string
fallback for missing vars).
- *Determinism.* Two passes over the same tree with the same context produce
the same output.
- *Idempotence.* `eval(eval(t))` == `eval(t)` once all refs are bound.
**Verification.**
```
cargo test
# new property test file: tests/automaton_resolve.rs
```
**Files touched.**
- `src/template.rs` (big surgery in `Engine::resolve`, `render_meta_tag`,
`render_link_tag`, every `attrs.get(...).cloned()` site that's followed by a
`$`-resolve).
- `src/template/ast.rs` — `AstNode::Element.attrs` type changes.
- Both parsers — produce `AttrValue` at parse time.
**Risk.** Largest change, touches a lot of files. Save for last, after Stages
1 + 2 are merged and stable.
## Cleanup / release-readiness checklist (independent of stages)
These can be done in parallel with any stage; not blockers, but should land
before "ready for release":
- [x] Fix 5 compiler warnings (done)
- [x] Fix `test_markdown_frontmatter_and_meta_tag_library` (done — was a bug in
`Engine::resolve` where `.` wasn't a valid var-name continuation char)
- [ ] Audit `src/template.rs` (currently 2200 lines): a natural split is
`template/render.rs`, `template/context.rs`, `template/extract.rs`
(the block/component extraction passes), keeping `template.rs` as the
`Engine` struct + public API only.
- [ ] Property tests for the parser invariants (Stage 1) and the resolve
automaton (Stage 3).
- [ ] Delete the orphan dist files in `usi/dist/blog/*.html` referenced in
`git status` (they are build artefacts, not source).
- [ ] `README.md` reflects TRML-first authoring after Stage 2.
## Open questions for whoever picks this up
1. **HRML attribute values containing nested quotes.** `<a href="<?get id="x"?>">`
has nested `"`. Parser strategy: switch on `<?` → read until `?>`, ignoring
inner `"`. Confirm this matches existing USI uses.
2. **HTML void-tag list — strict or permissive?** WHATWG has a fixed list of
void elements. Should `<custom-tag />` self-close be respected too? Default
yes (it's how SVG `<use href="..."/>` works in `usi/templates/components/card.hrml`).
3. **TRML round-trip for whitespace.** `convert.rs` currently collapses
indentation. Decide whether `.trml` output should preserve significant
whitespace from HRML text content (`<p>Hi</p>` vs `<pre> code\n</pre>`).
Probably yes for `<pre>` and `<code>`; investigate at Stage 2 time.
4. **Stage 3 shape: `AttrValue` enum vs. uniform `Vec<AttrPart>`.** Pick by
prototyping on three representative attributes (`href`, `class`, `cond`).
## How to continue
1. Read this file end to end.
2. Read `STAGE` notes above for whichever stage is next.
3. Run baseline `cargo test` — must be green before starting any stage.
4. Run `cargo build 2>&1 | grep warning` — must be empty.
5. Pick the smallest sub-task in the next stage. Property tests *before* code
changes where possible.
6. Update this `PLAN.md` as the source of truth: mark items done, add new
open questions, refine estimates.