# pter Architecture
## Overview
pter converts HTML email bodies into readable markdown. It takes an HTML string and returns a markdown string. It does not handle MIME parsing, content extraction, or markdown rendering.
## Pipeline
```
html: &str
  → scraper::Html::parse_document()   # html5ever DOM tree
  → walk_children(root)               # depth-first traversal
      → handle_text()                 # whitespace collapsing, entity decoding
      → handle_element()              # classify → skip / transparent / block / inline
          → handle_block()            # paragraphs, headings, lists, blockquotes, pre, hr
          → handle_inline()           # bold, italic, links, images, code, br
  → whitespace::normalize()           # collapse blank lines, trim
  → String
```
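
The classification step in `handle_element()` can be pictured as a match on the tag name. The following is a minimal sketch, not pter's actual code: `ElementKind` and `classify` are hypothetical names, the skip list is an assumption, and treating every unlisted tag as transparent is also an assumption. Tag names are expected in lowercase, which is how html5ever reports HTML elements.

```rust
/// Hypothetical classification result; the four variants mirror the
/// "skip / transparent / block / inline" split in the pipeline above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ElementKind {
    /// Element and its children produce no output.
    Skip,
    /// Element produces no markup itself; its children are still walked.
    Transparent,
    /// Paragraphs, headings, lists, blockquotes, pre, hr.
    Block,
    /// Bold, italic, links, images, code, br.
    Inline,
}

/// Classify a lowercase tag name. The skip list is an assumption made
/// for this sketch; anything unrecognized falls through to Transparent.
fn classify(tag: &str) -> ElementKind {
    match tag {
        "script" | "style" | "head" | "title" | "meta" => ElementKind::Skip,
        "p" | "h1" | "h2" | "h3" | "h4" | "h5" | "h6" | "ul" | "ol" | "li"
        | "blockquote" | "pre" | "hr" => ElementKind::Block,
        "b" | "strong" | "i" | "em" | "a" | "img" | "code" | "br" => ElementKind::Inline,
        _ => ElementKind::Transparent,
    }
}
```

Defaulting to `Transparent` means wrappers such as `div` or `span` never swallow content in this sketch: their children are still visited even though the wrapper itself emits nothing.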
## Module Responsibilities
| Module | Responsibility |
|---|---|
| `lib.rs` | Public API (`convert`), re-exports |
| `convert.rs` | DOM walker, `Context` state, element dispatch |
| `elements.rs` | Element classification, tracking pixel / hidden detection |
| `whitespace.rs` | Output normalization |
| `tables.rs` | Table layout detection and unwrapping (Phase 2) |
| `replies.rs` | Reply chain detection and quoting (Phase 3) |
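
To make the output normalization in `whitespace.rs` concrete, here is a minimal sketch of "collapse blank lines, trim" using only the standard library. The function body is an illustration under assumptions, not the crate's implementation: it collapses any run of blank lines to a single blank line and drops leading and trailing blank lines, leaving the content of non-blank lines untouched.

```rust
/// Sketch of output normalization: collapse runs of blank lines into
/// one blank line and drop blank lines at the start and end.
/// Non-blank lines are copied through unchanged.
fn normalize(raw: &str) -> String {
    let mut out = String::new();
    // True when at least one blank line was seen since the last content line.
    let mut pending_blank = false;

    for line in raw.lines() {
        if line.trim().is_empty() {
            pending_blank = true;
            continue;
        }
        if !out.is_empty() {
            out.push('\n');
            if pending_blank {
                // Keep exactly one blank line between content lines.
                out.push('\n');
            }
        }
        out.push_str(line);
        pending_blank = false;
    }
    out
}
```

Because separators are only written before a content line, leading and trailing blank lines disappear without a separate trimming pass.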
## Design Decisions
**scraper over html5ever directly**: We need tree traversal (parent/child/sibling access) for layout table unwrapping and reply chain detection. scraper provides this via ego-tree on top of html5ever's spec-compliant parsing.
**Markdown output**: Markdown is readable as plain text and can be rendered by widely available tooling. It preserves structural information (headings, links, lists) that plain text loses.
**Faithful conversion**: pter converts what's there. Content extraction (stripping marketing wrappers) and post-processing (trimming signatures) are separate concerns, composable before or after pter.
**Blockquote rendering**: Blockquotes render their children into a temporary buffer, then prefix each line with `> `. This handles nested blockquotes naturally: the inner quote produces `> ` lines, and the outer quote prefixes them again to yield `> > `.
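
The buffer-then-prefix approach can be sketched in a few lines. `quote_lines` is a hypothetical helper name, and it takes the already-rendered child markdown as a plain string. One detail is an assumption of this sketch rather than something stated above: blank lines receive a bare `>` instead of `> ` to avoid trailing whitespace.

```rust
/// Prefix every line of already-rendered markdown with "> ".
/// Blank lines get a bare ">" (an assumption of this sketch) so the
/// quote stays contiguous without trailing spaces.
fn quote_lines(inner: &str) -> String {
    inner
        .lines()
        .map(|line| {
            if line.is_empty() {
                ">".to_string()
            } else {
                format!("> {line}")
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Nesting needs no special handling: calling `quote_lines` on the output of an inner `quote_lines` call prefixes each `> ` line again, which yields `> > ` lines.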
## Dependencies
| Crate | Purpose |
|---|---|
| `scraper` | HTML parsing + DOM tree + CSS selectors |
| `proptest` (dev) | Property-based testing |