html2markdown
HTML to Markdown converter using AST-to-AST transformation.
Ports the architecture and test cases from hast-util-to-mdast (transformer) and mdast-util-to-markdown (serializer).
Usage
Add to your Cargo.toml:
[dependencies]
html2markdown = "0.2"
let md = convert;
assert_eq!;
With options
use ;
let opts = new.with_heading_style;
let md = convert_with;
assert_eq!;
What it handles
- Headings, paragraphs, blockquotes, lists (ordered, unordered, task lists)
- Inline formatting: bold, italic, strikethrough, code
- Links, images, and reference-style links
- Tables (with alignment)
- Code blocks (fenced) with language hints
- Horizontal rules, line breaks
- Nested structures and edge cases, covered by 130 fixture tests
- Context-sensitive escaping to prevent false Markdown syntax
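To illustrate what context-sensitive escaping means, here is a minimal sketch, not the crate's actual implementation. The function name `escape_text` and its rule set are assumptions made for this example: inline markers are escaped everywhere, while block markers such as `#` or `-` are escaped only where they could start a block, at the beginning of a line.

```rust
/// Hypothetical helper (not part of the crate's API): escape `text` so that
/// it is read as literal text rather than as Markdown syntax.
/// `at_line_start` says whether `text` begins at the start of a line.
pub fn escape_text(text: &str, at_line_start: bool) -> String {
    let mut out = String::with_capacity(text.len());
    let mut line_start = at_line_start;
    for ch in text.chars() {
        let needs_escape = match ch {
            // Inline markers are significant anywhere in running text.
            '\\' | '`' | '*' | '_' | '[' | ']' => true,
            // Block markers only matter at the start of a line. This is
            // conservative: a real serializer would also look at what follows.
            '#' | '>' | '+' | '-' => line_start,
            _ => false,
        };
        if needs_escape {
            out.push('\\');
        }
        out.push(ch);
        // The next character starts a line only if this one was a newline.
        line_start = ch == '\n';
    }
    out
}
```

Under these rules `escape_text("# not a heading", true)` escapes the leading `#`, while the `#` in `"issue #42"` and the `-` in `"well-known"` are left alone because they cannot open a block in that position.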
Architecture
The conversion is a two-phase pipeline:
- HTML tree -> MDAST: html5ever parses the HTML into a DOM, then element handlers transform each node into typed Markdown AST nodes. Whitespace is normalized during this phase.
- MDAST -> Markdown string: the AST is serialized with configurable formatting (heading style, bullet character, list indent, emphasis marker) and context-sensitive escaping.
The two phases are independent: the transformer knows nothing about string formatting, and the serializer knows nothing about HTML.
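The separation can be sketched with a toy version of the second phase. Everything below is illustrative: `Node`, `HeadingStyle`, `Options`, and `serialize` are hypothetical names for this example, not the crate's real types, and only a handful of node kinds are modeled. The point is that the serializer sees only typed AST nodes and formatting options, never HTML.

```rust
// Hypothetical, heavily simplified MDAST. Not the crate's real types.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Heading { depth: u8, children: Vec<Node> },
    Paragraph(Vec<Node>),
    Emphasis(Vec<Node>),
    Strong(Vec<Node>),
    Text(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum HeadingStyle {
    Atx,    // "# Title"
    Setext, // "Title" underlined with "=" or "-"
}

// Formatting choices live only in the serializer.
#[derive(Debug, Clone, Copy)]
pub struct Options {
    pub heading_style: HeadingStyle,
    pub emphasis: char,
}

// Render inline content; block nodes found here are flattened to their children.
fn inline(nodes: &[Node], opts: &Options) -> String {
    let m = opts.emphasis;
    nodes
        .iter()
        .map(|node| match node {
            Node::Text(t) => t.clone(),
            Node::Emphasis(c) => {
                let inner = inline(c, opts);
                format!("{m}{inner}{m}")
            }
            Node::Strong(c) => {
                let inner = inline(c, opts);
                format!("{m}{m}{inner}{m}{m}")
            }
            Node::Heading { children, .. } | Node::Paragraph(children) => {
                inline(children, opts)
            }
        })
        .collect()
}

/// Phase 2 of the sketch: AST in, Markdown string out.
pub fn serialize(blocks: &[Node], opts: &Options) -> String {
    let rendered: Vec<String> = blocks
        .iter()
        .map(|block| match block {
            Node::Heading { depth, children } => {
                let text = inline(children, opts);
                let width = text.chars().count().max(1);
                match (opts.heading_style, *depth) {
                    // Setext only exists for levels 1 and 2.
                    (HeadingStyle::Setext, 1) => format!("{text}\n{}", "=".repeat(width)),
                    (HeadingStyle::Setext, 2) => format!("{text}\n{}", "-".repeat(width)),
                    _ => {
                        let level = usize::from(*depth).clamp(1, 6);
                        format!("{} {text}", "#".repeat(level))
                    }
                }
            }
            other => inline(std::slice::from_ref(other), opts),
        })
        .collect();
    // Blocks are separated by one blank line; output ends with a newline.
    let mut out = rendered.join("\n\n");
    out.push('\n');
    out
}
```

Serializing the same tree with `HeadingStyle::Atx` and then `HeadingStyle::Setext` changes only the output string, which is the property the two-phase design is meant to give: formatting options never need to reach the HTML-facing transformer.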
Optional features
| Feature | Description |
|---|---|
| `tracing` | Enable debug/trace logging (zero-cost when disabled) |
html2markdown = { version = "0.2", features = ["tracing"] }
Benchmarks
Throughput comparison (MiB/s, higher is better):
| Fixture | Rust | html2md (Rust) | Go | hast (JS) | turndown (JS) |
|---|---|---|---|---|---|
| article | 68.5 | 58.3 | 29.3 | 4.1 | 15.8 |
| table | 21.1 | 17.9 | 21.7 | 1.8 | ERR |
| lists | 19.8 | 21.3 | 18.0 | 1.7 | 4.6 |
| code | 62.4 | 55.7 | 43.2 | 5.2 | 15.2 |
| large | 48.7 | 43.3 | 28.8 | 3.1 | ERR |
Measured on Apple M4 Max, Rust 1.93, Go 1.25, Node 22, macOS 15.7.
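For readers unfamiliar with the unit, a MiB/s figure is input size divided by wall-clock conversion time. The sketch below shows that arithmetic using only the standard library. It is not the project's benchmark harness; `mib_per_sec` and `measure` are hypothetical names, and a real benchmark would add warm-up runs and statistical analysis.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Convert a byte count and an elapsed time into MiB/s.
pub fn mib_per_sec(bytes: usize, elapsed: Duration) -> f64 {
    let mib = bytes as f64 / (1024.0 * 1024.0);
    mib / elapsed.as_secs_f64()
}

/// Run `convert` over `input` `iterations` times and report average throughput.
/// `convert` stands in for any HTML-to-Markdown function under test.
pub fn measure<F>(input: &str, iterations: u32, convert: F) -> f64
where
    F: Fn(&str) -> String,
{
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the unused result.
        black_box(convert(black_box(input)));
    }
    let total_bytes = input.len() * iterations as usize;
    mib_per_sec(total_bytes, start.elapsed())
}
```

For example, processing 2 MiB of input in half a second gives `mib_per_sec(2 * 1024 * 1024, Duration::from_millis(500)) == 4.0`.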
Reproduce:
License
MIT