HTML to Markdown conversion using the astral-tl parser.
This module provides the core conversion logic for transforming HTML documents into Markdown. It uses the astral-tl parser for high-performance HTML parsing and supports 60+ HTML tags.
§Architecture
The conversion process follows these steps:
- Parse HTML into a DOM tree using the astral-tl parser
- Walk the DOM tree recursively
- Convert each node type to its Markdown equivalent
- Apply text escaping and whitespace normalization
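The steps above can be sketched with a toy DOM. This is a minimal illustration, not the crate's implementation: the `Node` enum, the `text`/`el` helpers, `escape_text`, `walk`, and `render` are hypothetical names invented for this sketch, and it handles only a handful of tags. The real converter walks the tree produced by the astral-tl parser.

```rust
/// Hypothetical, simplified DOM used only for this sketch.
#[derive(Debug)]
pub enum Node {
    Text(String),
    Element { tag: String, children: Vec<Node> },
}

/// Helper: build a text node.
pub fn text(s: &str) -> Node {
    Node::Text(s.to_string())
}

/// Helper: build an element node with the given children.
pub fn el(tag: &str, children: Vec<Node>) -> Node {
    Node::Element { tag: tag.to_string(), children }
}

/// Backslash-escape a few characters that are significant in Markdown.
pub fn escape_text(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if matches!(c, '\\' | '*' | '_' | '`' | '[' | ']') {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Recursively convert one node and append its Markdown to `out`.
pub fn walk(node: &Node, out: &mut String) {
    match node {
        Node::Text(t) => out.push_str(&escape_text(t)),
        Node::Element { tag, children } => {
            // Convert the children first, then wrap the result per tag.
            let mut inner = String::new();
            for child in children {
                walk(child, &mut inner);
            }
            match tag.as_str() {
                "h1" => {
                    out.push_str("# ");
                    out.push_str(&inner);
                    out.push_str("\n\n");
                }
                "p" => {
                    out.push_str(&inner);
                    out.push_str("\n\n");
                }
                "strong" | "b" => {
                    out.push_str("**");
                    out.push_str(&inner);
                    out.push_str("**");
                }
                "em" | "i" => {
                    out.push('*');
                    out.push_str(&inner);
                    out.push('*');
                }
                // Unknown tags are transparent: emit their content only.
                _ => out.push_str(&inner),
            }
        }
    }
}

/// Convert a whole tree and end the output with exactly one newline.
pub fn render(root: &Node) -> String {
    let mut out = String::new();
    walk(root, &mut out);
    let mut result = out.trim_end_matches('\n').to_string();
    result.push('\n');
    result
}
```

Building the tree for a heading followed by a paragraph with `el` and `text` and passing it to `render` yields the heading line, a blank line, and the paragraph. Rendering children before wrapping them keeps each tag's rule local, which is one reason a recursive walk suits this kind of conversion.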
§Whitespace Handling
This library preserves whitespace exactly as it appears in the HTML source. Text nodes retain their original spacing, including multiple spaces and newlines.
- Raw text preservation: All whitespace in text nodes is preserved
- No HTML5 normalization: Whitespace is not collapsed according to HTML5 rules
- Full control: Applications can handle whitespace as needed
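Because whitespace is left untouched, an application that wants browser-like collapsing has to apply it itself. The sketch below shows one way to do that for an inline text fragment. `collapse_whitespace` is a hypothetical helper written for this illustration, not part of the crate's API, and it assumes the HTML definition of whitespace (space, tab, line feed, form feed, carriage return).

```rust
/// Collapse each run of HTML whitespace (space, tab, LF, FF, CR)
/// into a single space. Other characters, including non-breaking
/// spaces, pass through unchanged.
pub fn collapse_whitespace(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut in_run = false; // true while inside a whitespace run
    for c in input.chars() {
        if matches!(c, ' ' | '\t' | '\n' | '\x0C' | '\r') {
            if !in_run {
                out.push(' ');
                in_run = true;
            }
        } else {
            out.push(c);
            in_run = false;
        }
    }
    out
}
```

This helper is meant for inline fragments only. Applying it to a complete Markdown document would also remove the newlines that separate blocks, so it should not be run over the converter's full output.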
§Supported Features
- Block elements: headings, paragraphs, lists, tables, blockquotes
- Inline formatting: bold, italic, code, links, images, strikethrough
- Semantic HTML5: article, section, nav, aside, header, footer
- Forms: inputs, select, button, textarea, fieldset
- Media: audio, video, picture, iframe, svg
- Advanced: task lists, ruby annotations, definition lists
§Examples
use html_to_markdown_rs::convert;
let html = "<h1>Title</h1><p>Paragraph with <strong>bold</strong> text.</p>";
// Passing `None` uses the default conversion options.
let markdown = convert(html, None).unwrap();
assert_eq!(markdown, "# Title\n\nParagraph with **bold** text.\n");
Functions§
- convert_html - Convert HTML to Markdown using the tl DOM parser.