Module converter

HTML to Markdown conversion using the astral-tl parser.

This module provides the core logic for converting HTML documents into Markdown. It relies on astral-tl for high-performance parsing and supports more than 60 HTML tags.

§Architecture

The conversion process follows these steps:

  1. Parse HTML into a DOM tree using the astral-tl parser
  2. Walk the DOM tree recursively
  3. Convert each node type to its Markdown equivalent
  4. Apply text escaping and whitespace normalization
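Steps 2 to 4 can be illustrated with a minimal, self-contained sketch. It is not the module's actual implementation: the `Node` enum, `escape_text`, and `render` are hypothetical names introduced for illustration, the tree is built by hand rather than parsed with astral-tl, and only a handful of tags are handled.

```rust
// Hypothetical, simplified DOM. The real parser's node types differ.
enum Node {
    Text(String),
    Element { tag: String, children: Vec<Node> },
}

// Backslash-escape a few characters that Markdown treats as markup.
// The set of characters chosen here is an assumption for illustration.
fn escape_text(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        if matches!(ch, '*' | '_' | '`' | '\\') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

// Walk the tree recursively: render the children first, then wrap the
// result according to the element's tag.
fn render(node: &Node) -> String {
    match node {
        Node::Text(text) => escape_text(text),
        Node::Element { tag, children } => {
            let inner: String = children.iter().map(render).collect();
            match tag.as_str() {
                "h1" => format!("# {}\n\n", inner),
                "p" => format!("{}\n\n", inner),
                "strong" | "b" => format!("**{}**", inner),
                "em" | "i" => format!("*{}*", inner),
                // Unknown tags fall through to their rendered children.
                _ => inner,
            }
        }
    }
}
```

Rendering an `h1` node containing the text `Title` with this sketch yields `# Title` followed by a blank line. A production converter would also need to handle nesting context (lists, tables, blockquotes) and trailing blank lines, which this sketch leaves out.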

§Whitespace Handling

This library preserves whitespace exactly as it appears in the HTML source. Text nodes retain their original spacing, including multiple spaces and newlines.

  • Raw text preservation: All whitespace in text nodes is preserved
  • No HTML5 normalization: Whitespace is not collapsed according to HTML5 rules
  • Full control: Applications can handle whitespace as needed
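Because whitespace is not collapsed, an application that wants browser-like spacing has to normalize text itself. The helper below is one possible sketch of that step; `collapse_whitespace` is a hypothetical name, not part of this crate, and it only approximates HTML5 behavior by folding each run of ASCII whitespace into a single space.

```rust
// Hypothetical application-side helper, not provided by this library.
// Folds every run of ASCII whitespace (space, tab, LF, FF, CR) into one space.
fn collapse_whitespace(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut in_run = false;
    for ch in text.chars() {
        if ch.is_ascii_whitespace() {
            // Emit a single space for the first character of a run only.
            if !in_run {
                out.push(' ');
                in_run = true;
            }
        } else {
            out.push(ch);
            in_run = false;
        }
    }
    out
}
```

This is meant for individual text fragments. Applying it to a whole Markdown document would also remove the newlines that separate blocks, so it should not be run on the final output as-is.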

§Supported Features

  • Block elements: headings, paragraphs, lists, tables, blockquotes
  • Inline formatting: bold, italic, code, links, images, strikethrough
  • Semantic HTML5: article, section, nav, aside, header, footer
  • Forms: inputs, select, button, textarea, fieldset
  • Media: audio, video, picture, iframe, svg
  • Advanced: task lists, ruby annotations, definition lists

§Examples

use html_to_markdown_rs::convert;

let html = "<h1>Title</h1><p>Paragraph with <strong>bold</strong> text.</p>";
let markdown = convert(html, None).unwrap();
assert_eq!(markdown, "# Title\n\nParagraph with **bold** text.\n");

§Functions

convert_html
Convert HTML to Markdown using the tl DOM parser.