quick_html2md
Fast HTML to Markdown conversion with GitHub Flavored Markdown (GFM) support.
Features
- Headings: <h1>-<h6> -> # to ######
- Emphasis: <strong>/<b> -> **bold**, <em>/<i> -> *italic*
- Strikethrough: <del>/<s> -> ~~struck~~ (GFM)
- Lists: <ul>/<ol> with proper nesting and indentation
- Links: <a href=""> -> [text](url)
- Images: <img> -> ![alt](src)
- Code: <code> -> `inline`, <pre><code> -> fenced blocks with language
- Tables: Full GFM table support with alignment
- Blockquotes: <blockquote> -> > quote
- URL Resolution: Resolve relative URLs against a base URL
- CommonMark Mode: Strict CommonMark compliance option
- Character Escaping: Optional escaping of markdown special characters
Quick Start
use quick_html2md::html_to_markdown;
let html = "<h1>Hello</h1><p>World</p>";
let md = html_to_markdown(html);
assert_eq!;
With Options
use ;
let options = new
.include_links(false) // Strip links, keep text
.preserve_tables(true);
let md = html_to_markdown_with_options;
URL Resolution
Resolve relative URLs in links and images against a base URL:
use ;
let options = new
.base_url("https://example.com/docs/");
let html = r#"<a href="page.html">Link</a>"#;
let md = html_to_markdown_with_options;
// Output: [Link](https://example.com/docs/page.html)
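To illustrate what resolving a relative URL against a base involves, here is a minimal std-only sketch. It is not the crate's implementation: `resolve_url` is a hypothetical helper, and it only covers absolute, root-relative, and same-directory references. It does not handle `../` segments or query strings, which is the kind of case a full URL parser exists for.

```rust
/// Resolves `href` against `base` for a few common cases.
/// Hypothetical helper for illustration; not a full RFC 3986 resolver.
fn resolve_url(base: &str, href: &str) -> String {
    // Absolute URLs, protocol-relative URLs, and fragments are returned unchanged.
    if href.contains("://") || href.starts_with("//") || href.starts_with('#') {
        return href.to_string();
    }

    // Find where the base's scheme ends; without one there is nothing to resolve against.
    let scheme_end = match base.find("://") {
        Some(i) => i + 3,
        None => return href.to_string(),
    };

    // Split the base into origin ("https://example.com") and path ("/docs/").
    let path_start = base[scheme_end..]
        .find('/')
        .map(|i| i + scheme_end)
        .unwrap_or(base.len());
    let (origin, path) = base.split_at(path_start);

    // Root-relative reference: replace the whole path.
    if href.starts_with('/') {
        return format!("{origin}{href}");
    }

    // Otherwise resolve against the base's directory (everything up to the last '/').
    let dir = match path.rfind('/') {
        Some(i) => &path[..=i],
        None => "/",
    };
    format!("{origin}{dir}{href}")
}
```

Note that the trailing slash on the base matters: with `https://example.com/docs/` the last path segment is treated as a directory, while with `https://example.com/docs/index.html` the file name is dropped before the relative reference is appended.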
CommonMark Mode
For strict CommonMark compliance (disables GFM extensions):
use ;
let options = commonmark;
let html = "<ul><li>parent<ul><li>child</li></ul></li></ul>";
let md = html_to_markdown_with_options;
// Uses 4-space indentation, escapes special chars, no strikethrough/tables
Nested Lists
This crate properly handles nested lists, producing clean markdown output:
let html = "<ul><li>parent<ul><li>child</li></ul></li></ul>";
let md = html_to_markdown(html);
// Output:
// - parent
//   - child
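The indentation rule behind nested lists can be sketched as a small recursive renderer. This is an illustration only, not the crate's code: the `Item` type and the `render_list` function are hypothetical, and the indent width is passed in explicitly so the same sketch covers both a 2-space layout (assumed here for the default mode) and the 4-space layout mentioned for CommonMark mode.

```rust
/// A list item with optional nested children (hypothetical type for illustration).
struct Item {
    text: String,
    children: Vec<Item>,
}

/// Appends each item, indented by `depth * indent_width` spaces, then its children.
fn render(items: &[Item], depth: usize, indent_width: usize, out: &mut String) {
    for item in items {
        out.push_str(&" ".repeat(depth * indent_width));
        out.push_str("- ");
        out.push_str(&item.text);
        out.push('\n');
        // Children are rendered one level deeper than their parent.
        render(&item.children, depth + 1, indent_width, out);
    }
}

/// Renders a whole list as Markdown with the given indent width.
fn render_list(items: &[Item], indent_width: usize) -> String {
    let mut out = String::new();
    render(items, 0, indent_width, &mut out);
    out
}
```

Calling `render_list` with a single `parent` item that has one `child` yields `- parent` followed by an indented `- child`, with the indent depending only on the width argument.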
GFM Tables
HTML tables are converted to GitHub Flavored Markdown tables with alignment support:
let html = r#"<table>
<tr><th align="left">Name</th><th align="right">Value</th></tr>
<tr><td>foo</td><td>42</td></tr>
</table>"#;
let md = html_to_markdown(html);
// Output:
// | Name | Value |
// |:--- | ---:|
// | foo | 42 |
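The alignment row is where HTML `align` attributes end up in a GFM table. A minimal sketch of that mapping follows. The `Align` enum and both functions are hypothetical names, not the crate's API, and the exact spacing inside the delimiter row may differ from what the crate emits (GFM accepts either).

```rust
/// Column alignment as read from an HTML `align` attribute (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Align {
    Left,
    Center,
    Right,
    Unset,
}

/// Parses an optional `align` attribute value, ignoring case and surrounding whitespace.
fn parse_align(value: Option<&str>) -> Align {
    match value.map(|v| v.trim().to_ascii_lowercase()).as_deref() {
        Some("left") => Align::Left,
        Some("center") => Align::Center,
        Some("right") => Align::Right,
        _ => Align::Unset,
    }
}

/// Builds the GFM delimiter row: colons mark the aligned side(s) of each column.
fn delimiter_row(aligns: &[Align]) -> String {
    let cells: Vec<&str> = aligns
        .iter()
        .map(|a| match a {
            Align::Left => ":---",
            Align::Center => ":---:",
            Align::Right => "---:",
            Align::Unset => "---",
        })
        .collect();
    format!("| {} |", cells.join(" | "))
}
```

For the table above, the header cells carry `align="left"` and `align="right"`, so the sketch produces a delimiter row with `:---` in the first column and `---:` in the second.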
Code Block Language Detection
The converter detects programming languages from common class naming patterns:
- language-rust (Prism.js, Highlight.js)
- lang-python
- highlight-javascript
- sourceCode rust (Pandoc)
- Direct class names: rust, python, javascript, etc.
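The patterns above can be matched with a few string checks on the `class` attribute. The sketch below shows one way to do it. `detect_language` is a hypothetical function, the order in which patterns are tried is an assumption, and the `KNOWN` allow-list for bare class names is deliberately tiny. None of this is taken from the crate's source.

```rust
/// Extracts a language name from an HTML `class` attribute value.
/// Hypothetical helper for illustration; not part of the crate's API.
fn detect_language(class_attr: &str) -> Option<String> {
    // Small allow-list for bare class names such as `rust` or `python` (assumed).
    const KNOWN: [&str; 5] = ["rust", "python", "javascript", "go", "c"];
    let tokens: Vec<&str> = class_attr.split_whitespace().collect();

    // Prefixed forms: language-rust, lang-python, highlight-javascript.
    for token in &tokens {
        for prefix in ["language-", "lang-", "highlight-"] {
            if let Some(lang) = token.strip_prefix(prefix) {
                if !lang.is_empty() {
                    return Some(lang.to_lowercase());
                }
            }
        }
    }

    // Pandoc form: `sourceCode rust`, where the language follows the marker class.
    if let Some(pos) = tokens.iter().position(|t| *t == "sourceCode") {
        if let Some(lang) = tokens.get(pos + 1) {
            return Some(lang.to_lowercase());
        }
    }

    // Direct class names, checked against the allow-list.
    tokens
        .iter()
        .map(|t| t.to_lowercase())
        .find(|t| KNOWN.iter().any(|k| *k == t.as_str()))
}
```

Trying the prefixed forms first avoids misreading an unrelated class (for example a layout class) as a language when a more specific marker is present on the same element.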
Image Dimension Handling
In CommonMark mode, images with width/height attributes are output as HTML to preserve dimensions:
let options = commonmark;
let html = r#"<img src="photo.jpg" alt="Photo" width="200" height="100">"#;
let md = html_to_markdown_with_options;
// Output: <img src="photo.jpg" alt="Photo" width="200" height="100" />
Optional Features
- url: Enable the url crate for more robust URL resolution
[dependencies]
quick_html2md = { version = "0.1", features = ["url"] }
Migration from html-cleaning
If you were using html_cleaning::markdown:
// Before
use html_cleaning::markdown::html_to_markdown;
// After
use quick_html2md::html_to_markdown;
The API is identical - just change the import.
License
Licensed under either of Apache License, Version 2.0 or MIT license at your option.