# quick_html2md
Fast HTML to Markdown conversion with GitHub Flavored Markdown (GFM) support.
## Features
- **Headings**: `<h1>`-`<h6>` -> `#`-`######`
- **Emphasis**: `<strong>`/`<b>` -> `**bold**`, `<em>`/`<i>` -> `*italic*`
- **Strikethrough**: `<del>`/`<s>` -> `~~struck~~` (GFM)
- **Lists**: `<ul>`/`<ol>` with proper nesting and indentation
- **Links**: `<a href="">` -> `[text](url)`
- **Images**: `<img>` -> ``
- **Code**: `<code>` -> `` `inline` ``, `<pre><code>` -> fenced blocks with language
- **Tables**: Full GFM table support with alignment
- **Blockquotes**: `<blockquote>` -> `> quote`
- **URL Resolution**: Resolve relative URLs against a base URL
- **CommonMark Mode**: Strict CommonMark compliance option
- **Smart Escaping**: Position-aware escaping of markdown special characters
## Quick Start
```rust
use quick_html2md::html_to_markdown;
let html = "<h1>Hello</h1><p>World</p>";
let md = html_to_markdown(html);
assert_eq!(md, "# Hello\n\nWorld\n");
```
## With Options
```rust
use quick_html2md::{html_to_markdown_with_options, MarkdownOptions};
let options = MarkdownOptions::new()
.include_links(false) // Strip links, keep text
.preserve_tables(true);
let md = html_to_markdown_with_options(html, &options);
```
## URL Resolution
Resolve relative URLs in links and images against a base URL:
```rust
use quick_html2md::{html_to_markdown_with_options, MarkdownOptions};
let options = MarkdownOptions::new()
.base_url("https://example.com/docs/");
let html = r#"<a href="page.html">Link</a>"#;
let md = html_to_markdown_with_options(html, &options);
// Output: [Link](https://example.com/docs/page.html)
```
## CommonMark Mode
For strict CommonMark compliance (disables GFM extensions):
```rust
use quick_html2md::{html_to_markdown_with_options, MarkdownOptions};
let options = MarkdownOptions::commonmark();
let html = "<ul><li>parent<ul><li>child</li></ul></li></ul>";
let md = html_to_markdown_with_options(html, &options);
// Uses 4-space indentation, escapes special chars, no strikethrough/tables
```
## Nested Lists
This crate properly handles nested lists, producing clean markdown output:
```rust
let html = "<ul><li>parent<ul><li>child</li></ul></li></ul>";
let md = html_to_markdown(html);
// Output:
// - parent
// - child
```
## GFM Tables
HTML tables are converted to GitHub Flavored Markdown tables with alignment support:
```rust
let html = r#"<table>
<tr><th align="left">Name</th><th align="right">Value</th></tr>
<tr><td>foo</td><td>42</td></tr>
</table>"#;
let md = html_to_markdown(html);
// Output:
// | Name | Value |
// |:--- | ---:|
// | foo | 42 |
```
## Code Block Language Detection
The converter detects programming languages from common class naming patterns:
- `language-rust` (Prism.js, Highlight.js)
- `lang-python`
- `highlight-javascript`
- `sourceCode rust` (Pandoc)
- Direct class names: `rust`, `python`, `javascript`, etc.
## Image Dimension Handling
In CommonMark mode, images with width/height attributes are output as HTML to preserve dimensions:
```rust
let options = MarkdownOptions::commonmark();
let html = r#"<img src="photo.jpg" alt="Photo" width="200" height="100">"#;
let md = html_to_markdown_with_options(html, &options);
// Output: <img src="photo.jpg" alt="Photo" width="200" height="100" />
```
## Smart Character Escaping
When `escape_special_chars(true)` is enabled, the converter uses position-aware
escaping that only escapes characters where they would create markdown constructs:
- Core characters (`\`, `` ` ``, `*`, `_`, `[`, `]`, `<`) are always escaped
- Positional characters (`#`, `>`, `-`, `+`, `.`, `!`) are only escaped where
they could create headings, blockquotes, lists, or image syntax
- Characters like `{`, `}`, `(`, `)`, `~` are **not** escaped since they don't
create markdown constructs in standard/GFM markdown
```rust
use quick_html2md::{html_to_markdown_with_options, MarkdownOptions};
let options = MarkdownOptions::new().escape_special_chars(true);
let html = "<p>Price is $10.99 and fn() { return x; }</p>";
let md = html_to_markdown_with_options(html, &options);
// Braces, parens, and periods are NOT escaped
// Output: Price is $10.99 and fn() { return x; }
```
## Efficient Structural HTML Handling
Empty or whitespace-only structural elements (`<div>`, `<section>`, `<nav>`, etc.)
are collapsed rather than producing inflated output. This prevents the 3-40x size
inflation that can occur on complex pages with deep `<div>` nesting.
## Optional Features
- `url` - Enable the `url` crate for more robust URL resolution
```toml
[dependencies]
quick_html2md = { version = "0.2", features = ["url"] }
```
## Migration from html-cleaning
If you were using `html_cleaning::markdown`:
```rust
// Before
use html_cleaning::markdown::html_to_markdown;
// After
use quick_html2md::html_to_markdown;
```
The API is identical - just change the import.
## Changelog
### v0.2.1
- Fixed period escaping inside headings (e.g., `### 1. Section` no longer becomes `### 1\. Section`)
### v0.2.0
- Position-aware smart escaping (only escapes where markdown constructs would be created)
- Structural HTML handling (collapses empty `<div>` nesting)
- Whitespace-only text node filtering
## License
Licensed under either of Apache License, Version 2.0 or MIT license at your option.