H2M
Fast, extensible HTML-to-Markdown converter for Rust — CommonMark + GFM, plugin architecture, zero unsafe.
H2M converts HTML into clean Markdown with full CommonMark compliance and GitHub Flavored Markdown extensions. It uses a plugin-based rule system, supports reference-style links and relative URL resolution, and ships with an async CLI powered by tokio for high-concurrency batch fetching.
Quick Start
Install the CLI
Shell (macOS / Linux):
PowerShell (Windows):
irm https://sh.qntx.fun/labs/h2m/ps | iex
Or via Cargo:
CLI Usage
# Convert a URL directly
# Extract only the article content
# Smart readable extraction (strips nav, footer, aside, etc.)
# Short form
# Local file with GFM + referenced links, save to file
# Pipe from stdin
# JSON output for programmatic / agent consumption
# Batch convert multiple URLs (NDJSON streaming output)
# Batch from file with concurrency control
# Custom User-Agent
# All formatting options
JSON Output
A single URL produces a pretty-printed JSON object:
Multiple URLs produce NDJSON (one JSON object per line), ideal for streaming pipelines.
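To make the NDJSON framing concrete, here is a minimal std-only sketch of writing one JSON object per line. `PageResult`, its fields, and `to_json_line` are hypothetical names chosen for illustration; they are not H2M's actual output schema or API.

```rust
use std::io::{self, Write};

/// Hypothetical record for this sketch; H2M's real JSON schema may differ.
struct PageResult {
    url: String,
    status: u16,
    markdown: String,
}

/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialize one record on a single line. Newlines inside values are
/// escaped, so a raw '\n' can only ever mean "end of record".
fn to_json_line(r: &PageResult) -> String {
    format!(
        "{{\"url\":\"{}\",\"status\":{},\"markdown\":\"{}\"}}",
        json_escape(&r.url),
        r.status,
        json_escape(&r.markdown)
    )
}

fn main() -> io::Result<()> {
    let results = vec![
        PageResult {
            url: "https://example.com/a".to_string(),
            status: 200,
            markdown: "# A\n\nFirst page".to_string(),
        },
        PageResult {
            url: "https://example.com/b".to_string(),
            status: 404,
            markdown: String::new(),
        },
    ];

    // Emit each record as soon as it is ready; consumers can parse line by line.
    let stdout = io::stdout();
    let mut out = stdout.lock();
    for r in &results {
        writeln!(out, "{}", to_json_line(r))?;
    }
    Ok(())
}
```

Because every record is complete on its own line, a downstream tool can start processing the first result before the last URL has finished.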
Library Usage
// One-liner with CommonMark defaults
let md = convert;
assert_eq!;
// Full control with builder
use ;
use Gfm;
use CommonMark;
let converter = builder
.options
.use_plugin
.use_plugin
.domain
.build;
let md = converter.convert;
assert_eq!;
Async Fetching
Enable the `fetch` feature for async HTTP fetching with built-in concurrency control, rate limiting, and streaming output:
use Fetcher;
let fetcher = builder
.concurrency
.gfm
.extract_links
.build?;
// Single fetch
let result = fetcher.fetch.await?;
println!;
// Batch with streaming callback
let urls = vec!;
fetcher.fetch_many_streaming.await;
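The idea behind a concurrency limit can be sketched without any dependencies. The example below swaps in std threads and a hand-rolled counting semaphore in place of tokio, and a short sleep stands in for the HTTP fetch. `Semaphore` and `run_bounded` are hypothetical names for this sketch, not part of H2M.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore built from a mutex and a condition variable.
struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Semaphore {
    fn new(n: usize) -> Self {
        Semaphore { permits: Mutex::new(n), cv: Condvar::new() }
    }

    /// Block until a permit is available, then take it.
    fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cv.wait(p).unwrap();
        }
        *p -= 1;
    }

    /// Return a permit and wake one waiting thread.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

/// Process every URL with at most `limit` jobs in flight.
/// Returns the peak number of simultaneous jobs that was observed.
fn run_bounded(urls: &[&str], limit: usize) -> usize {
    let sem = Semaphore::new(limit);
    let in_flight = AtomicUsize::new(0);
    let peak = AtomicUsize::new(0);

    thread::scope(|s| {
        for url in urls {
            let (sem, in_flight, peak) = (&sem, &in_flight, &peak);
            s.spawn(move || {
                sem.acquire();
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);

                // Stand-in for fetching and converting `url`.
                let _len = url.len();
                thread::sleep(Duration::from_millis(10));

                in_flight.fetch_sub(1, Ordering::SeqCst);
                sem.release();
            });
        }
    });

    peak.load(Ordering::SeqCst)
}

fn main() {
    let urls = ["https://example.com/1", "https://example.com/2", "https://example.com/3"];
    let peak = run_bounded(&urls, 2);
    println!("peak concurrency: {}", peak);
}
```

An async implementation would hold a semaphore permit across an `.await` instead of blocking a thread, but the invariant is the same: the number of in-flight requests never exceeds the configured limit.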
Design
- CommonMark + GFM: full spec compliance with tables, strikethrough, task lists, reference-style links
- Plugin architecture: extend with custom rules via the `Rule` trait
- Async batch pipeline: `tokio` + `reqwest`, semaphore concurrency, streaming NDJSON (feature-gated)
- JSON output: structured result with rich metadata (status, language, description, og:image) for agent/programmatic consumption
- Smart readable extraction: two-phase content detection, semantic selectors → noise stripping (`nav`, `footer`, `aside`, `header`, ARIA roles)
- Smart fetching: configurable User-Agent, HTML meta-refresh redirect following
- Zero-copy fast paths: `Cow<str>` escaping, zero `unsafe`, `Send + Sync`
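The `Cow<str>` fast path in the last bullet can be illustrated with a short sketch: text with nothing to escape is returned as a borrow, and an allocation happens only when a special character is found. The `escape_markdown` function and the `SPECIAL` character set are assumptions made for this example. H2M's real escaping rules are likely more context-sensitive.

```rust
use std::borrow::Cow;

/// Characters this sketch backslash-escapes. Illustrative only; a real
/// converter decides based on where the character appears.
const SPECIAL: &[char] = &['\\', '*', '_', '`', '[', ']'];

fn escape_markdown(input: &str) -> Cow<'_, str> {
    match input.find(SPECIAL) {
        // Fast path: nothing to escape, so hand back the input without allocating.
        None => Cow::Borrowed(input),
        // Slow path: copy the clean prefix, then escape the remainder.
        Some(first) => {
            let mut out = String::with_capacity(input.len() + 8);
            out.push_str(&input[..first]);
            for c in input[first..].chars() {
                if SPECIAL.contains(&c) {
                    out.push('\\');
                }
                out.push(c);
            }
            Cow::Owned(out)
        }
    }
}

fn main() {
    let plain = escape_markdown("plain text");
    let starred = escape_markdown("2 * 3 = 6");

    println!("{} (borrowed: {})", plain, matches!(plain, Cow::Borrowed(_)));
    println!("{} (borrowed: {})", starred, matches!(starred, Cow::Borrowed(_)));
}
```

Most text nodes in a typical page contain no Markdown metacharacters, so the borrowed branch is the common case and the allocating branch the exception.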
Conversion Examples
Input HTML:
Title
A bold and italic paragraph with a link.
First item
Second item
fn main() {}
Output Markdown:
A **bold** and *italic* paragraph with [a link](https://example.com).
- First item
- Second item
```rust
fn main() {}
```
Supported HTML Elements
CommonMark (built-in)
| Element | Markdown Output |
|---|---|
| `<h1>`-`<h6>` | `# Heading` (ATX) or underline (Setext) |
| `<p>`, `<div>`, `<section>`, `<article>` | Block paragraph |
| `<strong>`, `<b>` | `**bold**` |
| `<em>`, `<i>` | `*italic*` |
| `<code>`, `<kbd>`, `<samp>`, `<tt>` | `` `inline code` `` |
| `<pre><code>` | Fenced code block with language detection |
| `<a href="...">` | `[text](url)` or reference-style |
| `<img src="..." alt="...">` | `` |
| `<ul>`, `<ol>`, `<li>` | Bullet/numbered lists with nesting |
| `<blockquote>` | `> quoted text` |
| `<hr>` | `---` |
| `<br>` | Hard line break |
| `<iframe>` | `[iframe](url)` |
GFM Extensions (with --gfm)
| Element | Markdown Output |
|---|---|
| `<table>` | GFM pipe table with alignment |
| `<del>`, `<s>`, `<strike>` | `~~strikethrough~~` |
| `<input type="checkbox">` | `[x]` or `[ ]` (task list) |
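As an illustration of what "pipe table with alignment" means in GFM, the sketch below renders a header, a delimiter row carrying the alignment markers, and body rows. `Align`, `render_row`, and `render_table` are hypothetical helpers written for this example, not H2M's internals.

```rust
/// Column alignment as expressed in the GFM delimiter row.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Align {
    Unset,
    Left,
    Center,
    Right,
}

/// Map an alignment to its delimiter-row cell.
fn delimiter(a: Align) -> &'static str {
    match a {
        Align::Unset => "---",
        Align::Left => ":---",
        Align::Center => ":---:",
        Align::Right => "---:",
    }
}

/// Render one row, escaping literal pipes so they do not split a cell.
fn render_row(cells: &[&str]) -> String {
    let escaped: Vec<String> = cells.iter().map(|c| c.replace('|', "\\|")).collect();
    format!("| {} |", escaped.join(" | "))
}

/// Render a full table: header, delimiter row, then body rows.
fn render_table(header: &[&str], aligns: &[Align], rows: &[Vec<&str>]) -> String {
    let mut lines = vec![render_row(header)];
    let delims: Vec<&str> = aligns.iter().map(|a| delimiter(*a)).collect();
    lines.push(format!("| {} |", delims.join(" | ")));
    for row in rows {
        lines.push(render_row(row));
    }
    lines.join("\n")
}

fn main() {
    let table = render_table(
        &["Element", "Count", "Note"],
        &[Align::Left, Align::Right, Align::Center],
        &[vec!["<table>", "3", "a|b"], vec!["<del>", "12", "ok"]],
    );
    println!("{}", table);
}
```

The delimiter row is the only place alignment is recorded, which is why a converter has to read it from the source HTML (for example an `align` attribute or inline style) before emitting the first body row.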
Auto-removed
| Element | Behavior |
|---|---|
| `<script>` | Removed (content stripped) |
| `<style>` | Removed (content stripped) |
| `<noscript>` | Removed (content stripped) |
Custom Rules
Extend the converter with your own rules by implementing the `Rule` trait:
use ;
use CommonMark;
use ElementRef;
;
let mut builder = builder
.use_plugin;
builder.add_rule;
let converter = builder.build;
let md = converter.convert;
assert!;
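For readers who want the general shape of rule-based dispatch, here is a self-contained sketch. It deliberately does not use H2M's types: `SketchRule`, `Element`, `Registry`, and the `==text==` highlight syntax are all hypothetical stand-ins, and the real `Rule` trait may have a different signature.

```rust
/// Minimal stand-in for a parsed element (hypothetical; a real rule
/// would receive a DOM node).
struct Element<'a> {
    tag: &'a str,
    text: &'a str,
}

/// Hypothetical rule interface for this sketch, not H2M's `Rule` trait.
trait SketchRule {
    /// Tag names this rule handles.
    fn tags(&self) -> &'static [&'static str];
    /// Render a matching element as Markdown.
    fn render(&self, el: &Element) -> String;
}

/// Built-in style rule: bold text.
struct Strong;
impl SketchRule for Strong {
    fn tags(&self) -> &'static [&'static str] {
        &["strong", "b"]
    }
    fn render(&self, el: &Element) -> String {
        format!("**{}**", el.text)
    }
}

/// Custom rule: render <mark> with an illustrative `==text==` syntax.
struct Highlight;
impl SketchRule for Highlight {
    fn tags(&self) -> &'static [&'static str] {
        &["mark"]
    }
    fn render(&self, el: &Element) -> String {
        format!("=={}==", el.text)
    }
}

/// Ordered collection of rules.
struct Registry {
    rules: Vec<Box<dyn SketchRule>>,
}

impl Registry {
    fn new() -> Self {
        Registry { rules: Vec::new() }
    }

    fn add_rule(&mut self, rule: Box<dyn SketchRule>) {
        self.rules.push(rule);
    }

    /// Search newest-first so a later rule can override an earlier one.
    /// Elements with no matching rule fall back to their plain text.
    fn convert(&self, el: &Element) -> String {
        self.rules
            .iter()
            .rev()
            .find(|r| r.tags().iter().any(|t| *t == el.tag))
            .map(|r| r.render(el))
            .unwrap_or_else(|| el.text.to_string())
    }
}

fn main() {
    let mut registry = Registry::new();
    registry.add_rule(Box::new(Strong));
    registry.add_rule(Box::new(Highlight));

    println!("{}", registry.convert(&Element { tag: "mark", text: "note" }));
    println!("{}", registry.convert(&Element { tag: "b", text: "bold" }));
}
```

Searching newest-first is one possible precedence policy; it lets a user-supplied rule shadow a built-in one for the same tag without removing it.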
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or https://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project shall be dual-licensed as above, without any additional terms or conditions.
A QNTX open-source project.
Code is law. We write both.