H2M
Fast, extensible HTML-to-Markdown converter for Rust — CommonMark + GFM, plugin architecture, zero unsafe.
H2M converts HTML into clean Markdown with full CommonMark compliance and GitHub Flavored Markdown extensions. It uses a plugin-based rule system, supports reference-style links and relative URL resolution, and ships with an async CLI powered by tokio for high-concurrency batch fetching.
Quick Start
Install the CLI
Shell (macOS / Linux):
|
PowerShell (Windows):
irm https://sh.qntx.fun/labs/h2m/ps | iex
Or via Cargo:
CLI Usage
# Convert a URL directly
# Extract only the article content
# Local file with GFM + referenced links, save to file
# Pipe from stdin
|
# JSON output for programmatic / agent consumption
# Batch convert multiple URLs (NDJSON streaming output)
# Batch from file with concurrency control
# All formatting options
JSON Output
A single URL produces one pretty-printed JSON object.
Multiple URLs produce NDJSON (one JSON object per line), which suits streaming pipelines because each line can be parsed as soon as it arrives.
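To make the NDJSON shape concrete, here is a minimal std-only sketch of writing one JSON object per line. The `Record` type and its `url` and `markdown` fields are assumptions made for illustration; H2M's actual output schema may contain different or additional fields.

```rust
use std::fmt::Write;

/// Hypothetical result record; the real H2M output schema may differ.
struct Record<'a> {
    url: &'a str,
    markdown: &'a str,
}

/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => {
                // Remaining control characters use the \uXXXX form.
                // Writing to a String cannot fail.
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out
}

/// Serialize records as NDJSON: one compact JSON object per line.
fn to_ndjson(records: &[Record<'_>]) -> String {
    let mut out = String::new();
    for r in records {
        let _ = writeln!(
            out,
            "{{\"url\":\"{}\",\"markdown\":\"{}\"}}",
            json_escape(r.url),
            json_escape(r.markdown)
        );
    }
    out
}
```

Because newlines inside the Markdown are escaped, every record stays on a single physical line, so a consumer can split on `\n` and parse each line independently.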
Library Usage
// One-liner with CommonMark defaults
let md = convert;
assert_eq!;
// Full control with builder
use ;
use Gfm;
use CommonMark;
let converter = builder
.options
.use_plugin
.use_plugin
.domain
.build;
let md = converter.convert;
assert_eq!;
Async Fetching (feature = "fetch")
Enable the fetch feature for async HTTP fetching with built-in concurrency control, rate limiting, and streaming output:
use Fetcher;
let fetcher = builder
.concurrency
.gfm
.extract_links
.build?;
// Single fetch
let result = fetcher.fetch.await?;
println!;
// Batch with streaming callback
let urls = vec!;
fetcher.fetch_many_streaming.await;
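The concurrency limit can be pictured as a counting semaphore: each fetch takes a permit before it starts and returns it when it finishes. The sketch below is not H2M's implementation. It swaps tokio tasks for OS threads and uses a hand-rolled `Semaphore` so that it runs with the standard library alone; `run_bounded` and the sleep that stands in for an HTTP request are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore built from a mutex and a condition variable.
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Block until a permit is free, then take it.
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Return a permit and wake one waiting thread.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Run `jobs` simulated fetches with at most `limit` in flight.
/// Returns the highest number of jobs observed running at once.
/// `limit` must be at least 1, otherwise no job can ever start.
fn run_bounded(jobs: usize, limit: usize) -> usize {
    let sem = Arc::new(Semaphore::new(limit));
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..jobs)
        .map(|_| {
            let sem = Arc::clone(&sem);
            let in_flight = Arc::clone(&in_flight);
            let peak = Arc::clone(&peak);
            thread::spawn(move || {
                sem.acquire();
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                // Stand-in for an HTTP request.
                thread::sleep(Duration::from_millis(5));
                in_flight.fetch_sub(1, Ordering::SeqCst);
                sem.release();
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

Calling `run_bounded(8, 3)` starts eight workers, but the returned peak never exceeds 3, since the in-flight counter is only incremented while a permit is held.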
Design
- CommonMark compliant: headings, paragraphs, emphasis, strong, code blocks, links, images, lists, blockquotes, horizontal rules, line breaks
- GFM extensions: tables (with column alignment), strikethrough, task lists
- Reference-style links: full (`[text][1]`), collapsed (`[text][]`), and shortcut (`[text]`) styles
- Domain resolution: resolve relative URLs to absolute via the `url` crate (WHATWG compliant)
- Plugin architecture: extend with custom rules via the `Rule` trait; register with `Converter::builder().use_plugin()`
- Async HTTP pipeline: `tokio` + `reqwest` with semaphore-based concurrency, rate limiting, and streaming NDJSON output (feature-gated)
- JSON / NDJSON output: structured output for agent/programmatic consumption; single result → JSON, batch → NDJSON
- HTML utilities: `html::extract_title()`, `html::extract_links()`, `html::select()` for metadata extraction without full conversion
- Keep / Remove: selectively preserve raw HTML tags or strip them entirely
- CSS selector extraction: `--selector` flag to convert only matching elements
- Zero-copy fast paths: `Cow<str>` for escaping and whitespace normalization; no allocation when input needs no transformation
- `Send + Sync`: `Converter` is immutable after build, safe to share across threads (compile-time assertion)
- Strict linting: Clippy `pedantic` + `nursery` + `correctness` (deny), zero warnings
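The zero-copy fast path mentioned in the list can be illustrated with a small escaping function that returns `Cow<str>`. This is a sketch under assumptions, not H2M's escaping code: the name `escape_markdown` and the `SPECIAL` character set are hypothetical, and a real converter would need a larger, context-sensitive set.

```rust
use std::borrow::Cow;

/// Characters this sketch escapes; an illustrative subset only.
const SPECIAL: &[char] = &['\\', '*', '_', '[', ']', '`'];

/// Backslash-escape Markdown-significant characters.
/// Borrows the input when nothing needs escaping, so the common case
/// performs no allocation.
fn escape_markdown(input: &str) -> Cow<'_, str> {
    // Fast path: no special character found, return the input as-is.
    let first = match input.find(SPECIAL) {
        Some(index) => index,
        None => return Cow::Borrowed(input),
    };

    // Slow path: copy the clean prefix, then escape the remainder.
    let mut out = String::with_capacity(input.len() + 8);
    out.push_str(&input[..first]);
    for c in input[first..].chars() {
        if SPECIAL.contains(&c) {
            out.push('\\');
        }
        out.push(c);
    }
    Cow::Owned(out)
}
```

For `"plain text"` the function returns `Cow::Borrowed`, while `"2 * 3 [x]"` yields an owned string with the asterisk and brackets escaped.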
Conversion Examples
Input HTML:
Title
A bold and italic paragraph with a link.
First item
Second item
fn main() {}
Output Markdown:
# Title

A **bold** and *italic* paragraph with [a link](https://example.com).

- First item
- Second item

```rust
fn main() {}
```
Custom Rules
Extend the converter with your own rules by implementing the Rule trait:
use ;
use CommonMark;
use ElementRef;
;
let converter = builder
.use_plugin
.build;
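For readers who want a feel for the plugin pattern without the crate at hand, the following std-only sketch shows rules registered per tag and looked up at render time. Every name in it (`SketchRule`, `Registry`, `Highlight`, `Strong`) is hypothetical; it does not reproduce H2M's `Rule` trait or its `use_plugin` signature, and it skips HTML parsing entirely by taking the tag name and inner text as plain strings.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a conversion rule; not H2M's `Rule` trait.
trait SketchRule {
    /// Tag names this rule handles.
    fn tags(&self) -> &'static [&'static str];
    /// Wrap already-converted inner content in Markdown syntax.
    fn render(&self, content: &str) -> String;
}

/// Custom rule: `<mark>text</mark>` becomes `==text==`.
struct Highlight;

impl SketchRule for Highlight {
    fn tags(&self) -> &'static [&'static str] {
        &["mark"]
    }
    fn render(&self, content: &str) -> String {
        format!("=={content}==")
    }
}

/// Built-in style rule: `<strong>` and `<b>` become `**text**`.
struct Strong;

impl SketchRule for Strong {
    fn tags(&self) -> &'static [&'static str] {
        &["strong", "b"]
    }
    fn render(&self, content: &str) -> String {
        format!("**{content}**")
    }
}

/// Maps tag names to rules; a later registration overrides an earlier one.
#[derive(Default)]
struct Registry {
    rules: Vec<Box<dyn SketchRule>>,
    by_tag: HashMap<&'static str, usize>,
}

impl Registry {
    /// Builder-style registration, so calls can be chained.
    fn register(mut self, rule: impl SketchRule + 'static) -> Self {
        let index = self.rules.len();
        for tag in rule.tags() {
            self.by_tag.insert(*tag, index);
        }
        self.rules.push(Box::new(rule));
        self
    }

    /// Render with the matching rule, or pass content through unchanged.
    fn render(&self, tag: &str, content: &str) -> String {
        match self.by_tag.get(tag) {
            Some(&index) => self.rules[index].render(content),
            None => content.to_string(),
        }
    }
}
```

With `Registry::default().register(Strong).register(Highlight)`, rendering tag `mark` with content `note` gives `==note==`, and an unregistered tag such as `span` returns its content untouched.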
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or https://opensource.org/licenses/MIT)
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project shall be dual-licensed as above, without any additional terms or conditions.
A QNTX open-source project.
Code is law. We write both.