html-to-markdown
High-performance HTML to Markdown conversion powered by Rust. Ships as a Rust crate, Python package, PHP extension, Ruby gem, Elixir Rustler NIF, Node.js bindings, WebAssembly, and standalone CLI, all with identical rendering behaviour.
Part of the Kreuzberg.dev document intelligence ecosystem. Kreuzberg is a polyglot document intelligence framework with a fast Rust core. We build tools that help developers extract, process, and understand documents at scale, covering PDFs, Office files, images, archives, and emails across 50+ formats. We've set out to make high-performance document intelligence faster and more ecological.
Try the Live Demo
Experience WebAssembly-powered HTML to Markdown conversion instantly in your browser. No installation needed!
Why html-to-markdown?
- Blazing Fast: Rust-powered core delivers 10-80× faster conversion than pure Python alternatives
- Universal: Works everywhere - Node.js, Bun, Deno, browsers, Python, Rust, and standalone CLI
- Smart Conversion: Handles complex documents including nested tables, code blocks, task lists, and hOCR OCR output
- Metadata Extraction: Extract document metadata (title, description, headers, links, images) alongside conversion
- Highly Configurable: Control heading styles, code block fences, list formatting, whitespace handling, and HTML sanitization
- Tag Preservation: Keep specific HTML tags unconverted when markdown isn't expressive enough
- Secure by Default: Built-in HTML sanitization prevents malicious content
- Consistent Output: Identical markdown rendering across all language bindings
Quick Start
Node.js / Bun (Native - Fastest):
import { convert } from 'html-to-markdown-node';
const html = '<h1>Hello</h1><p>Rust ❤️ Markdown</p>';
const markdown = convert(html, {
  headingStyle: 'Atx',
  codeBlockStyle: 'Backticks',
  wrap: true,
  preserveTags: ['table'],
});
Python:
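# Mirrors the Node.js example above; see the Python README for option names.
from html_to_markdown import convert
html = "<h1>Hello</h1><p>Rust ❤️ Markdown</p>"
markdown = convert(html)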
Ruby:
require 'html_to_markdown'
html = '<h1>Hello</h1><p>Rust ❤️ Markdown</p>'
markdown = HtmlToMarkdown.convert(html, heading_style: :atx, wrap: true)
Full language guides: See Language Guides below.
Installation
| Target | Command(s) |
|---|---|
| Node.js/Bun (native) | npm install html-to-markdown-node |
| WebAssembly (universal) | npm install html-to-markdown-wasm |
| Deno | import { convert } from "npm:html-to-markdown-wasm" |
| Python (bindings + CLI) | pip install html-to-markdown |
| PHP (extension + helpers) | PHP_EXTENSION_DIR=$(php-config --extension-dir) pie install goldziher/html-to-markdown, then composer require goldziher/html-to-markdown |
| Ruby gem | bundle add html-to-markdown or gem install html-to-markdown |
| Elixir (Rustler NIF) | {:html_to_markdown, "~> 2.8"} |
| Rust crate | cargo add html-to-markdown-rs |
| Rust CLI (crates.io) | cargo install html-to-markdown-cli |
| Homebrew CLI | brew install html-to-markdown (core) |
| Releases | GitHub Releases |
Performance
Benchmarked on Apple M4 using the shared fixture harness in tools/benchmark-harness.
Comparative Throughput (Median Across Fixtures)
| Runtime | Median ops/sec | Median throughput (MB/s) | Peak memory (MB) | Successes |
|---|---|---|---|---|
| Rust | 1,060.3 | 116.4 | 171.3 | 56/56 |
| Go | 1,496.3 | 131.1 | 22.9 | 16/16 |
| Ruby | 2,155.5 | 300.4 | 280.3 | 48/48 |
| PHP | 2,357.7 | 308.0 | 223.5 | 48/48 |
| Elixir | 1,564.1 | 269.1 | 384.7 | 48/48 |
| C# | 1,234.2 | 272.4 | 187.8 | 16/16 |
| Java | 1,298.7 | 167.1 | 527.2 | 16/16 |
| WASM | 1,485.8 | 157.6 | 95.3 | 48/48 |
| Node.js (NAPI) | 2,054.2 | 306.5 | 95.4 | 48/48 |
| Python (PyO3) | 3,120.3 | 307.5 | 83.5 | 48/48 |
Use task bench:harness to regenerate throughput numbers. See Performance Guide for benchmarking strategies and optimization tips.
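The two rate columns in the table are linked by fixture size: throughput is operations per second multiplied by the bytes processed per operation. The sketch below illustrates that relationship only. It is not the harness in tools/benchmark-harness, whose implementation is not shown here. The names `measure_fixture`, `summarize`, and `fake_convert` are hypothetical, and the sketch assumes 1 MB = 10^6 bytes.

```python
import statistics
import time
from typing import Callable, Dict, Tuple


def measure_fixture(
    convert: Callable[[str], str], html: str, iterations: int = 50
) -> Tuple[float, float]:
    """Time one fixture and return (ops_per_sec, megabytes_per_sec)."""
    size_bytes = len(html.encode("utf-8"))
    start = time.perf_counter()
    for _ in range(iterations):
        convert(html)
    # Guard against a zero reading on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    ops_per_sec = iterations / elapsed
    # Assumption: 1 MB = 1,000,000 bytes.
    mb_per_sec = ops_per_sec * size_bytes / 1_000_000
    return ops_per_sec, mb_per_sec


def summarize(
    convert: Callable[[str], str], fixtures: Dict[str, str], iterations: int = 50
) -> Dict[str, float]:
    """Report the median of each rate across all fixtures."""
    results = [measure_fixture(convert, html, iterations) for html in fixtures.values()]
    return {
        "median_ops_per_sec": statistics.median(r[0] for r in results),
        "median_mb_per_sec": statistics.median(r[1] for r in results),
        "fixtures": len(results),
    }


if __name__ == "__main__":
    # Stand-in converter so the sketch runs without any bindings installed.
    def fake_convert(html: str) -> str:
        return html.replace("<p>", "").replace("</p>", "\n")

    fixtures = {"small": "<p>hi</p>" * 10, "large": "<p>hello world</p>" * 1000}
    print(summarize(fake_convert, fixtures))
```

To time a real binding, pass its conversion function in place of `fake_convert`. Because the medians are taken per column, the median ops/sec and median MB/s may come from different fixtures.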
Language Guides
Complete documentation with examples for each language:
- Python: README | PyO3 bindings, metadata extraction, inline images
- JavaScript/TypeScript: Node.js | TypeScript | WASM
- Ruby: README | Magnus bindings, RBS type definitions, Steep checking
- PHP: Package | Extension (PIE) | ext-php-rs extension
- Go: README | FFI bindings with cgo
- Java: README | Panama FFI, Maven/Gradle setup
- C#/.NET: README | P/Invoke FFI, NuGet distribution
- Elixir: README | Rustler NIF bindings
- Rust: README | Core library, error handling, advanced features
Feature Guides
Visitor Pattern
Customize HTML-to-Markdown conversion with callbacks for specific elements. Use cases: domain-specific dialects, content filtering, URL rewriting, accessibility validation.
→ Full Guide with Examples (Python, TypeScript, Ruby)
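To make the idea of per-element callbacks concrete, here is a conceptual sketch of the URL-rewriting use case built on Python's standard `html.parser`. It does not use or reproduce this library's visitor API (see the full guide for that). `TinyConverter`, `convert_with_link_visitor`, `absolutize`, and the `example.com` base URL are all hypothetical, and the toy converter only understands headings, paragraphs, and links.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional

# Hypothetical callback type: receives (href, text) and returns replacement
# Markdown, or None to fall back to the default rendering.
LinkVisitor = Callable[[str, str], Optional[str]]

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class TinyConverter(HTMLParser):
    """Toy converter for <h1>-<h6>, <p>, and <a>; other tags are ignored."""

    def __init__(self, visit_link: Optional[LinkVisitor] = None) -> None:
        super().__init__(convert_charrefs=True)
        self.visit_link = visit_link
        self.out: List[str] = []
        self._href: Optional[str] = None  # set while inside an <a> element
        self._link_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.out.append("#" * int(tag[1]) + " ")
        elif tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._link_text = []

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            self.out.append("\n\n")
        elif tag == "a" and self._href is not None:
            text = "".join(self._link_text)
            # Give the visitor the first chance to render this element.
            custom = self.visit_link(self._href, text) if self.visit_link else None
            self.out.append(custom if custom is not None else f"[{text}]({self._href})")
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._link_text.append(data)
        else:
            self.out.append(data)


def convert_with_link_visitor(html: str, visit_link: Optional[LinkVisitor] = None) -> str:
    parser = TinyConverter(visit_link)
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


def absolutize(href: str, text: str) -> Optional[str]:
    """Example visitor: rewrite root-relative links against a made-up base URL."""
    if href.startswith("/"):
        return f"[{text}](https://example.com{href})"
    return None  # leave other links to the default rendering


if __name__ == "__main__":
    html = '<h1>Docs</h1><p>See <a href="/guide">the guide</a>.</p>'
    print(convert_with_link_visitor(html, absolutize))
```

Returning `None` from the callback keeps the default output. That lets a visitor handle only the elements it cares about and leave the rest to the converter.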
Metadata Extraction
Extract comprehensive metadata during conversion: title, description, headers, links, images, structured data. Use cases: SEO extraction, TOC generation, link validation, accessibility auditing, content migration.
→ Full Guide with Examples (Python, TypeScript, Ruby)
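As an illustration of the kind of data listed above, the following sketch collects a title, description, headers, links, and images in a single pass using only Python's standard library. It is a conceptual stand-in, not this library's metadata API. `Metadata`, `MetadataCollector`, and `extract_metadata` are hypothetical names, and structured data is left out for brevity.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional, Tuple

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


@dataclass
class Metadata:
    title: Optional[str] = None
    description: Optional[str] = None
    headers: List[Tuple[int, str]] = field(default_factory=list)  # (level, text)
    links: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)


class MetadataCollector(HTMLParser):
    """Single-pass collector; text is buffered only inside <title> and headings."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.meta = Metadata()
        self._capture: Optional[str] = None  # tag whose text is being buffered
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title" or tag in HEADINGS:
            self._capture, self._buffer = tag, []
        elif tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.meta.description = attr.get("content")
        elif tag == "a" and attr.get("href"):
            self.meta.links.append(attr["href"])
        elif tag == "img" and attr.get("src"):
            self.meta.images.append(attr["src"])

    def handle_data(self, data):
        if self._capture is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return
        text = "".join(self._buffer).strip()
        if tag == "title":
            self.meta.title = text
        else:
            self.meta.headers.append((int(tag[1]), text))
        self._capture = None


def extract_metadata(html: str) -> Metadata:
    collector = MetadataCollector()
    collector.feed(html)
    collector.close()
    return collector.meta


if __name__ == "__main__":
    page = (
        "<html><head><title>Demo</title>"
        '<meta name="description" content="A page"></head>'
        '<body><h1>Intro</h1><p><a href="https://example.com">x</a>'
        '<img src="a.png"></p><h2>More</h2></body></html>'
    )
    print(extract_metadata(page))
```

The `headers` list preserves document order and heading level, which is the information a TOC generator needs. The `links` and `images` lists are the natural inputs for link validation or an alt-text audit.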
Performance & Benchmarking
Understand performance characteristics, run benchmarks, optimize for your use case. Includes benchmarking tools, memory profiling, streaming strategies, and optimization tips.
→ Full Guide
Examples
Explore working code examples in multiple languages:
| Example | Path | Languages |
|---|---|---|
| Visitor Pattern | examples/visitor-pattern/ | Python, TypeScript, Ruby |
| Metadata Extraction | examples/metadata-extraction/ | Python, TypeScript, Ruby |
| Performance | examples/performance/ | Benchmarks, profiling, optimization |
Testing
Run the test suite locally:
# All core test suites (Rust, Python, Ruby, Node, PHP, Go, C#, Elixir, Java)
# Run the Wasmtime-backed WASM integration tests
Compatibility (v1 → v2)
- V2's Rust core sustains 150–210 MB/s throughput, 60–80× faster than V1's average of ≈ 2.5 MB/s.
- Python compatibility shim available in html_to_markdown.v1_compat (deprecated; emits warnings; plan migrations now). See Python README for keyword mappings.
- CLI flag changes and other breaking updates in CHANGELOG.
Community
- Discord: Join our community
- Ecosystem: Explore Kreuzberg document-processing tools
- Contribute: CONTRIBUTING.md
- Sponsor: GitHub Sponsors
- Changelog: Version history
License
MIT License. See LICENSE for details.