# html-to-markdown
<img width="1128" height="191" alt="Linkedin- Banner (1)" src="https://github.com/user-attachments/assets/f8e91036-20a5-40f9-9fcc-9e6c6e15f1f5" />
High-performance HTML โ Markdown conversion powered by Rust. Shipping as a Rust crate, Python package, PHP extension, Ruby gem, Elixir Rustler NIF, Node.js bindings, WebAssembly, and standalone CLI with identical rendering behaviour.
Part of the Kreuzberg.dev document intelligence ecosystem. Kreuzberg is a polyglot document intelligence framework with a fast Rust core. We build tools that help developers extract, process, and understand documents at scale, from PDFs to Office files, images, archives, emails, in 50+ formats. We've set out to make high-performance document intelligence faster and more ecological.
[](https://crates.io/crates/html-to-markdown-rs)
[](https://www.npmjs.com/package/html-to-markdown-node)
[](https://www.npmjs.com/package/html-to-markdown-wasm)
[](https://pypi.org/project/html-to-markdown/)
[](https://packagist.org/packages/goldziher/html-to-markdown)
[](https://rubygems.org/gems/html-to-markdown)
[](https://hex.pm/packages/html_to_markdown)
[](https://www.nuget.org/packages/Goldziher.HtmlToMarkdown/)
[](https://central.sonatype.com/artifact/io.github.goldziher/html-to-markdown)
[](https://pkg.go.dev/github.com/kreuzberg-dev/html-to-markdown/packages/go/v2/htmltomarkdown)
[](https://github.com/kreuzberg-dev/html-to-markdown/blob/main/LICENSE)
[](https://discord.gg/pXxagNK2zN)
---
## ๐ฎ **[Try the Live Demo โ](https://kreuzberg-dev.github.io/html-to-markdown/)**
Experience WebAssembly-powered HTML to Markdown conversion instantly in your browser. No installation needed!
---
## Why html-to-markdown?
- **Blazing Fast**: Rust-powered core delivers 10-80ร faster conversion than pure Python alternatives
- **Universal**: Works everywhere - Node.js, Bun, Deno, browsers, Python, Rust, and standalone CLI
- **Smart Conversion**: Handles complex documents including nested tables, code blocks, task lists, and hOCR OCR output
- **Metadata Extraction**: Extract document metadata (title, description, headers, links, images) alongside conversion
- **Highly Configurable**: Control heading styles, code block fences, list formatting, whitespace handling, and HTML sanitization
- **Tag Preservation**: Keep specific HTML tags unconverted when markdown isn't expressive enough
- **Secure by Default**: Built-in HTML sanitization prevents malicious content
- **Consistent Output**: Identical markdown rendering across all language bindings
## Quick Start
**Node.js / Bun (Native - Fastest):**
```typescript
import { convert } from 'html-to-markdown-node';
const html = '<h1>Hello</h1><p>Rust โค๏ธ Markdown</p>';
const markdown = convert(html, {
headingStyle: 'Atx',
codeBlockStyle: 'Backticks',
wrap: true,
preserveTags: ['table'],
});
```
**Python:**
```python
from html_to_markdown import convert
html = '<h1>Hello</h1><p>Rust โค๏ธ Markdown</p>'
markdown = convert(html, heading_style='Atx', wrap=True)
```
**Ruby:**
```ruby
require 'html_to_markdown'
html = '<h1>Hello</h1><p>Rust โค๏ธ Markdown</p>'
markdown = HtmlToMarkdown.convert(html, heading_style: :atx, wrap: true)
```
Full language guides: See [Language Guides](#language-guides) below.
## Installation
| **Node.js/Bun** (native) | `npm install html-to-markdown-node` |
| **WebAssembly** (universal) | `npm install html-to-markdown-wasm` |
| **Deno** | `import { convert } from "npm:html-to-markdown-wasm"` |
| **Python** (bindings + CLI) | `pip install html-to-markdown` |
| **PHP** (extension + helpers) | `PHP_EXTENSION_DIR=$(php-config --extension-dir) pie install goldziher/html-to-markdown`<br>`composer require goldziher/html-to-markdown` |
| **Ruby** gem | `bundle add html-to-markdown` or `gem install html-to-markdown` |
| **Elixir** (Rustler NIF) | `{:html_to_markdown, "~> 2.8"}` |
| **Rust** crate | `cargo add html-to-markdown-rs` |
| Rust CLI (crates.io) | `cargo install html-to-markdown-cli` |
| Homebrew CLI | `brew install html-to-markdown` (core) |
| Releases | [GitHub Releases](https://github.com/kreuzberg-dev/html-to-markdown/releases) |
## Performance
Benchmarked on Apple M4 using the shared fixture harness in `tools/benchmark-harness`.
### Comparative Throughput (Median Across Fixtures)
| Rust | 1,060.3 | 116.4 | 171.3 | 56/56 |
| Go | 1,496.3 | 131.1 | 22.9 | 16/16 |
| Ruby | 2,155.5 | 300.4 | 280.3 | 48/48 |
| PHP | 2,357.7 | 308.0 | 223.5 | 48/48 |
| Elixir | 1,564.1 | 269.1 | 384.7 | 48/48 |
| C# | 1,234.2 | 272.4 | 187.8 | 16/16 |
| Java | 1,298.7 | 167.1 | 527.2 | 16/16 |
| WASM | 1,485.8 | 157.6 | 95.3 | 48/48 |
| Node.js (NAPI) | 2,054.2 | 306.5 | 95.4 | 48/48 |
| Python (PyO3) | 3,120.3 | 307.5 | 83.5 | 48/48 |
Use `task bench:harness` to regenerate throughput numbers. See [Performance Guide](./examples/performance/) for benchmarking strategies and optimization tips.
## Language Guides
Complete documentation with examples for each language:
- **Ruby** โ [README](./packages/ruby/README.md) | Magnus bindings, RBS type definitions, Steep checking
- **PHP** โ [Package](./packages/php/README.md) | [Extension (PIE)](./packages/php-ext/README.md) | ext-php-rs extension
- **Go** โ [README](./packages/go/README.md) | FFI bindings with cgo
- **Java** โ [README](./packages/java/README.md) | Panama FFI, Maven/Gradle setup
- **C#/.NET** โ [README](./packages/csharp/README.md) | P/Invoke FFI, NuGet distribution
- **Elixir** โ [README](./packages/elixir/README.md) | Rustler NIF bindings
- **Rust** โ [README](./crates/html-to-markdown/README.md) | Core library, error handling, advanced features
## Feature Guides
### Visitor Pattern
Customize HTMLโMarkdown conversion with callbacks for specific elements. Use cases: domain-specific dialects, content filtering, URL rewriting, accessibility validation.
**โ [Full Guide with Examples](./examples/visitor-pattern/)** (Python, TypeScript, Ruby)
### Metadata Extraction
Extract comprehensive metadata during conversion: title, description, headers, links, images, structured data. Use cases: SEO extraction, TOC generation, link validation, accessibility auditing, content migration.
**โ [Full Guide with Examples](./examples/metadata-extraction/)** (Python, TypeScript, Ruby)
### Performance & Benchmarking
Understand performance characteristics, run benchmarks, optimize for your use case. Includes benchmarking tools, memory profiling, streaming strategies, and optimization tips.
**โ [Full Guide](./examples/performance/)**
## Examples
Explore working code examples in multiple languages:
| **Visitor Pattern** | [examples/visitor-pattern/](./examples/visitor-pattern/) | Python, TypeScript, Ruby |
| **Metadata Extraction** | [examples/metadata-extraction/](./examples/metadata-extraction/) | Python, TypeScript, Ruby |
| **Performance** | [examples/performance/](./examples/performance/) | Benchmarks, profiling, optimization |
## Testing
Run the test suite locally:
```bash
# All core test suites (Rust, Python, Ruby, Node, PHP, Go, C#, Elixir, Java)
task test
# Run the Wasmtime-backed WASM integration tests
task wasm:test:wasmtime
```
## Compatibility (v1 โ v2)
- V2's Rust core sustains **150โ210 MB/s** throughput; V1 averaged **โ 2.5 MB/s** (60โ80ร faster).
- Python compatibility shim available in `html_to_markdown.v1_compat` (deprecated; emits warnings; plan migrations now). See [Python README](./packages/python/README.md#v1-compatibility) for keyword mappings.
- CLI flag changes and other breaking updates in [CHANGELOG](./CHANGELOG.md#breaking-changes).
## Community
- **Discord** โ [Join our community](https://discord.gg/pXxagNK2zN)
- **Ecosystem** โ Explore [Kreuzberg](https://kreuzberg.dev) document-processing tools
- **Contribute** โ [CONTRIBUTING.md](./CONTRIBUTING.md)
- **Sponsor** โ [GitHub Sponsors](https://github.com/sponsors/kreuzberg-dev)
- **Changelog** โ [Version history](./CHANGELOG.md)
## License
MIT License โ see [LICENSE](./LICENSE) for details.