# html-to-markdown
High-performance HTML → Markdown conversion powered by Rust. Ships as a Rust crate, Python package, PHP extension, Ruby gem, Node.js bindings, WebAssembly module, and standalone CLI, all with identical rendering behaviour.
[crates.io](https://crates.io/crates/html-to-markdown-rs) ·
[npm (html-to-markdown-node)](https://www.npmjs.com/package/html-to-markdown-node) ·
[npm (html-to-markdown-wasm)](https://www.npmjs.com/package/html-to-markdown-wasm) ·
[npm (html-to-markdown)](https://www.npmjs.com/package/html-to-markdown) ·
[PyPI](https://pypi.org/project/html-to-markdown/) ·
[Packagist](https://packagist.org/packages/goldziher/html-to-markdown) ·
[RubyGems](https://rubygems.org/gems/html-to-markdown) ·
[License](https://github.com/Goldziher/html-to-markdown/blob/main/LICENSE) ·
[Discord](https://discord.gg/pXxagNK2zN)
---
## 🎮 **[Try the Live Demo →](https://goldziher.github.io/html-to-markdown/)**
Experience WebAssembly-powered HTML to Markdown conversion instantly in your browser. No installation needed!
---
## Why html-to-markdown?
- **Blazing Fast**: Rust-powered core delivers 10-80× faster conversion than pure Python alternatives
- **Universal**: Runs on Node.js, Bun, Deno, browsers, Python, PHP, Ruby, Rust, and as a standalone CLI
- **Smart Conversion**: Handles complex documents including nested tables, code blocks, task lists, and hOCR OCR output
- **Highly Configurable**: Control heading styles, code block fences, list formatting, whitespace handling, and HTML sanitization
- **Tag Preservation**: Keep specific HTML tags unconverted when markdown isn't expressive enough
- **Secure by Default**: Built-in HTML sanitization prevents malicious content
- **Consistent Output**: Identical markdown rendering across all language bindings
## Documentation
- **JavaScript/TypeScript guides**:
  - Node.js/Bun (native) – [Node.js README](https://github.com/Goldziher/html-to-markdown/blob/main/crates/html-to-markdown-node/README.md)
  - WebAssembly (universal) – [WASM README](https://github.com/Goldziher/html-to-markdown/blob/main/crates/html-to-markdown-wasm/README.md)
  - TypeScript wrapper – [TypeScript README](https://github.com/Goldziher/html-to-markdown/blob/main/packages/typescript/README.md)
- **Python guide** – [Python README](https://github.com/Goldziher/html-to-markdown/blob/main/packages/python/README.md)
- **PHP guides**:
  - PHP wrapper package – [PHP README](https://github.com/Goldziher/html-to-markdown/blob/main/packages/php/README.md)
  - PHP extension (PIE) – [Extension README](https://github.com/Goldziher/html-to-markdown/blob/main/packages/php-ext/README.md)
- **Ruby guide** – [Ruby README](https://github.com/Goldziher/html-to-markdown/blob/main/crates/html-to-markdown-rb/README.md)
- **Rust guide** – [Rust README](https://github.com/Goldziher/html-to-markdown/blob/main/crates/html-to-markdown/README.md)
- **Contributing** – [CONTRIBUTING.md](https://github.com/Goldziher/html-to-markdown/blob/main/CONTRIBUTING.md) ⭐ Start here!
- **Changelog** – [CHANGELOG.md](https://github.com/Goldziher/html-to-markdown/blob/main/CHANGELOG.md)
## Installation
| Target | Command |
| --- | --- |
| **Node.js/Bun** (native) | `npm install html-to-markdown-node` |
| **WebAssembly** (universal) | `npm install html-to-markdown-wasm` |
| **Deno** | `import { convert } from "npm:html-to-markdown-wasm"` |
| **Python** (bindings + CLI) | `pip install html-to-markdown` |
| **PHP** (extension + helpers) | `pie install goldziher/html-to-markdown`<br>`composer require html-to-markdown/extension` |
| **Ruby** gem | `bundle add html-to-markdown` or `gem install html-to-markdown` |
| **Rust** crate | `cargo add html-to-markdown-rs` |
| Rust CLI | `cargo install html-to-markdown-cli` |
| Homebrew CLI | `brew tap goldziher/tap`<br>`brew install html-to-markdown` |
| Releases | [GitHub Releases](https://github.com/Goldziher/html-to-markdown/releases) |
## Quick Start
### JavaScript/TypeScript
**Node.js / Bun (Native - Fastest):**
```typescript
import { convert } from 'html-to-markdown-node';
const html = '<h1>Hello</h1><p>Rust ❤️ Markdown</p>';
const markdown = convert(html, {
  headingStyle: 'Atx',
  codeBlockStyle: 'Backticks',
  wrap: true,
  preserveTags: ['table'], // NEW in v2.5: Keep complex HTML as-is
});
```
**Deno / Browsers / Edge (Universal):**
```typescript
import { convert } from "npm:html-to-markdown-wasm"; // Deno
// or: import { convert } from 'html-to-markdown-wasm'; // Bundlers
const html = '<h1>Hello</h1><p>Rust ❤️ Markdown</p>';
const markdown = convert(html, {
  headingStyle: 'atx',
  listIndentWidth: 2,
});
```
**Performance:** Native bindings average ~18k ops/sec and WASM averages ~16k ops/sec (benchmarked on complex real-world documents).
See the JavaScript guides for full API documentation:
- [Node.js/Bun guide](https://github.com/Goldziher/html-to-markdown/tree/main/crates/html-to-markdown-node)
- [WebAssembly guide](https://github.com/Goldziher/html-to-markdown/tree/main/crates/html-to-markdown-wasm)
### CLI
```bash
# Convert a file
html-to-markdown input.html > output.md
# Stream from stdin
cat input.html | html-to-markdown > output.md
# Apply options
html-to-markdown --heading-style atx --list-indent-width 2 input.html
```
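For scripted use, the CLI can also be driven from another program. The sketch below is a minimal Python wrapper, assuming `html-to-markdown` is installed on `PATH` and writes Markdown to stdout as in the examples above. `build_command` and `convert_file` are hypothetical helper names, and only the two flags shown above are used.

```python
import subprocess


def build_command(path, heading_style=None, list_indent_width=None):
    """Build the argv list for the CLI, using only the flags shown above."""
    cmd = ["html-to-markdown"]
    if heading_style is not None:
        cmd += ["--heading-style", heading_style]
    if list_indent_width is not None:
        cmd += ["--list-indent-width", str(list_indent_width)]
    cmd.append(path)
    return cmd


def convert_file(path, **options):
    """Run the CLI and return its stdout (assumes the binary is on PATH)."""
    result = subprocess.run(
        build_command(path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling `convert_file("input.html", heading_style="atx", list_indent_width=2)` would then run the same command as the last CLI example.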
### Python (v2 API)
```python
from html_to_markdown import convert, convert_with_inline_images, InlineImageConfig
html = "<h1>Hello</h1><p>Rust ❤️ Markdown</p>"
markdown = convert(html)
markdown, inline_images, warnings = convert_with_inline_images(
    '<img src="data:image/png;base64,...==" alt="Pixel">',
    image_config=InlineImageConfig(max_decoded_size_bytes=1024, infer_dimensions=True),
)
```
### Rust
```rust
use html_to_markdown_rs::{convert, ConversionOptions, HeadingStyle};
let html = "<h1>Welcome</h1><p>Fast conversion</p>";
let markdown = convert(html, None)?;
let options = ConversionOptions {
    heading_style: HeadingStyle::Atx,
    ..Default::default()
};
let markdown = convert(html, Some(options))?;
```
See the language-specific READMEs for complete configuration, hOCR workflows, and inline image extraction.
## Performance
Benchmarked on Apple M4 with complex real-world documents (Wikipedia articles, tables, lists):
### Operations per Second (higher is better)
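| Document Type | Node.js (NAPI-RS) | WebAssembly | Python (PyO3) | Speedup (Node.js vs Python) |
| --- | --- | --- | --- | --- |
| **Small (5 paragraphs)** | 86,233 | 70,300 | 8,443 | **10.2×** |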
| **Medium (25 paragraphs)** | 18,979 | 15,282 | 1,846 | **10.3×** |
| **Large (100 paragraphs)** | 4,907 | 3,836 | 438 | **11.2×** |
| **Tables (complex)** | 5,003 | 3,748 | 4,829 | 1.0× |
| **Lists (nested)** | 1,819 | 1,391 | 1,165 | **1.6×** |
| **Wikipedia (129KB)** | 1,125 | 1,022 | - | - |
| **Wikipedia (653KB)** | 156 | 147 | - | - |
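The speedup figures in the last column appear to be the first throughput column (the native Node.js figures, judging by the summary that follows) divided by the third (Python). A quick check of that arithmetic, with numbers copied from the rows above; the `speedup` helper is illustrative only:

```python
# ops/sec copied from the table above: (first numeric column, third numeric column)
rows = {
    "Small (5 paragraphs)": (86_233, 8_443),
    "Medium (25 paragraphs)": (18_979, 1_846),
    "Large (100 paragraphs)": (4_907, 438),
    "Tables (complex)": (5_003, 4_829),
    "Lists (nested)": (1_819, 1_165),
}


def speedup(first_ops, third_ops):
    """Ratio of the two throughput figures, rounded to one decimal place."""
    return round(first_ops / third_ops, 1)


speedups = {name: speedup(a, b) for name, (a, b) in rows.items()}
for name, value in speedups.items():
    print(f"{name}: {value}x")
```

The computed ratios match the published column, including the near-parity result for complex tables.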
### Average Performance Summary
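| Implementation | Average throughput (ops/sec unless noted) | vs WASM | vs Python | Best For |
| --- | --- | --- | --- | --- |
| **Node.js (NAPI-RS)** | **18,162** | 1.17× faster | **7.4× faster** | Maximum throughput in Node.js/Bun |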
| **WebAssembly** | **15,536** | baseline | **6.3× faster** | Universal (Deno, browsers, edge) |
| **Python (PyO3)** | **2,465** | 6.3× slower | baseline | Python ecosystem integration |
| **Rust CLI/Binary** | **150-210 MB/s** | - | - | Standalone processing |
### Key Insights
- **JavaScript bindings are fastest**: Native Node.js bindings achieve ~18k ops/sec average, with WASM close behind at ~16k ops/sec
- **Python is 6-10× slower**: Despite using the same Rust core, PyO3 FFI overhead significantly impacts Python performance
- **Small documents**: Both JS implementations reach 70-90k ops/sec on simple HTML
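- **Document size and type**: On paragraph-heavy inputs the Node.js-to-Python gap widens slightly with size (10.2× for small documents, 11.2× for large ones), while table- and list-heavy inputs show a much smaller gap (1.0–1.6×)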
**Note on Python performance**: The current Python bindings have optimization opportunities. The v2 API with direct `convert()` calls performs best; avoid the v1 compatibility layer for performance-critical applications.
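To get comparable figures for your own documents, a rough ops/sec measurement can be sketched as below. This is not the project's benchmark harness: `ops_per_second` is a hypothetical helper, and the stand-in converter exists only so the snippet runs without the package installed. Swap in `html_to_markdown.convert` to measure the real bindings.

```python
import time


def ops_per_second(convert, html, iterations=1_000):
    """Call convert(html) repeatedly and return the average calls per second."""
    convert(html)  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(iterations):
        convert(html)
    elapsed = time.perf_counter() - start
    return iterations / elapsed


def stand_in(html):
    """Placeholder converter (not the real library): returns the input unchanged."""
    return html


sample = "<p>Hello</p>" * 25
print(f"{ops_per_second(stand_in, sample):,.0f} ops/sec")
```

Results from a loop like this depend heavily on hardware and document shape, so treat them as relative rather than absolute numbers.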
## Compatibility (v1 → v2)
- V2’s Rust core sustains **150–210 MB/s** throughput, while V1 averaged **≈ 2.5 MB/s** in its Python/BeautifulSoup implementation, making V2 roughly 60–80× faster.
- The Python package offers a compatibility shim in `html_to_markdown.v1_compat` (`convert_to_markdown`, `convert_to_markdown_stream`, `markdownify`). Details and keyword mappings live in [Python README](https://github.com/Goldziher/html-to-markdown/blob/main/packages/python/README.md#v1-compatibility).
- CLI flag changes, option renames, and other breaking updates are summarised in [CHANGELOG](https://github.com/Goldziher/html-to-markdown/blob/main/CHANGELOG.md#breaking-changes).
## Community
- Chat with us on [Discord](https://discord.gg/pXxagNK2zN)
- Explore the broader [Kreuzberg](https://kreuzberg.dev) document-processing ecosystem
- Sponsor development via [GitHub Sponsors](https://github.com/sponsors/Goldziher)
### Ruby
```ruby
require 'html_to_markdown'
html = '<h1>Hello</h1><p>Rust ❤️ Markdown</p>'
markdown = HtmlToMarkdown.convert(html, heading_style: :atx, wrap: true)
puts markdown
# # Hello
#
# Rust ❤️ Markdown
```