# html-to-markdown
<div align="center" style="display: flex; flex-wrap: wrap; gap: 8px; justify-content: center; margin: 20px 0;">
<a href="https://github.com/kreuzberg-dev/alef">
<img src="https://img.shields.io/badge/built%20with-alef%20%D7%90-007ec6" alt="Built with alef">
</a>
<a href="https://crates.io/crates/html-to-markdown-rs">
<img src="https://img.shields.io/crates/v/html-to-markdown-rs?label=Rust&color=007ec6" alt="Rust">
</a>
<a href="https://pypi.org/project/html-to-markdown/">
<img src="https://img.shields.io/pypi/v/html-to-markdown?label=Python&color=007ec6" alt="Python">
</a>
<a href="https://www.npmjs.com/package/@kreuzberg/html-to-markdown-node">
<img src="https://img.shields.io/npm/v/@kreuzberg/html-to-markdown-node?label=Node.js&color=007ec6" alt="Node.js">
</a>
<a href="https://www.npmjs.com/package/@kreuzberg/html-to-markdown-wasm">
<img src="https://img.shields.io/npm/v/@kreuzberg/html-to-markdown-wasm?label=WASM&color=007ec6" alt="WASM">
</a>
<a href="https://central.sonatype.com/artifact/dev.kreuzberg/html-to-markdown">
<img src="https://img.shields.io/maven-central/v/dev.kreuzberg/html-to-markdown?label=Java&color=007ec6" alt="Java">
</a>
<a href="https://pkg.go.dev/github.com/kreuzberg-dev/html-to-markdown/packages/go/v3">
<img src="https://img.shields.io/github/v/tag/kreuzberg-dev/html-to-markdown?label=Go&color=007ec6&filter=v3*" alt="Go">
</a>
<a href="https://www.nuget.org/packages/KreuzbergDev.HtmlToMarkdown/">
<img src="https://img.shields.io/nuget/v/KreuzbergDev.HtmlToMarkdown?label=C%23&color=007ec6" alt="C#">
</a>
<a href="https://packagist.org/packages/kreuzberg-dev/html-to-markdown">
<img src="https://img.shields.io/packagist/v/kreuzberg-dev/html-to-markdown?label=PHP&color=007ec6" alt="PHP">
</a>
<a href="https://rubygems.org/gems/html-to-markdown">
<img src="https://img.shields.io/gem/v/html-to-markdown?label=Ruby&color=007ec6" alt="Ruby">
</a>
<a href="https://hex.pm/packages/html_to_markdown">
<img src="https://img.shields.io/hexpm/v/html_to_markdown?label=Elixir&color=007ec6" alt="Elixir">
</a>
<a href="https://kreuzberg-dev.r-universe.dev/htmltomarkdown">
<img src="https://img.shields.io/badge/R-htmltomarkdown-007ec6" alt="R">
</a>
<a href="https://pub.dev/packages/h2m">
<img src="https://img.shields.io/pub/v/h2m?label=Dart&color=007ec6" alt="Dart">
</a>
<a href="https://central.sonatype.com/artifact/dev.kreuzberg/html-to-markdown-android">
<img src="https://img.shields.io/maven-central/v/dev.kreuzberg/html-to-markdown-android?label=Kotlin&color=007ec6" alt="Kotlin">
</a>
<a href="https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/swift">
<img src="https://img.shields.io/badge/Swift-SPM-007ec6" alt="Swift">
</a>
<a href="https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/zig">
<img src="https://img.shields.io/badge/Zig-package-007ec6" alt="Zig">
</a>
<a href="https://github.com/kreuzberg-dev/html-to-markdown/releases">
<img src="https://img.shields.io/badge/C-FFI-007ec6" alt="C FFI">
</a>
<a href="https://github.com/kreuzberg-dev/html-to-markdown/blob/main/LICENSE">
<img src="https://img.shields.io/badge/License-MIT-007ec6" alt="License">
</a>
<a href="https://docs.html-to-markdown.kreuzberg.dev">
<img src="https://img.shields.io/badge/Docs-html--to--markdown-007ec6" alt="Documentation">
</a>
</div>
<div align="center" style="display: flex; flex-wrap: wrap; gap: 12px; justify-content: center; margin: 28px 0 24px;">
<a href="https://discord.gg/xt9WY3GnKR">
<img height="22" src="https://img.shields.io/badge/Discord-Chat-007ec6?logo=discord&logoColor=white" alt="Join Discord">
</a>
<a href="https://docs.html-to-markdown.kreuzberg.dev/demo/">
<img height="22" src="https://img.shields.io/badge/Live%20Demo-Open-007ec6?logo=webassembly&logoColor=white" alt="Live Demo">
</a>
</div>
Fast, robust HTML → Markdown for 16 languages. A tiered converter that picks the safest, fastest path per input without losing content.
## What and Why?
html-to-markdown converts real-world HTML — unclosed tags, CDATA, custom elements, malformed entities, nested tables, mixed encodings — into clean CommonMark (or Djot) without losing content, from one Rust core with native bindings for 16 languages.
It routes each input through three tiers: a single-pass byte scanner for clean HTML, a tolerant DOM walker for complex inputs, and an `html5ever` repair pass for malformed HTML — with byte-identical output across tiers, enforced by a 116-snapshot oracle and per-group performance gates in CI. The dispatcher is invisible: the same `convert()` call works regardless of which tier runs.
### Features
| **16 languages, one Rust core** | Rust, Python, Node.js, WASM, Java, Go, C#, PHP, Ruby, Elixir, R, Dart, Kotlin (Android), Swift, Zig, and a C ABI |
| **Tiered dispatch** | Byte scanner → DOM walker → `html5ever` repair, with byte-equal output across tiers |
| **Real-HTML robust** | Unclosed tags, CDATA, custom elements, malformed entities, nested tables, mixed encodings — handled without losing content |
| **GFM tables** | Padded cells, alignment, and pipe escaping |
| **Djot output** | Set `output_format = "djot"` to emit Djot instead of Markdown |
| **Metadata extraction** | Parse `<head>` into structured metadata (Open Graph, Twitter, JSON-LD, microdata, RDFa, header hierarchy) |
| **Inline images** | Opt-in mirroring of data URIs and remote image references |
| **Visitor API** | Feature-gated traversal to transform the converted Markdown AST |
| **Configurable preprocessing** | Standard, strict, and lenient presets — or build your own |
| **Fast** | 19–116 MB/s on the Wikipedia/mdream corpus; per-group regression thresholds enforced on every PR |
<div align="center">
<a href="https://github.com/kreuzberg-dev/html-to-markdown/stargazers">
<img src="docs/assets/star.gif" alt="Star html-to-markdown on GitHub" width="640">
</a>
</div>
<p align="center"><strong>⭐ Star this repo to show your support — it helps others discover html-to-markdown.</strong></p>
## Quick Start
`convert()` is the single entry point — it returns a structured result with `content`, `warnings`, and optional `metadata`.
### Language Packages
<details open>
<summary><strong>Rust</strong></summary>
```sh
cargo add html-to-markdown-rs
```
See [Rust README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/crates/html-to-markdown) for full documentation.
</details>
<details>
<summary><strong>Python</strong></summary>
```sh
pip install html-to-markdown
```
See [Python README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/python) for full documentation.
</details>
<details>
<summary><strong>Node.js</strong></summary>
```sh
npm install @kreuzberg/html-to-markdown
```
See [Node.js README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/crates/html-to-markdown-node) for full documentation.
</details>
<details>
<summary><strong>Go</strong></summary>
```sh
go get github.com/kreuzberg-dev/html-to-markdown/packages/go/v3
```
See [Go README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/go) for full documentation.
</details>
<details>
<summary><strong>Java</strong></summary>
Available on Maven Central as `dev.kreuzberg:html-to-markdown`. See [Java README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/java) for the dependency snippet and current version.
</details>
<details>
<summary><strong>C#</strong></summary>
```sh
dotnet add package KreuzbergDev.HtmlToMarkdown
```
See [C# README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/csharp) for full documentation.
</details>
<details>
<summary><strong>Ruby</strong></summary>
```sh
gem install html-to-markdown
```
See [Ruby README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/ruby) for full documentation.
</details>
<details>
<summary><strong>PHP</strong></summary>
```sh
composer require kreuzberg-dev/html-to-markdown
```
See [PHP README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/php) for full documentation.
</details>
<details>
<summary><strong>Elixir</strong></summary>
Add `{:html_to_markdown, "~> 3.6"}` to your `mix.exs` dependencies. See [Elixir README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/elixir) for full documentation.
</details>
<details>
<summary><strong>R</strong></summary>
```r
install.packages("htmltomarkdown", repos = "https://kreuzberg-dev.r-universe.dev")
```
See [R README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/r) for full documentation.
</details>
<details>
<summary><strong>Dart / Flutter</strong></summary>
```sh
dart pub add h2m
```
See [Dart README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/dart) for full documentation.
</details>
<details>
<summary><strong>Kotlin (Android)</strong></summary>
Available on Maven Central as `dev.kreuzberg:html-to-markdown-android`. See [Kotlin README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/kotlin-android) for the dependency snippet and current version.
</details>
<details>
<summary><strong>Swift</strong></summary>
Add via Swift Package Manager. See [Swift README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/swift) for full documentation.
</details>
<details>
<summary><strong>Zig</strong></summary>
See [Zig README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/zig) for installation and usage.
</details>
<details>
<summary><strong>WebAssembly</strong></summary>
```sh
npm install @kreuzberg/html-to-markdown-wasm
```
See [WebAssembly README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/crates/html-to-markdown-wasm) for full documentation.
</details>
<details>
<summary><strong>C/C++ (FFI)</strong></summary>
Pre-built `.so` / `.dll` / `.dylib` from [GitHub Releases](https://github.com/kreuzberg-dev/html-to-markdown/releases). See [FFI crate](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/crates/html-to-markdown-ffi) for full documentation.
</details>
<details>
<summary><strong>CLI</strong></summary>
```sh
cargo install html-to-markdown-cli
```
```sh
brew install kreuzberg-dev/tap/html-to-markdown
```
See [CLI usage](https://docs.html-to-markdown.kreuzberg.dev) for full documentation.
</details>
### AI Coding Assistants
Install the html-to-markdown plugin from the [`kreuzberg-dev/plugins`](https://github.com/kreuzberg-dev/plugins) marketplace. It ships the html-to-markdown agent skills and works with every major coding agent — expand your harness below.
<details open>
<summary><strong>Claude Code</strong></summary>
```text
/plugin marketplace add kreuzberg-dev/plugins
/plugin install html-to-markdown@kreuzberg
```
</details>
<details>
<summary><strong>Codex CLI</strong></summary>
```text
/plugins add https://github.com/kreuzberg-dev/plugins
```
Then search for `html-to-markdown` and select **Install Plugin**.
</details>
<details>
<summary><strong>Cursor</strong></summary>
Settings → Plugins → Add from URL → `https://github.com/kreuzberg-dev/plugins`, then select **html-to-markdown**.
</details>
<details>
<summary><strong>Gemini CLI</strong></summary>
```text
gemini extensions install https://github.com/kreuzberg-dev/plugins
```
</details>
<details>
<summary><strong>Factory Droid</strong></summary>
```text
droid plugin marketplace add https://github.com/kreuzberg-dev/plugins
droid plugin install html-to-markdown@kreuzberg
```
</details>
<details>
<summary><strong>GitHub Copilot CLI</strong></summary>
```text
copilot plugin marketplace add https://github.com/kreuzberg-dev/plugins
copilot plugin install html-to-markdown@kreuzberg
```
</details>
<details>
<summary><strong>opencode</strong></summary>
Add the package to `opencode.json`:
```json
{
"$schema": "https://opencode.ai/config.json",
"plugin": ["@kreuzberg/opencode-html-to-markdown"]
}
```
</details>
## Documentation
Full guides, the `convert()` API for every binding, tier architecture, the metadata and visitor APIs, and performance benchmarks live at **[docs.html-to-markdown.kreuzberg.dev](https://docs.html-to-markdown.kreuzberg.dev)**.
## Part of Kreuzberg.dev
- [Kreuzberg](https://github.com/kreuzberg-dev/kreuzberg) — document intelligence: text, tables, metadata from 91+ formats with optional OCR.
- [Kreuzberg Cloud](https://github.com/kreuzberg-dev/kreuzberg-cloud) — managed extraction API with SDKs, dashboards, and observability.
- [kreuzcrawl](https://github.com/kreuzberg-dev/kreuzcrawl) — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
- [html-to-markdown](https://github.com/kreuzberg-dev/html-to-markdown) — fast, lossless HTML→Markdown engine.
- [liter-llm](https://github.com/kreuzberg-dev/liter-llm) — universal LLM API client with native bindings for 14 languages and 143 providers.
- [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — tree-sitter grammars and code-intelligence primitives.
- [alef](https://github.com/kreuzberg-dev/alef) — the polyglot binding generator that produces every per-language binding across the 5 polyglot repos.
## Contributing
Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions and guidelines.
## License
MIT License — see [LICENSE](LICENSE) for details.