html-to-markdown-cli 3.6.21

Command-line interface for html-to-markdown - high-performance HTML to Markdown converter
# html-to-markdown

<div align="center" style="display: flex; flex-wrap: wrap; gap: 8px; justify-content: center; margin: 20px 0;">
  <a href="https://github.com/kreuzberg-dev/alef">
    <img src="https://img.shields.io/badge/built%20with-alef%20%D7%90-007ec6" alt="Built with alef">
  </a>
  <!-- Language Bindings -->
  <a href="https://crates.io/crates/html-to-markdown-rs">
    <img src="https://img.shields.io/crates/v/html-to-markdown-rs?label=Rust&color=007ec6" alt="Rust">
  </a>
  <a href="https://pypi.org/project/html-to-markdown/">
    <img src="https://img.shields.io/pypi/v/html-to-markdown?label=Python&color=007ec6" alt="Python">
  </a>
  <a href="https://www.npmjs.com/package/@kreuzberg/html-to-markdown-node">
    <img src="https://img.shields.io/npm/v/@kreuzberg/html-to-markdown-node?label=Node.js&color=007ec6" alt="Node.js">
  </a>
  <a href="https://www.npmjs.com/package/@kreuzberg/html-to-markdown-wasm">
    <img src="https://img.shields.io/npm/v/@kreuzberg/html-to-markdown-wasm?label=WASM&color=007ec6" alt="WASM">
  </a>
  <a href="https://central.sonatype.com/artifact/dev.kreuzberg/html-to-markdown">
    <img src="https://img.shields.io/maven-central/v/dev.kreuzberg/html-to-markdown?label=Java&color=007ec6" alt="Java">
  </a>
  <a href="https://pkg.go.dev/github.com/kreuzberg-dev/html-to-markdown/packages/go/v3">
    <img src="https://img.shields.io/github/v/tag/kreuzberg-dev/html-to-markdown?label=Go&color=007ec6&filter=v3*" alt="Go">
  </a>
  <a href="https://www.nuget.org/packages/KreuzbergDev.HtmlToMarkdown/">
    <img src="https://img.shields.io/nuget/v/KreuzbergDev.HtmlToMarkdown?label=C%23&color=007ec6" alt="C#">
  </a>
  <a href="https://packagist.org/packages/kreuzberg-dev/html-to-markdown">
    <img src="https://img.shields.io/packagist/v/kreuzberg-dev/html-to-markdown?label=PHP&color=007ec6" alt="PHP">
  </a>
  <a href="https://rubygems.org/gems/html-to-markdown">
    <img src="https://img.shields.io/gem/v/html-to-markdown?label=Ruby&color=007ec6" alt="Ruby">
  </a>
  <a href="https://hex.pm/packages/html_to_markdown">
    <img src="https://img.shields.io/hexpm/v/html_to_markdown?label=Elixir&color=007ec6" alt="Elixir">
  </a>
  <a href="https://kreuzberg-dev.r-universe.dev/htmltomarkdown">
    <img src="https://img.shields.io/badge/R-htmltomarkdown-007ec6" alt="R">
  </a>
  <a href="https://pub.dev/packages/h2m">
    <img src="https://img.shields.io/pub/v/h2m?label=Dart&color=007ec6" alt="Dart">
  </a>
  <a href="https://central.sonatype.com/artifact/dev.kreuzberg/html-to-markdown-android">
    <img src="https://img.shields.io/maven-central/v/dev.kreuzberg/html-to-markdown-android?label=Kotlin&color=007ec6" alt="Kotlin">
  </a>
  <a href="https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/swift">
    <img src="https://img.shields.io/badge/Swift-SPM-007ec6" alt="Swift">
  </a>
  <a href="https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/zig">
    <img src="https://img.shields.io/badge/Zig-package-007ec6" alt="Zig">
  </a>
  <a href="https://github.com/kreuzberg-dev/html-to-markdown/releases">
    <img src="https://img.shields.io/badge/C-FFI-007ec6" alt="C FFI">
  </a>

  <!-- Project Info -->
  <a href="https://github.com/kreuzberg-dev/html-to-markdown/blob/main/LICENSE">
    <img src="https://img.shields.io/badge/License-MIT-007ec6" alt="License">
  </a>
  <a href="https://docs.html-to-markdown.kreuzberg.dev">
    <img src="https://img.shields.io/badge/Docs-html--to--markdown-007ec6" alt="Documentation">
  </a>
</div>

<div align="center" style="display: flex; flex-wrap: wrap; gap: 12px; justify-content: center; margin: 28px 0 24px;">
  <a href="https://discord.gg/xt9WY3GnKR">
    <img height="22" src="https://img.shields.io/badge/Discord-Chat-007ec6?logo=discord&logoColor=white" alt="Join Discord">
  </a>
  <a href="https://docs.html-to-markdown.kreuzberg.dev/demo/">
    <img height="22" src="https://img.shields.io/badge/Live%20Demo-Open-007ec6?logo=webassembly&logoColor=white" alt="Live Demo">
  </a>
</div>

Fast, robust HTML → Markdown for 16 languages. A tiered converter that picks the safest, fastest path per input without losing content.

## What and Why?

html-to-markdown converts real-world HTML — unclosed tags, CDATA, custom elements, malformed entities, nested tables, mixed encodings — into clean CommonMark (or Djot) without losing content, from one Rust core with native bindings for 16 languages.

It routes each input through three tiers: a single-pass byte scanner for clean HTML, a tolerant DOM walker for complex inputs, and an `html5ever` repair pass for malformed HTML — with byte-identical output across tiers, enforced by a 116-snapshot oracle and per-group performance gates in CI. The dispatcher is invisible: the same `convert()` call works regardless of which tier runs.
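To make the routing idea concrete, here is a minimal sketch of a three-way dispatcher in Python. The criteria the real dispatcher uses are not described in this README, so the heuristic below (tag balance, nesting depth, presence of tables) and the names `pick_tier`, `_TagBalance`, and `VOID_TAGS` are illustrative assumptions only.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _TagBalance(HTMLParser):
    """Tracks open tags to spot unclosed or mismatched markup."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open non-void tags
        self.mismatch = False  # set when a closing tag does not match
        self.max_depth = 0     # deepest nesting seen so far

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)
            self.max_depth = max(self.max_depth, len(self.stack))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.mismatch = True


def pick_tier(html: str) -> str:
    """Return a hypothetical tier name: 'scanner', 'dom', or 'repair'."""
    checker = _TagBalance()
    checker.feed(html)
    checker.close()
    if checker.mismatch or checker.stack:
        return "repair"   # malformed: unclosed or mismatched tags
    if checker.max_depth > 8 or "<table" in html.lower():
        return "dom"      # well-formed but structurally complex
    return "scanner"      # clean, shallow markup


print(pick_tier("<p>Hello <b>world</b></p>"))
print(pick_tier("<table><tr><td>x</td></tr></table>"))
print(pick_tier("<p>unclosed <b>bold</p>"))
```

In this sketch the caller only ever sees the tier name for inspection; a real dispatcher would hide that choice behind a single `convert()` call, as described above.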

### Features

| Feature | Description |
| ------- | ----------- |
| **16 languages, one Rust core** | Rust, Python, Node.js, WASM, Java, Go, C#, PHP, Ruby, Elixir, R, Dart, Kotlin (Android), Swift, Zig, and a C ABI |
| **Tiered dispatch** | Byte scanner → DOM walker → `html5ever` repair, with byte-equal output across tiers |
| **Real-HTML robust** | Unclosed tags, CDATA, custom elements, malformed entities, nested tables, mixed encodings — handled without losing content |
| **GFM tables** | Padded cells, alignment, and pipe escaping |
| **Djot output** | Set `output_format = "djot"` to emit Djot instead of Markdown |
| **Metadata extraction** | Parse `<head>` into structured metadata (Open Graph, Twitter, JSON-LD, microdata, RDFa, header hierarchy) |
| **Inline images** | Opt-in mirroring of data URIs and remote image references |
| **Visitor API** | Feature-gated traversal to transform the converted Markdown AST |
| **Configurable preprocessing** | Standard, strict, and lenient presets — or build your own |
| **Fast** | 19–116 MB/s on the Wikipedia/mdream corpus; per-group regression thresholds enforced on every PR |
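
As an illustration of what the GFM tables row means by padded cells, alignment, and pipe escaping, the following Python sketch renders a small table by hand. It is not the library's implementation; `render_gfm_table` and its parameters are hypothetical names used only for this example.

```python
def render_gfm_table(header, rows, align):
    """Render a GFM table with padded cells, alignment markers, and escaped pipes.

    `align` holds one of "left", "right", "center", or None per column.
    """
    def esc(cell):
        # A literal pipe inside a cell must be escaped, or it splits the cell.
        return str(cell).replace("|", "\\|")

    table = [[esc(c) for c in header]] + [[esc(c) for c in row] for row in rows]
    # Pad every column to its widest cell; 3 is the minimum for a delimiter cell.
    widths = [max(3, *(len(row[i]) for row in table)) for i in range(len(header))]

    def line(cells):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    def rule(kind, width):
        if kind == "left":
            return ":" + "-" * (width - 1)
        if kind == "right":
            return "-" * (width - 1) + ":"
        if kind == "center":
            return ":" + "-" * (width - 2) + ":"
        return "-" * width

    delimiter = "| " + " | ".join(rule(a, w) for a, w in zip(align, widths)) + " |"
    return "\n".join([line(table[0]), delimiter] + [line(r) for r in table[1:]])


print(render_gfm_table(["Name", "Cmd"], [["or", "a|b"]], ["left", "right"]))
```

The escaped pipe counts toward the cell width here, which keeps the raw Markdown columns visually aligned.
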

<div align="center">
  <a href="https://github.com/kreuzberg-dev/html-to-markdown/stargazers">
    <img src="docs/assets/star.gif" alt="Star html-to-markdown on GitHub" width="640">
  </a>
</div>

<p align="center"><strong>⭐ Star this repo to show your support — it helps others discover html-to-markdown.</strong></p>

## Quick Start

`convert()` is the single entry point — it returns a structured result with `content`, `warnings`, and optional `metadata`.
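
A rough picture of how a caller might consume such a result, sketched in Python. The `ConversionResult` class below is a hypothetical stand-in built only from the three field names mentioned above; the actual types, field types, and the sample warning text are assumptions, so check the per-language README for the real API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConversionResult:
    """Hypothetical stand-in for the structured result, not the library's type."""
    content: str                                   # converted Markdown (or Djot)
    warnings: list = field(default_factory=list)   # non-fatal issues, if any
    metadata: Optional[dict] = None                # present only when requested


def describe(result: ConversionResult) -> str:
    """Summarize a result: title (when metadata exists), size, warning count."""
    title = (result.metadata or {}).get("title", "untitled")
    return f"{title}: {len(result.content)} chars, {len(result.warnings)} warning(s)"


# Made-up sample values, for illustration only.
example = ConversionResult(
    content="# Hello\n\nWorld",
    warnings=["unclosed <b> tag repaired"],
    metadata={"title": "Hello"},
)
print(describe(example))
```

Treating `metadata` as optional keeps the common path (content only) simple while still letting callers log warnings instead of failing on them.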

### Language Packages

<details open>
<summary><strong>Rust</strong></summary>

```sh
cargo add html-to-markdown-rs
```

See [Rust README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/crates/html-to-markdown) for full documentation.

</details>

<details>
<summary><strong>Python</strong></summary>

```sh
pip install html-to-markdown
```

See [Python README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/python) for full documentation.

</details>

<details>
<summary><strong>Node.js</strong></summary>

```sh
npm install @kreuzberg/html-to-markdown
```

See [Node.js README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/crates/html-to-markdown-node) for full documentation.

</details>

<details>
<summary><strong>Go</strong></summary>

```sh
go get github.com/kreuzberg-dev/html-to-markdown/packages/go/v3
```

See [Go README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/go) for full documentation.

</details>

<details>
<summary><strong>Java</strong></summary>

Available on Maven Central as `dev.kreuzberg:html-to-markdown`. See [Java README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/java) for the dependency snippet and current version.

</details>

<details>
<summary><strong>C#</strong></summary>

```sh
dotnet add package KreuzbergDev.HtmlToMarkdown
```

See [C# README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/csharp) for full documentation.

</details>

<details>
<summary><strong>Ruby</strong></summary>

```sh
gem install html-to-markdown
```

See [Ruby README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/ruby) for full documentation.

</details>

<details>
<summary><strong>PHP</strong></summary>

```sh
composer require kreuzberg-dev/html-to-markdown
```

See [PHP README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/php) for full documentation.

</details>

<details>
<summary><strong>Elixir</strong></summary>

Add `{:html_to_markdown, "~> 3.6"}` to your `mix.exs` dependencies. See [Elixir README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/elixir) for full documentation.

</details>

<details>
<summary><strong>R</strong></summary>

```r
install.packages("htmltomarkdown", repos = "https://kreuzberg-dev.r-universe.dev")
```

See [R README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/r) for full documentation.

</details>

<details>
<summary><strong>Dart / Flutter</strong></summary>

```sh
dart pub add h2m
```

See [Dart README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/dart) for full documentation.

</details>

<details>
<summary><strong>Kotlin (Android)</strong></summary>

Available on Maven Central as `dev.kreuzberg:html-to-markdown-android`. See [Kotlin README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/kotlin-android) for the dependency snippet and current version.

</details>

<details>
<summary><strong>Swift</strong></summary>

Add via Swift Package Manager. See [Swift README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/swift) for full documentation.

</details>

<details>
<summary><strong>Zig</strong></summary>

See [Zig README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/packages/zig) for installation and usage.

</details>

<details>
<summary><strong>WebAssembly</strong></summary>

```sh
npm install @kreuzberg/html-to-markdown-wasm
```

See [WebAssembly README](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/crates/html-to-markdown-wasm) for full documentation.

</details>

<details>
<summary><strong>C/C++ (FFI)</strong></summary>

Pre-built `.so` / `.dll` / `.dylib` from [GitHub Releases](https://github.com/kreuzberg-dev/html-to-markdown/releases). See [FFI crate](https://github.com/kreuzberg-dev/html-to-markdown/tree/main/crates/html-to-markdown-ffi) for full documentation.

</details>

<details>
<summary><strong>CLI</strong></summary>

```sh
cargo install html-to-markdown-cli
```

```sh
brew install kreuzberg-dev/tap/html-to-markdown
```

See [CLI usage](https://docs.html-to-markdown.kreuzberg.dev) for full documentation.

</details>

### AI Coding Assistants

Install the html-to-markdown plugin from the [`kreuzberg-dev/plugins`](https://github.com/kreuzberg-dev/plugins) marketplace. It ships the html-to-markdown agent skills and supports the coding agents listed below; expand the entry for your tool.

<details open>
<summary><strong>Claude Code</strong></summary>

```text
/plugin marketplace add kreuzberg-dev/plugins
/plugin install html-to-markdown@kreuzberg
```

</details>

<details>
<summary><strong>Codex CLI</strong></summary>

```text
/plugins add https://github.com/kreuzberg-dev/plugins
```

Then search for `html-to-markdown` and select **Install Plugin**.

</details>

<details>
<summary><strong>Cursor</strong></summary>

Settings → Plugins → Add from URL → `https://github.com/kreuzberg-dev/plugins`, then select **html-to-markdown**.

</details>

<details>
<summary><strong>Gemini CLI</strong></summary>

```text
gemini extensions install https://github.com/kreuzberg-dev/plugins
```

</details>

<details>
<summary><strong>Factory Droid</strong></summary>

```text
droid plugin marketplace add https://github.com/kreuzberg-dev/plugins
droid plugin install html-to-markdown@kreuzberg
```

</details>

<details>
<summary><strong>GitHub Copilot CLI</strong></summary>

```text
copilot plugin marketplace add https://github.com/kreuzberg-dev/plugins
copilot plugin install html-to-markdown@kreuzberg
```

</details>

<details>
<summary><strong>opencode</strong></summary>

Add the package to `opencode.json`:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "plugin": ["@kreuzberg/opencode-html-to-markdown"]
}
```

</details>

## Documentation

Full guides, the `convert()` API for every binding, tier architecture, the metadata and visitor APIs, and performance benchmarks live at **[docs.html-to-markdown.kreuzberg.dev](https://docs.html-to-markdown.kreuzberg.dev)**.

## Part of Kreuzberg.dev

- [Kreuzberg](https://github.com/kreuzberg-dev/kreuzberg) — document intelligence: text, tables, metadata from 91+ formats with optional OCR.
- [Kreuzberg Cloud](https://github.com/kreuzberg-dev/kreuzberg-cloud) — managed extraction API with SDKs, dashboards, and observability.
- [kreuzcrawl](https://github.com/kreuzberg-dev/kreuzcrawl) — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
- [html-to-markdown](https://github.com/kreuzberg-dev/html-to-markdown) — fast, lossless HTML→Markdown engine.
- [liter-llm](https://github.com/kreuzberg-dev/liter-llm) — universal LLM API client with native bindings for 14 languages and 143 providers.
- [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — tree-sitter grammars and code-intelligence primitives.
- [alef](https://github.com/kreuzberg-dev/alef) — the polyglot binding generator that produces every per-language binding across the 5 polyglot repos.

## Contributing

Contributions welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions and guidelines.

## License

MIT License — see [LICENSE](LICENSE) for details.