html-to-markdown-cli 3.7.0

Command-line interface for html-to-markdown - high-performance HTML to Markdown converter

html-to-markdown

Fast, robust HTML → Markdown for 16 languages. A tiered converter that picks the safest, fastest path per input without losing content.

What and Why?

html-to-markdown converts real-world HTML — unclosed tags, CDATA, custom elements, malformed entities, nested tables, mixed encodings — into clean CommonMark (or Djot) without losing content, from one Rust core with native bindings for 16 languages.

It routes each input through three tiers: a single-pass byte scanner for clean HTML, a tolerant DOM walker for complex inputs, and an html5ever repair pass for malformed HTML. Output is byte-identical across tiers; CI enforces this with a 116-snapshot oracle and guards speed with per-group performance gates. The dispatcher is invisible: the same convert() call works regardless of which tier runs.
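To make the idea of tier selection concrete, here is a minimal, self-contained Python sketch. It is not the library's dispatcher: the real selection criteria are not described in this README, and the names Tier, looks_malformed, looks_complex, and pick_tier, along with both heuristics, are invented purely for illustration.

```python
from enum import Enum


class Tier(Enum):
    """The three tiers named above (labels only; no conversion happens here)."""
    BYTE_SCANNER = "byte scanner"
    DOM_WALKER = "DOM walker"
    HTML5EVER_REPAIR = "html5ever repair"


def looks_malformed(html: str) -> bool:
    # Hypothetical heuristic: unbalanced angle brackets suggest broken markup.
    return html.count("<") != html.count(">")


def looks_complex(html: str) -> bool:
    # Hypothetical heuristic: constructs a single-pass scan might not handle.
    lowered = html.lower()
    return any(marker in lowered for marker in ("<table", "<![cdata[", "<script"))


def pick_tier(html: str) -> Tier:
    """Pick the cheapest tier that the toy heuristics consider safe."""
    if looks_malformed(html):
        return Tier.HTML5EVER_REPAIR
    if looks_complex(html):
        return Tier.DOM_WALKER
    return Tier.BYTE_SCANNER
```

With these toy rules, pick_tier("<p>Hello</p>") selects the byte scanner, while an input ending in an unclosed "<b" is sent to the repair tier. The caller never sees the choice, which mirrors the "invisible dispatcher" described above.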

Features

| Feature | Description |
| --- | --- |
| 16 languages, one Rust core | Rust, Python, Node.js, WASM, Java, Go, C#, PHP, Ruby, Elixir, R, Dart, Kotlin (Android), Swift, Zig, and a C ABI |
| Tiered dispatch | Byte scanner → DOM walker → html5ever repair, with byte-equal output across tiers |
| Real-HTML robust | Unclosed tags, CDATA, custom elements, malformed entities, nested tables, mixed encodings, all handled without losing content |
| GFM tables | Padded cells, alignment, and pipe escaping |
| Djot output | Set output_format = "djot" to emit Djot instead of Markdown |
| Metadata extraction | Parse <head> into structured metadata (Open Graph, Twitter, JSON-LD, microdata, RDFa, header hierarchy) |
| Inline images | Opt-in mirroring of data URIs and remote image references |
| Visitor API | Feature-gated traversal to transform the converted Markdown AST |
| Configurable preprocessing | Standard, strict, and lenient presets, or build your own |
| Fast | 19–116 MB/s on the Wikipedia/mdream corpus; per-group regression thresholds enforced on every PR |
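As an illustration of what "padded cells, alignment, and pipe escaping" mean for GFM tables, the following self-contained Python sketch renders a small table by hand. It is an independent toy, not the converter's code; the function names (escape_cell, pad, delimiter, render_gfm_table) and the minimum column width of 3 are assumptions made for this example.

```python
def escape_cell(text: str) -> str:
    # A literal pipe inside a cell must be escaped so it is not read as a column break.
    return text.replace("|", "\\|")


def pad(text: str, width: int, how: str) -> str:
    # Pad the cell to the column width according to its alignment.
    if how == "right":
        return text.rjust(width)
    if how == "center":
        return text.center(width)
    return text.ljust(width)


def delimiter(width: int, how: str) -> str:
    # GFM marks alignment with colons in the delimiter row.
    if how == "right":
        return "-" * (width - 1) + ":"
    if how == "center":
        return ":" + "-" * (width - 2) + ":"
    return ":" + "-" * (width - 1)


def render_gfm_table(header, rows, align):
    """Render header and rows as a padded GFM table; align holds 'left', 'center', or 'right' per column."""
    table = [[escape_cell(cell) for cell in row] for row in [header, *rows]]
    # Each column is as wide as its widest (escaped) cell, with a floor of 3.
    widths = [max(3, *(len(row[i]) for row in table)) for i in range(len(header))]

    def line(cells):
        return "| " + " | ".join(cells) + " |"

    out = [line([pad(c, w, a) for c, w, a in zip(table[0], widths, align)])]
    out.append(line([delimiter(w, a) for w, a in zip(widths, align)]))
    for row in table[1:]:
        out.append(line([pad(c, w, a) for c, w, a in zip(row, widths, align)]))
    return "\n".join(out)
```

Calling render_gfm_table(["Name", "Size"], [["a|b", "10"]], ["left", "right"]) yields a three-line table in which the cell a|b is written with an escaped pipe and the Size column is right-aligned.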

Quick Start

convert() is the single entry point — it returns a structured result with content, warnings, and optional metadata.
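The shape of that result can be pictured with a small Python model. This is a hypothetical stand-in, not the library's actual type: the class name ResultSketch, the field types, and the summarize helper are assumptions for illustration, and only the three field names (content, warnings, metadata) come from the description above. Consult the per-language README for the real API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResultSketch:
    """Hypothetical model of a conversion result; not the library's type."""
    content: str                                        # the converted Markdown
    warnings: List[str] = field(default_factory=list)   # non-fatal issues, if any
    metadata: Optional[dict] = None                     # present only when requested


def summarize(result: ResultSketch) -> str:
    """Build a short report: content first, then warnings, then metadata keys."""
    lines = [result.content.rstrip()]
    for warning in result.warnings:
        lines.append(f"warning: {warning}")
    if result.metadata is not None:
        lines.append(f"metadata keys: {sorted(result.metadata)}")
    return "\n".join(lines)
```

The point of the sketch is the calling pattern: always read the content, surface the warnings rather than discarding them, and treat metadata as optional.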

Language Packages

cargo add html-to-markdown-rs

See Rust README for full documentation.

pip install html-to-markdown

See Python README for full documentation.

npm install @kreuzberg/html-to-markdown

See Node.js README for full documentation.

go get github.com/kreuzberg-dev/html-to-markdown/packages/go/v3

See Go README for full documentation.

Available on Maven Central as dev.kreuzberg:html-to-markdown. See Java README for the dependency snippet and current version.

dotnet add package KreuzbergDev.HtmlToMarkdown

See C# README for full documentation.

gem install html-to-markdown

See Ruby README for full documentation.

composer require kreuzberg-dev/html-to-markdown

See PHP README for full documentation.

Add {:html_to_markdown, "~> 3.6"} to your mix.exs dependencies. See Elixir README for full documentation.

install.packages("htmltomarkdown", repos = "https://kreuzberg-dev.r-universe.dev")

See R README for full documentation.

dart pub add h2m

See Dart README for full documentation.

Available on Maven Central as dev.kreuzberg:html-to-markdown-android. See Kotlin README for the dependency snippet and current version.

Add via Swift Package Manager. See Swift README for full documentation.

See Zig README for installation and usage.

npm install @kreuzberg/html-to-markdown-wasm

See WebAssembly README for full documentation.

Pre-built .so / .dll / .dylib from GitHub Releases. See FFI crate for full documentation.

cargo install html-to-markdown-cli
brew install kreuzberg-dev/tap/html-to-markdown

See CLI usage for full documentation.

AI Coding Assistants

Install the html-to-markdown plugin from the kreuzberg-dev/plugins marketplace. It ships the html-to-markdown agent skills and works with every major coding agent. Use the commands below that match your harness.

/plugin marketplace add kreuzberg-dev/plugins
/plugin install html-to-markdown@kreuzberg
/plugins add https://github.com/kreuzberg-dev/plugins

Then search for html-to-markdown and select Install Plugin.

Settings → Plugins → Add from URL → https://github.com/kreuzberg-dev/plugins, then select html-to-markdown.

gemini extensions install https://github.com/kreuzberg-dev/plugins
droid plugin marketplace add https://github.com/kreuzberg-dev/plugins
droid plugin install html-to-markdown@kreuzberg
copilot plugin marketplace add https://github.com/kreuzberg-dev/plugins
copilot plugin install html-to-markdown@kreuzberg

Add the package to opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "plugin": ["@kreuzberg/opencode-html-to-markdown"]
}

Documentation

Full guides, the convert() API for every binding, tier architecture, the metadata and visitor APIs, and performance benchmarks live at docs.html-to-markdown.kreuzberg.dev.

Part of Kreuzberg.dev

  • Kreuzberg — document intelligence: text, tables, metadata from 91+ formats with optional OCR.
  • Kreuzberg Cloud — managed extraction API with SDKs, dashboards, and observability.
  • kreuzcrawl — web crawling and scraping with HTML→Markdown and headless-Chrome fallback.
  • html-to-markdown — fast, lossless HTML→Markdown engine.
  • liter-llm — universal LLM API client with native bindings for 14 languages and 143 providers.
  • tree-sitter-language-pack — tree-sitter grammars and code-intelligence primitives.
  • alef — the polyglot binding generator that produces every per-language binding across the 5 polyglot repos.

Contributing

Contributions welcome! See CONTRIBUTING.md for setup instructions and guidelines.

License

MIT License — see LICENSE for details.