html-to-markdown-cli 2.18.0

Command-line interface for html-to-markdown - high-performance HTML to Markdown converter
html-to-markdown-cli-2.18.0 is not a library.

html-to-markdown

High-performance HTML → Markdown conversion powered by Rust. Available as a Rust crate, Python package, PHP extension, Ruby gem, Elixir Rustler NIF, Node.js bindings, WebAssembly, and a standalone CLI, all with identical rendering behaviour.

Part of the Kreuzberg.dev document intelligence ecosystem. Kreuzberg is a polyglot document intelligence framework with a fast Rust core. We build tools that help developers extract, process, and understand documents at scale across 50+ formats, from PDFs and Office files to images, archives, and emails. Our goal is to make document intelligence faster and more ecological.



🎮 Try the Live Demo →

Experience WebAssembly-powered HTML to Markdown conversion instantly in your browser. No installation needed!


Why html-to-markdown?

  • Blazing Fast: Rust-powered core delivers 10–80× faster conversion than pure Python alternatives
  • Universal: Runs in Node.js, Bun, Deno, browsers, Python, and Rust, and as a standalone CLI
  • Smart Conversion: Handles complex documents including nested tables, code blocks, task lists, and hOCR OCR output
  • Metadata Extraction: Extract document metadata (title, description, headers, links, images) alongside conversion
  • Highly Configurable: Control heading styles, code block fences, list formatting, whitespace handling, and HTML sanitization
  • Tag Preservation: Keep specific HTML tags unconverted when Markdown isn't expressive enough
  • Secure by Default: Built-in HTML sanitization guards against malicious content
  • Consistent Output: Identical markdown rendering across all language bindings

Quick Start

Node.js / Bun (native bindings, fastest):

import { convert } from 'html-to-markdown-node';

const html = '<h1>Hello</h1><p>Rust ❤️ Markdown</p>';
const markdown = convert(html, {
  headingStyle: 'Atx',
  codeBlockStyle: 'Backticks',
  wrap: true,
  preserveTags: ['table'],
});

Python:

from html_to_markdown import convert

html = '<h1>Hello</h1><p>Rust ❤️ Markdown</p>'
markdown = convert(html, heading_style='Atx', wrap=True)

Ruby:

require 'html_to_markdown'

html = '<h1>Hello</h1><p>Rust ❤️ Markdown</p>'
markdown = HtmlToMarkdown.convert(html, heading_style: :atx, wrap: true)

For full per-language guides, see Language Guides below.

Installation

  • Node.js/Bun (native): npm install html-to-markdown-node
  • WebAssembly (universal): npm install html-to-markdown-wasm
  • Deno: import { convert } from "npm:html-to-markdown-wasm"
  • Python (bindings + CLI): pip install html-to-markdown
  • PHP (extension + helpers): PHP_EXTENSION_DIR=$(php-config --extension-dir) pie install goldziher/html-to-markdown and composer require goldziher/html-to-markdown
  • Ruby gem: bundle add html-to-markdown or gem install html-to-markdown
  • Elixir (Rustler NIF): {:html_to_markdown, "~> 2.8"}
  • Rust crate: cargo add html-to-markdown-rs
  • Rust CLI (crates.io): cargo install html-to-markdown-cli
  • Homebrew CLI: brew install html-to-markdown (core)
  • Releases: GitHub Releases

Performance

Benchmarked on Apple M4 using the shared fixture harness in tools/benchmark-harness.

Comparative Throughput (Median Across Fixtures)

Runtime | Median ops/sec | Median throughput (MB/s) | Peak memory (MB) | Successes
Rust | 1,060.3 | 116.4 | 171.3 | 56/56
Go | 1,496.3 | 131.1 | 22.9 | 16/16
Ruby | 2,155.5 | 300.4 | 280.3 | 48/48
PHP | 2,357.7 | 308.0 | 223.5 | 48/48
Elixir | 1,564.1 | 269.1 | 384.7 | 48/48
C# | 1,234.2 | 272.4 | 187.8 | 16/16
Java | 1,298.7 | 167.1 | 527.2 | 16/16
WASM | 1,485.8 | 157.6 | 95.3 | 48/48
Node.js (NAPI) | 2,054.2 | 306.5 | 95.4 | 48/48
Python (PyO3) | 3,120.3 | 307.5 | 83.5 | 48/48

Run task bench:harness to regenerate these throughput numbers. See the Performance Guide for benchmarking strategies and optimization tips.
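The table reports medians taken across fixtures. As a rough sketch of that kind of aggregation (assuming each fixture is recorded as size in bytes, iteration count, and elapsed seconds, and that 1 MB = 1,000,000 bytes; tools/benchmark-harness may store or define these differently), per-fixture ops/sec and MB/s can be computed first and then each reduced with its own median. The function names and sample numbers below are hypothetical, not harness output.

```python
from statistics import median

# Sketch of "median across fixtures" aggregation. Assumes each fixture is
# recorded as (size in bytes, iterations, total elapsed seconds) and that
# 1 MB = 1_000_000 bytes; the real harness may do this differently.


def fixture_stats(size_bytes: int, iterations: int, seconds: float) -> tuple[float, float]:
    """Return (ops/sec, MB/s) for a single fixture."""
    ops_per_sec = iterations / seconds
    mb_per_sec = ops_per_sec * size_bytes / 1_000_000
    return ops_per_sec, mb_per_sec


def summarize(results: list[tuple[int, int, float]]) -> tuple[float, float]:
    """Median ops/sec and median MB/s, each taken independently across fixtures."""
    per_fixture = [fixture_stats(*r) for r in results]
    median_ops = median(ops for ops, _ in per_fixture)
    median_mbps = median(mbps for _, mbps in per_fixture)
    return median_ops, median_mbps


# Hypothetical fixtures: 10 KB, 100 KB, and 1 MB documents.
sample = [(10_000, 20_000, 1.0), (100_000, 500, 1.0), (1_000_000, 100, 1.0)]
print(summarize(sample))  # (500.0, 100.0)
```

In this made-up sample the median ops/sec comes from the 100 KB fixture while the median MB/s comes from the 1 MB fixture, so the two columns of a median table should not be expected to divide into a single "typical" document size.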

Language Guides

Complete documentation with examples for each language:

  • Python – README | PyO3 bindings, metadata extraction, inline images
  • JavaScript/TypeScript – Node.js | TypeScript | WASM
  • Ruby – README | Magnus bindings, RBS type definitions, Steep checking
  • PHP – Package | Extension (PIE) | ext-php-rs extension
  • Go – README | FFI bindings with cgo
  • Java – README | Panama FFI, Maven/Gradle setup
  • C#/.NET – README | P/Invoke FFI, NuGet distribution
  • Elixir – README | Rustler NIF bindings
  • Rust – README | Core library, error handling, advanced features

Feature Guides

Visitor Pattern

Customize HTML→Markdown conversion with callbacks for specific elements. Use cases: domain-specific dialects, content filtering, URL rewriting, accessibility validation.

→ Full Guide with Examples (Python, TypeScript, Ruby)
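To make the callback idea concrete without depending on the library's own visitor interface, the following standard-library sketch shows the pattern in miniature: a parser walks the HTML and hands every link URL to a user-supplied function that can rewrite it or drop the link. ToyLinkVisitor, on_link, and convert_with_visitor are hypothetical names; the linked guide documents the real API.

```python
from html.parser import HTMLParser
from typing import Callable, Optional

# Toy illustration of the visitor idea using only the standard library.
# This is NOT the html-to-markdown visitor API; all names are hypothetical.


class ToyLinkVisitor(HTMLParser):
    """Renders <a href> as Markdown links, passing each URL through a callback."""

    def __init__(self, on_link: Callable[[str], Optional[str]]):
        super().__init__()
        self.on_link = on_link
        self.out: list[str] = []
        self._href: Optional[str] = None
        self._link_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._link_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._link_text)
            new_url = self.on_link(self._href)
            # A callback returning None drops the link but keeps its text.
            self.out.append(text if new_url is None else f"[{text}]({new_url})")
            self._href = None

    def handle_data(self, data):
        # Text inside a link is buffered; everything else is emitted as-is.
        if self._href is not None:
            self._link_text.append(data)
        else:
            self.out.append(data)


def convert_with_visitor(html: str, on_link: Callable[[str], Optional[str]]) -> str:
    visitor = ToyLinkVisitor(on_link)
    visitor.feed(html)
    visitor.close()
    return "".join(visitor.out)


def upgrade_or_drop(url: str) -> Optional[str]:
    # Drop script URLs; upgrade plain HTTP links to HTTPS.
    if url.startswith("javascript:"):
        return None
    return url.replace("http://", "https://", 1)


html = 'See <a href="http://example.com/docs">the docs</a> and <a href="javascript:alert(1)">this</a>.'
print(convert_with_visitor(html, upgrade_or_drop))
# See [the docs](https://example.com/docs) and this.
```

Returning None from the callback keeps the anchor text but removes the link, a simple form of the content filtering and URL rewriting use cases listed above.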

Metadata Extraction

Extract comprehensive metadata during conversion: title, description, headers, links, images, structured data. Use cases: SEO extraction, TOC generation, link validation, accessibility auditing, content migration.

→ Full Guide with Examples (Python, TypeScript, Ruby)
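As an illustration of what metadata extraction collects, here is a minimal standard-library sketch that gathers the title, meta description, heading outline, and link targets in a single parsing pass. It is a hypothetical stand-in, not the library's metadata API; the class name and dictionary keys are assumptions chosen for readability.

```python
from html.parser import HTMLParser

# Hypothetical sketch: collects a small subset of document metadata
# (title, description, headers, links). Not the library's metadata API.


class ToyMetadataCollector(HTMLParser):
    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.meta = {"title": None, "description": None, "headers": [], "links": []}
        self._capture = None  # tag whose text is currently being collected
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "description":
            self.meta["description"] = a.get("content")
        elif tag == "a" and a.get("href"):
            self.meta["links"].append(a["href"])
        elif tag == "title" or tag in self.HEADINGS:
            self._capture, self._buf = tag, []

    def handle_endtag(self, tag):
        if tag != self._capture:
            return
        text = "".join(self._buf).strip()
        if tag == "title":
            self.meta["title"] = text
        else:
            # Store headings as (level, text), e.g. "h2" -> 2.
            self.meta["headers"].append((int(tag[1]), text))
        self._capture = None

    def handle_data(self, data):
        if self._capture is not None:
            self._buf.append(data)


def extract_metadata(html: str) -> dict:
    collector = ToyMetadataCollector()
    collector.feed(html)
    collector.close()
    return collector.meta


page = ('<html><head><title>Demo</title>'
        '<meta name="description" content="A demo page"></head>'
        '<body><h1>Intro</h1><p>See <a href="/docs">docs</a>.</p>'
        '<h2>Details</h2></body></html>')
print(extract_metadata(page))
# {'title': 'Demo', 'description': 'A demo page', 'headers': [(1, 'Intro'), (2, 'Details')], 'links': ['/docs']}
```

The (level, text) header pairs are already enough to build a simple table of contents, one of the use cases mentioned above.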

Performance & Benchmarking

Understand performance characteristics, run benchmarks, optimize for your use case. Includes benchmarking tools, memory profiling, streaming strategies, and optimization tips.

→ Full Guide
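For quick local measurements outside the harness, a sketch like the following times repeated calls after a short warm-up, reports the median, and records peak Python-side allocations with tracemalloc. It assumes convert_fn is any function that takes an HTML string and returns Markdown (for example a wrapper around the Python convert from Quick Start); profile_converter is a hypothetical helper, not part of the library or the benchmark harness. Note that tracemalloc does not see memory allocated inside a native Rust extension, so it undercounts memory used by the native bindings.

```python
import time
import tracemalloc
from statistics import median
from typing import Callable

# Minimal timing + memory sketch. `convert_fn` is any str -> str converter;
# this is not the tools/benchmark-harness implementation.


def profile_converter(convert_fn: Callable[[str], str], html: str,
                      iterations: int = 50, warmup: int = 5) -> dict:
    # Warm-up calls let lazy initialisation and caches settle before timing.
    for _ in range(warmup):
        convert_fn(html)

    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        convert_fn(html)
        timings.append(time.perf_counter() - start)

    # One extra traced call to capture peak Python-level allocations.
    tracemalloc.start()
    convert_fn(html)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    med = median(timings)
    return {
        "median_seconds": med,
        "ops_per_sec": 1 / med if med > 0 else float("inf"),
        "python_peak_bytes": peak,
    }


# Stand-in converter for demonstration; swap in a real HTML-to-Markdown call.
stats = profile_converter(str.upper, "<p>hello</p>" * 1_000)
print(stats)
```

Reporting the median rather than the mean keeps a few slow outlier runs (GC pauses, scheduler noise) from skewing the result.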

Examples

Explore working code examples in multiple languages:

  • Visitor Pattern: examples/visitor-pattern/ (Python, TypeScript, Ruby)
  • Metadata Extraction: examples/metadata-extraction/ (Python, TypeScript, Ruby)
  • Performance: examples/performance/ (benchmarks, profiling, optimization)

Testing

Run the test suite locally:

# All core test suites (Rust, Python, Ruby, Node, PHP, Go, C#, Elixir, Java)
task test

# Run the Wasmtime-backed WASM integration tests
task wasm:test:wasmtime

Compatibility (v1 → v2)

  • V2's Rust core sustains 150–210 MB/s throughput; V1 averaged ≈ 2.5 MB/s, making V2 roughly 60–84× faster.
  • A Python compatibility shim is available in html_to_markdown.v1_compat. It is deprecated and emits warnings, so plan your migration now; see the Python README for keyword mappings.
  • CLI flag changes and other breaking updates are listed in the CHANGELOG.

Community

License

MIT License – see LICENSE for details.