html-to-markdown
High-performance HTML → Markdown conversion powered by Rust. It ships as a Rust crate, Python package, PHP extension, Ruby gem, Elixir Rustler NIF, Node.js bindings, WebAssembly, and a standalone CLI, with identical rendering behavior across all runtimes.
Key Features
- Blazing Fast – Rust-powered core delivers 10–80× faster conversion than pure Python alternatives (150–280 MB/s)
- Polyglot – Native bindings for Rust, Python, TypeScript/Node.js, Ruby, PHP, Go, Java, C#, and Elixir
- Smart Conversion – Handles complex documents including nested tables, code blocks, task lists, and hOCR OCR output
- Metadata Extraction – Extract document metadata (title, description, headers, links, images, structured data) alongside conversion
- Visitor Pattern – Custom callbacks for domain-specific dialects, content filtering, URL rewriting, and accessibility validation
- Highly Configurable – Control heading styles, code block fences, list formatting, whitespace handling, and HTML sanitization
- Tag Preservation – Keep specific HTML tags unconverted when markdown isn't expressive enough
- Secure by Default – Built-in HTML sanitization prevents malicious content
- Consistent Output – Identical markdown rendering across all language bindings
Installation
Each language binding provides comprehensive documentation with installation instructions, examples, and best practices. Choose your platform to get started:
Scripting Languages:
- Python – PyPI package, metadata extraction, visitor pattern, CLI included
- Ruby – RubyGems package, RBS type definitions, Steep checking
- PHP – Composer package + PIE extension, PHP 8.2+, PHPStan level 9
- Elixir – Hex package, Rustler NIF bindings, Elixir 1.19+
JavaScript/TypeScript:
- Node.js / TypeScript – Native NAPI-RS bindings for Node.js/Bun, fastest performance, WebAssembly for browsers/Deno
Compiled Languages:
- Go – Go module with FFI bindings, automatic library download
- Java – Maven Central, Panama Foreign Function & Memory API, Java 24+
- C# – NuGet package, .NET 8.0+, P/Invoke FFI bindings
Native:
- Rust – Core library, flexible feature flags, zero-copy APIs
Command-Line:
Metadata Extraction
Extract comprehensive metadata during conversion: title, description, headers, links, images, structured data (JSON-LD, Microdata, RDFa). Use cases: SEO extraction, table-of-contents generation, link validation, accessibility auditing, content migration.
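To make the shape of this data concrete, here is a minimal Python sketch built only on the standard library's `html.parser`. It is not the html-to-markdown API: `MetadataCollector` and `extract_metadata` are hypothetical names chosen for illustration, and the sketch skips structured data entirely. The real bindings expose their own functions, described in each language's documentation.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MetadataCollector(HTMLParser):
    """Hypothetical helper: gathers title, description, headers, links, images."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.headers = []      # (level, text) pairs in document order
        self.links = []        # href values
        self.images = []       # src values
        self._capture = None   # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" or tag in HEADING_TAGS:
            self._capture, self._buffer = tag, []
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return
        text = "".join(self._buffer).strip()
        if tag == "title":
            self.title = text
        else:
            self.headers.append((int(tag[1]), text))
        self._capture = None


def extract_metadata(html):
    """Parse the document once and return the populated collector."""
    collector = MetadataCollector()
    collector.feed(html)
    collector.close()
    return collector
```

The collected `headers` list is enough to build a table of contents, and `links` can be fed to a link checker, which mirrors two of the use cases listed above.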
Visitor Pattern
Customize HTML→Markdown conversion with callbacks for specific elements. Intercept links, images, headings, lists, and more. Use cases: domain-specific Markdown dialects (Obsidian, Notion), content filtering, URL rewriting, accessibility validation, analytics.
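The idea can be sketched in a few lines of standard-library Python. This is a toy illustration of the visitor pattern, not the library's actual interface: `MarkdownVisitor`, `WikiLinkVisitor`, `TinyConverter`, and `tiny_convert` are all hypothetical names, and only links and headings are handled.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MarkdownVisitor:
    """Hypothetical base visitor: override a method to change its output."""

    def visit_link(self, href, text):
        return f"[{text}]({href})"

    def visit_heading(self, level, text):
        return f"{'#' * level} {text}\n\n"


class WikiLinkVisitor(MarkdownVisitor):
    """Emit Obsidian-style wiki links for relative URLs only."""

    def visit_link(self, href, text):
        if "://" in href:
            return super().visit_link(href, text)
        return f"[[{href}|{text}]]"


class TinyConverter(HTMLParser):
    """Toy converter that hands links and headings to the visitor."""

    def __init__(self, visitor):
        super().__init__()
        self.visitor = visitor
        self.out = []
        self._stack = []  # (tag, attrs, text parts) for open visited elements

    def _target(self):
        # Text goes into the innermost open element, else the final output.
        return self._stack[-1][2] if self._stack else self.out

    def handle_starttag(self, tag, attrs):
        if tag == "a" or tag in HEADINGS:
            self._stack.append((tag, dict(attrs), []))

    def handle_data(self, data):
        self._target().append(data)

    def handle_endtag(self, tag):
        if not self._stack or self._stack[-1][0] != tag:
            return
        _, attrs, parts = self._stack.pop()
        text = "".join(parts)
        if tag == "a":
            rendered = self.visitor.visit_link(attrs.get("href") or "", text)
        else:
            rendered = self.visitor.visit_heading(int(tag[1]), text.strip())
        self._target().append(rendered)


def tiny_convert(html, visitor=None):
    parser = TinyConverter(visitor or MarkdownVisitor())
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


html = '<p>See <a href="notes/setup">setup</a> and <a href="https://example.com">site</a>.</p>'
print(tiny_convert(html, WikiLinkVisitor()))
# prints: See [[notes/setup|setup]] and [site](https://example.com).
```

Keeping the rendering decisions in a separate visitor object means a new dialect only needs a subclass, while the parsing code stays untouched.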
Performance
Rust-powered core delivers 150–280 MB/s throughput (10–80× faster than pure Python alternatives). Includes benchmarking tools, memory profiling, streaming strategies, and optimization tips.
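To check throughput on your own documents, a rough measurement can be sketched as follows. This is not one of the project's benchmarking tools: `throughput_mb_per_s` is a hypothetical helper, it assumes 1 MB = 1,000,000 bytes of UTF-8 input, and it accepts any callable that takes an HTML string, so you can pass whichever conversion function your binding provides.

```python
import time


def throughput_mb_per_s(convert, html, repeats=5):
    """Return the best-of-N throughput of convert(html) in MB/s.

    Assumes 1 MB = 1,000,000 bytes, measured on the UTF-8 encoded input.
    """
    payload_bytes = len(html.encode("utf-8"))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        convert(html)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse clocks.
    return payload_bytes / max(best, 1e-9) / 1_000_000


# Stand-in workload so the sketch runs without any third-party package.
sample = "<p>Hello <b>world</b></p>" * 10_000
rate = throughput_mb_per_s(str.upper, sample)
```

Taking the best of several runs reduces noise from warm-up and scheduling, but results still depend heavily on document shape and hardware.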
Tag Preservation
Keep specific HTML tags unconverted when Markdown isn't expressive enough. Useful for tables, SVG, custom elements, or when you need mixed HTML/Markdown output.
See language-specific documentation for preserveTags configuration.
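The behavior can be illustrated with a small standard-library sketch. It is a toy, not the library's implementation: `PreservingConverter`, `convert_preserving`, and the `preserve_tags` parameter are hypothetical stand-ins for the real `preserveTags` option, and only bold and italic markup is converted.

```python
from html.parser import HTMLParser


class PreservingConverter(HTMLParser):
    """Toy converter: bold/italic become Markdown, preserved tags stay HTML."""

    MARKERS = {"strong": "**", "b": "**", "em": "*", "i": "*"}

    def __init__(self, preserve_tags=()):
        super().__init__()
        self.preserve = set(preserve_tags)
        self.depth = 0   # > 0 while inside a preserved element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.preserve:
            self.depth += 1
        if self.depth:
            # Emit the tag exactly as it appeared in the source.
            self.out.append(self.get_starttag_text())
        elif tag in self.MARKERS:
            self.out.append(self.MARKERS[tag])

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag.
        if self.depth or tag in self.preserve:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            self.out.append(f"</{tag}>")
            if tag in self.preserve:
                self.depth -= 1
        elif tag in self.MARKERS:
            self.out.append(self.MARKERS[tag])

    def handle_data(self, data):
        # Note: html.parser decodes entities; a real implementation would
        # re-escape them inside preserved markup.
        self.out.append(data)


def convert_preserving(html, preserve_tags=()):
    parser = PreservingConverter(preserve_tags)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


html = "<p><b>Totals</b></p><table><tr><td><b>1</b></td></tr></table>"
print(convert_preserving(html, preserve_tags=["table"]))
# prints: **Totals**<table><tr><td><b>1</b></td></tr></table>
```

Everything nested inside a preserved element is passed through untouched, which is why the `<b>` inside the table stays HTML while the one in the paragraph becomes `**`.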
Security
Built-in HTML sanitization prevents XSS attacks and malicious content. Powered by ammonia with safe defaults. Configurable via sanitize options.
Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines on:
- Setting up the development environment
- Running tests locally (coverage targets: Rust 95%+, language bindings 80%+)
- Submitting pull requests
- Reporting issues
All contributions must follow code quality standards enforced via pre-commit hooks (prek).
License
MIT License – see LICENSE for details. You can use html-to-markdown freely in both commercial and closed-source products, with no copyleft (viral) requirements; the only condition is that copies retain the copyright and license notice.