mdka
An HTML-to-Markdown converter written in Rust.
mdka balances conversion quality with runtime efficiency:
it produces readable output from real-world HTML without sacrificing speed or memory.
The "ka" in the name comes from 化 (か), the Japanese suffix for conversion or transformation.
Why mdka?
There are several good HTML-to-Markdown converters in the Rust ecosystem. mdka's specific focus is:
- Reliable output from diverse HTML sources. It is built on scraper, which uses html5ever, the HTML5 parser from the Servo browser engine. html5ever applies the same parsing algorithm that web browsers use, so it handles malformed tags, deeply nested structures, CMS output, and SPA-rendered DOM without special-casing.
- Crash resistance. Conversion uses a non-recursive depth-first traversal (DFS) throughout, so no nesting depth can overflow the call stack.
- Configurable pre-processing. Five conversion modes let you tune what is kept or stripped, from noise-free LLM input to lossless archiving.
- Multi-language. The same Rust implementation is accessible from Node.js (napi-rs) and Python (PyO3).
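The crash-resistance point can be illustrated with a small sketch of a non-recursive DFS. It assumes a hypothetical toy tree: `Node`, `Step`, and `render` are illustrative names, not mdka's internals. An explicit heap-allocated stack replaces the call stack, and separate enter/exit steps let the loop emit a closing marker after a node's children.

```rust
/// Hypothetical toy DOM node; not mdka's actual data structure.
enum Node {
    Text(String),
    /// An element that wraps its children in Markdown emphasis (`*...*`).
    Em(Vec<Node>),
    /// Any other container element; contributes no markup of its own.
    Block(Vec<Node>),
}

/// Traversal steps: entering a node, or leaving an emphasis element.
enum Step<'a> {
    Enter(&'a Node),
    ExitEm,
}

/// Renders the tree with an explicit stack instead of recursion, so the
/// call-stack depth stays constant no matter how deeply nodes are nested.
fn render(root: &Node) -> String {
    let mut out = String::new();
    let mut stack = vec![Step::Enter(root)];
    while let Some(step) = stack.pop() {
        match step {
            Step::ExitEm => out.push('*'),
            Step::Enter(Node::Text(t)) => out.push_str(t),
            Step::Enter(Node::Em(children)) => {
                out.push('*');
                // Schedule the closing marker first so it pops after all children.
                stack.push(Step::ExitEm);
                // Push children in reverse so they pop in document order.
                for child in children.iter().rev() {
                    stack.push(Step::Enter(child));
                }
            }
            Step::Enter(Node::Block(children)) => {
                for child in children.iter().rev() {
                    stack.push(Step::Enter(child));
                }
            }
        }
    }
    out
}
```

Because pending nodes live in a heap-allocated `Vec`, deeper nesting only grows memory use, not call-stack depth. The enter/exit pair is what a loop needs to reproduce the "after children" step that plain recursion gets for free.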
Quick Start
Try it from the command line
Requires cargo (the Rust toolchain) to be installed.
|
# # Hello
#
# **world**
Add to a Rust project
# Cargo.toml
[dependencies]
mdka = "2"
use mdka::html_to_markdown;
let md = html_to_markdown;
// "# Hello\n\n*world*\n"
With options:
use ;
use ;
let mut opts = for_mode;
opts.drop_interactive_shell = true;
let md = html_to_markdown_with;
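The options example starts from a mode preset and then overrides a single field. The sketch below reproduces that "preset plus overrides" pattern in self-contained form. `Mode`, `Options`, `keep_comments`, and `balanced_without_shell` are hypothetical; only the `drop_interactive_shell` field name comes from the example, and the preset values are placeholders, not mdka's actual defaults.

```rust
/// Hypothetical stand-in for a conversion mode, mirroring the five mode names.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Mode {
    Balanced,
    Strict,
    Minimal,
    Semantic,
    Preserve,
}

/// Hypothetical options struct. `drop_interactive_shell` matches the field name
/// in the example; `keep_comments` is invented purely for illustration.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Options {
    mode: Mode,
    drop_interactive_shell: bool,
    keep_comments: bool,
}

impl Options {
    /// Returns a preset for `mode`. The values are placeholders, not mdka's defaults.
    fn for_mode(mode: Mode) -> Self {
        let (drop_interactive_shell, keep_comments) = match mode {
            Mode::Minimal => (true, false),
            Mode::Preserve => (false, true),
            Mode::Balanced | Mode::Strict | Mode::Semantic => (false, false),
        };
        Options { mode, drop_interactive_shell, keep_comments }
    }
}

/// Starts from the Balanced preset and overrides one field in a single
/// expression, using struct update syntax.
fn balanced_without_shell() -> Options {
    Options {
        drop_interactive_shell: true,
        ..Options::for_mode(Mode::Balanced)
    }
}
```

Struct update syntax and mutating a `let mut` binding produce the same value; the preset approach keeps each mode's defaults in one place while still allowing per-call tweaks.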
Add to a Node.js project
const = require
const md =
const md = await
Add to a Python project
=
=
Conversion Modes
| Mode | Use when |
|---|---|
| Balanced | General use (default) |
| Strict | Debugging, diff comparison |
| Minimal | LLM input, text extraction |
| Semantic | SPA content, ARIA-aware pipelines |
| Preserve | Archiving, audit trails |
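One way to make the table above actionable in application code is to encode it as an exhaustive match. This is a minimal sketch; `Mode`, `UseCase`, and `mode_for` are hypothetical names, not part of mdka's API, and the mapping simply restates the "Use when" column.

```rust
/// Hypothetical mirror of the five modes; Balanced is the default, as the table states.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
enum Mode {
    #[default]
    Balanced,
    Strict,
    Minimal,
    Semantic,
    Preserve,
}

/// Hypothetical use cases, one per entry in the "Use when" column.
#[derive(Clone, Copy, Debug)]
enum UseCase {
    General,
    Debugging,
    DiffComparison,
    LlmInput,
    TextExtraction,
    SpaContent,
    AriaAwarePipeline,
    Archiving,
    AuditTrail,
}

/// Picks a mode for a use case. Because the match is exhaustive, adding a
/// new use case without choosing a mode is a compile error.
fn mode_for(use_case: UseCase) -> Mode {
    match use_case {
        UseCase::General => Mode::Balanced,
        UseCase::Debugging | UseCase::DiffComparison => Mode::Strict,
        UseCase::LlmInput | UseCase::TextExtraction => Mode::Minimal,
        UseCase::SpaContent | UseCase::AriaAwarePipeline => Mode::Semantic,
        UseCase::Archiving | UseCase::AuditTrail => Mode::Preserve,
    }
}
```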
Learn More
Full documentation lives in the docs/ folder and is published on GitHub Pages:
https://nabbisen.github.io/mdka-rs/
| Topic | Link |
|---|---|
| Installation | /getting-started/installation |
| Rust Usage & Examples | /getting-started/usage-rust |
| Node.js Usage | /getting-started/usage-nodejs |
| Python Usage | /getting-started/usage-python |
| CLI Reference | /getting-started/usage-cli |
| API Reference | /api/index |
| Conversion Modes | /api/modes |
| ConversionOptions | /api/options |
| Supported Elements | /api/elements |
| Design Philosophy | /design/philosophy |
| Performance Characteristics | /design/performance-characteristics |
| Architecture | /design/architecture |
| Features | /design/features |
Open-source, with care
This project is lovingly built and maintained by volunteers.
We hope it helps streamline your work.
Please understand that the project has its own direction: while we welcome feedback, it might not fit every edge case 🌱
Acknowledgements
Depends on scraper (+ html5ever), ego-tree, rayon, tikv-jemallocator / tikv-jemalloc-ctl, thiserror.
The Node.js binding uses napi-rs, and the Python binding uses pyo3 and maturin from the PyO3 project.