mdka
A Rust library for converting HTML to Markdown.
mdka balances conversion quality with runtime efficiency — readable output from real-world HTML, without sacrificing speed or memory.
https://nabbisen.github.io/mdka-rs/
Why mdka?
There are several good HTML-to-Markdown converters in the Rust ecosystem. mdka's specific focus is:
- Reliable output from diverse HTML sources. It is built on scraper, which uses html5ever, the HTML5 parser from the Servo browser engine. html5ever applies the same parsing algorithm that web browsers use, so it handles malformed tags, deeply nested structures, CMS output, and SPA-rendered DOM without special-casing.
- Crash resistance. Conversion uses non-recursive DFS throughout, so traversal depth is bounded by heap memory rather than the call stack, and deeply nested input does not cause a stack overflow.
- Configurable pre-processing. Five conversion modes let you tune what gets kept or stripped, from noise-free LLM input to lossless archiving.
- Multi-language. The same Rust implementation is accessible from Node.js (napi-rs v3) and Python (PyO3 v0).
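To illustrate the crash-resistance point, here is a minimal sketch of a non-recursive depth-first traversal using an explicit stack. It is not mdka's actual code: the `Node` type and the `collect_text` and `nested` helpers are hypothetical names made up for this example, and only the general technique is taken from the description above.

```rust
/// Hypothetical toy DOM; mdka's real node types are not shown in this README.
enum Node {
    Text(String),
    Element(Vec<Node>),
}

/// Collects all text in document order with an explicit stack instead of
/// recursion, so the traversal itself does not deepen the call stack.
fn collect_text(root: &Node) -> String {
    let mut out = String::new();
    let mut stack: Vec<&Node> = vec![root];
    while let Some(node) = stack.pop() {
        match node {
            Node::Text(text) => out.push_str(text),
            Node::Element(children) => {
                // Push children in reverse so the first child is popped first.
                for child in children.iter().rev() {
                    stack.push(child);
                }
            }
        }
    }
    out
}

/// Wraps a single text node in `depth` nested elements.
fn nested(depth: usize) -> Node {
    let mut node = Node::Text("deep".to_string());
    for _ in 0..depth {
        node = Node::Element(vec![node]);
    }
    node
}
```

Calling `collect_text(&nested(1000))` walks all 1000 levels in a single loop. Note that this toy tree still relies on Rust's default recursive drop, so a production design would also need to tear down very deep trees iteratively.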
Quick Start
Try it from the command line
|
# # Hello
#
# **world**
Add to a Rust project
# Cargo.toml
[dependencies]
mdka = "2"
use mdka::html_to_markdown;
let md = html_to_markdown("<h1>Hello</h1><p><em>world</em></p>");
// "# Hello\n\n*world*\n"
With options:
use ;
use ;
let opts = for_mode
.drop_interactive_shell;
let md = html_to_markdown_with;
Add to a Node.js project
const = require
const md =
const md = await
Add to a Python project
=
=
Conversion Modes
| Mode | Use when |
|---|---|
| Balanced | General use (default) |
| Strict | Debugging, diff comparison |
| Minimal | LLM input, text extraction |
| Semantic | SPA content, ARIA-aware pipelines |
| Preserve | Archiving, audit trails |
Learn More
Full documentation lives in the docs/ folder and is published on GitHub Pages: https://nabbisen.github.io/mdka-rs/
| Topic | Link |
|---|---|
| Installation | docs/src/getting-started/installation.md |
| Rust usage & examples | docs/src/getting-started/usage-rust.md |
| Node.js usage | docs/src/getting-started/usage-nodejs.md |
| Python usage | docs/src/getting-started/usage-python.md |
| CLI reference | docs/src/getting-started/usage-cli.md |
| API reference | docs/src/api/index.md |
| Conversion modes | docs/src/api/modes.md |
| ConversionOptions | docs/src/api/options.md |
| Supported elements | docs/src/api/elements.md |
| Design philosophy | docs/src/design/philosophy.md |
| Performance concern | docs/src/design/performance.md |
| Architecture | docs/src/design/architecture.md |
Note: the docs are an mdBook project, so building them locally requires mdBook.
Open-source, with care
This project is lovingly built and maintained by volunteers.
We hope it helps streamline your work.
Please understand that the project has its own direction — while we welcome feedback, it might not fit every edge case 🌱
Acknowledgements
mdka depends on scraper and on Servo's html5ever / markup5ever.
The bindings rely on napi-rs for Node.js and on PyO3's pyo3 / maturin for Python.