mdka 2.0.1

HTML to Markdown converter

mdka

A Rust library for converting HTML to Markdown.

mdka balances conversion quality with runtime efficiency — readable output from real-world HTML, without sacrificing speed or memory.

https://nabbisen.github.io/mdka-rs/


Why mdka?

There are several good HTML-to-Markdown converters in the Rust ecosystem. mdka's specific focus is:

Reliable output from diverse HTML sources. It is built on scraper, which uses html5ever — the HTML5 parser from the Servo browser engine. html5ever applies the same parsing algorithm that web browsers use, so it handles malformed tags, deeply nested structures, CMS output, and SPA-rendered DOM without special-casing.

Crash resistance. Conversion uses non-recursive DFS throughout, so nesting depth does not grow the call stack and deeply nested input cannot cause a stack overflow.

Configurable pre-processing. Five conversion modes let you tune what gets kept or stripped — from noise-free LLM input to lossless archiving.

Multi-language. The same Rust implementation is accessible from Node.js (napi-rs v3) and Python (PyO3 v0).
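To illustrate the crash-resistance point above: a non-recursive DFS replaces the call stack with an explicit stack that lives in an ordinary growable vector. The following is a minimal sketch of that technique, not mdka's actual code; the `Node` arena, `collect_text`, and `deep_chain` are hypothetical names invented for this example.

```rust
/// Hypothetical arena-backed tree node (not mdka's real data structure).
/// Children are stored as indices into the arena slice.
struct Node {
    text: &'static str,
    children: Vec<usize>,
}

/// Pre-order DFS that uses an explicit stack instead of recursion.
fn collect_text(arena: &[Node], root: usize) -> String {
    let mut out = String::new();
    let mut stack = vec![root];
    while let Some(id) = stack.pop() {
        let node = &arena[id];
        out.push_str(node.text);
        // Push children in reverse so the first child is popped first.
        for &child in node.children.iter().rev() {
            stack.push(child);
        }
    }
    out
}

/// Builds a single chain of `depth` nested nodes, each holding "x".
fn deep_chain(depth: usize) -> Vec<Node> {
    (0..depth)
        .map(|i| Node {
            text: "x",
            children: if i + 1 < depth { vec![i + 1] } else { Vec::new() },
        })
        .collect()
}
```

Calling `collect_text(&deep_chain(500_000), 0)` walks half a million nesting levels while the call stack stays at a constant depth; a naive recursive traversal would likely overflow long before that.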


Quick Start

Try it from the command line

cargo install mdka-cli

echo '<h1>Hello</h1><p><strong>world</strong></p>' | mdka
# # Hello
#
# **world**

mdka page.html                          # → page.md  (same directory)
mdka --mode minimal --drop-shell *.html # strip nav/header/footer
mdka --help                             # full option list
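The `page.html` to `page.md` mapping shown above amounts to swapping the file extension while keeping the directory. Here is a minimal sketch of that rule using only the standard library; `markdown_path` is a hypothetical helper, not a function from mdka-cli, and the real CLI may treat edge cases differently.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: keep the directory and file stem,
/// replace the extension with `md`.
fn markdown_path(input: &Path) -> PathBuf {
    input.with_extension("md")
}
```

For example, `markdown_path(Path::new("site/page.html"))` yields `site/page.md`.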

Add to a Rust project

# Cargo.toml
[dependencies]
mdka = "2"

use mdka::html_to_markdown;

let md = html_to_markdown("<h1>Hello</h1><p><em>world</em></p>");
// "# Hello\n\n*world*\n"

With options:

use mdka::html_to_markdown_with;
use mdka::options::{ConversionMode, ConversionOptions};

let opts = ConversionOptions::for_mode(ConversionMode::Minimal)
    .drop_interactive_shell(true);
let md = html_to_markdown_with(html, &opts);

Add to a Node.js project

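npm install mdka

const { htmlToMarkdown, htmlToMarkdownWith, htmlToMarkdownWithAsync } = require('mdka')

const md = htmlToMarkdown('<h1>Hello</h1>')

// With options. `await` needs an async context in CommonJS,
// so the call is wrapped in an illustrative helper.
async function convertMinimal(html) {
  return await htmlToMarkdownWithAsync(html, {
    mode: 'minimal',
    dropInteractiveShell: true,
  })
}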

Add to a Python project

pip install mdka

import mdka

md = mdka.html_to_markdown('<h1>Hello</h1>')

md = mdka.html_to_markdown_with(
    html,
    mode=mdka.ConversionMode.MINIMAL,
    drop_interactive_shell=True,
)

Conversion Modes

| Mode | Use when |
| --- | --- |
| Balanced | General use (default) |
| Strict | Debugging, diff comparison |
| Minimal | LLM input, text extraction |
| Semantic | SPA content, ARIA-aware pipelines |
| Preserve | Archiving, audit trails |
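The bindings above select a mode by name (`'minimal'` in Node.js, `ConversionMode.MINIMAL` in Python). As a rough sketch of how such names could map onto the five modes, here is a stand-alone enum with a string parser. It assumes lowercase names for every mode (only `minimal` appears in this README), and `Mode` is a hypothetical stand-in, not mdka's `ConversionMode` type.

```rust
use std::str::FromStr;

/// Hypothetical mirror of the five modes in the table; Balanced is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum Mode {
    #[default]
    Balanced,
    Strict,
    Minimal,
    Semantic,
    Preserve,
}

impl FromStr for Mode {
    type Err = String;

    /// Case-insensitive lookup; unknown names produce an error message.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "balanced" => Ok(Mode::Balanced),
            "strict" => Ok(Mode::Strict),
            "minimal" => Ok(Mode::Minimal),
            "semantic" => Ok(Mode::Semantic),
            "preserve" => Ok(Mode::Preserve),
            other => Err(format!("unknown mode: {other}")),
        }
    }
}
```

With this sketch, `"minimal".parse::<Mode>()` returns `Ok(Mode::Minimal)`, and `Mode::default()` falls back to `Mode::Balanced`, matching the table's default.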

Learn More

Full documentation lives in the docs/ folder and is published on GitHub Pages: https://nabbisen.github.io/mdka-rs/

| Topic | Link |
| --- | --- |
| Installation | docs/src/getting-started/installation.md |
| Rust usage & examples | docs/src/getting-started/usage-rust.md |
| Node.js usage | docs/src/getting-started/usage-nodejs.md |
| Python usage | docs/src/getting-started/usage-python.md |
| CLI reference | docs/src/getting-started/usage-cli.md |
| API reference | docs/src/api/index.md |
| Conversion modes | docs/src/api/modes.md |
| ConversionOptions | docs/src/api/options.md |
| Supported elements | docs/src/api/elements.md |
| Design philosophy | docs/src/design/philosophy.md |
| Performance philosophy | docs/src/design/performance.md |
| Architecture | docs/src/design/architecture.md |
| Benchmarks | docs/src/design/performance.md#benchmark-results |
Note: docs are built as an mdBook project. To build them locally (requires mdBook):

cd docs
mdbook build   # output → docs/book/
mdbook serve   # live-reload preview at http://localhost:3000

Open-source, with care

This project is lovingly built and maintained by volunteers.
We hope it helps streamline your work.
Please understand that the project has its own direction — while we welcome feedback, it might not fit every edge case 🌱

Acknowledgements

Depends on scraper and on Servo's html5ever / markup5ever.

Also uses napi-rs for the Node.js bindings, and PyO3's pyo3 / maturin for the Python bindings.