html-to-markdown-rs
High-performance HTML to Markdown converter built with Rust.
Fast, reliable HTML to Markdown conversion with full CommonMark compliance. Built with html5ever for correctness and ammonia for safe HTML preprocessing.
Installation
[dependencies]
html-to-markdown-rs = "2.3"
Basic Usage
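use html_to_markdown_rs::convert;
let markdown = convert("<h1>Hello</h1>", None)?; // `None` uses the default options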
Configuration
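use html_to_markdown_rs::{convert, ConversionOptions};
// Override individual fields before `..Default::default()`; the rest keep their defaults.
let options = ConversionOptions { ..Default::default() };
// `html` is a &str holding the input document.
let markdown = convert(html, Some(options))?;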
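The override-then-default pattern used for options can be sketched on its own. The block below is a minimal, self-contained illustration: `DemoOptions` and its fields are hypothetical stand-ins invented for this example and are not the library's `ConversionOptions`.

```rust
/// Hypothetical stand-in for an options struct; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct DemoOptions {
    wrap_width: usize,
    escape_asterisks: bool,
    bullet: char,
}

impl Default for DemoOptions {
    fn default() -> Self {
        Self { wrap_width: 80, escape_asterisks: false, bullet: '-' }
    }
}

fn main() {
    // Override one field; struct update syntax fills in the rest from Default.
    let options = DemoOptions { wrap_width: 100, ..Default::default() };
    assert_eq!(options.bullet, '-');
    println!("{:?}", options);
}
```

Because only the overridden fields are named, code written this way keeps compiling when new fields with defaults are added to the struct.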
Web Scraping with Preprocessing
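use html_to_markdown_rs::{convert, ConversionOptions, PreprocessingPreset};
let mut options = ConversionOptions::default();
options.preprocessing.enabled = true;
options.preprocessing.preset = PreprocessingPreset::Aggressive;
options.preprocessing.remove_navigation = true;
options.preprocessing.remove_forms = true;
let markdown = convert(html, Some(options))?;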
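To make the effect of removing navigation and forms concrete, here is a rough, self-contained sketch. It is not how the library does it: the helper `strip_element` is a hypothetical name, and it uses naive string search where a real preprocessor works on a parsed document.

```rust
/// Removes every `<tag ...>...</tag>` block from `html`.
/// Naive on purpose: case-sensitive, no nesting, and `<nav` would also match `<navbar`.
fn strip_element(html: &str, tag: &str) -> String {
    let open = format!("<{}", tag);
    let close = format!("</{}>", tag);
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(start) = rest.find(&open) {
        // Keep everything before the opening tag.
        out.push_str(&rest[..start]);
        match rest[start..].find(&close) {
            // Skip past the closing tag and continue scanning.
            Some(end) => rest = &rest[start + end + close.len()..],
            // Unclosed element: keep the remainder untouched and stop.
            None => {
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let html = r#"<nav><a href="/">Home</a></nav><p>Article text.</p><form><input name="q"></form>"#;
    let cleaned = strip_element(&strip_element(html, "nav"), "form");
    println!("{}", cleaned); // prints: <p>Article text.</p>
}
```

Only the article paragraph survives, which is the kind of input a scraper usually wants to hand to the Markdown converter.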
hOCR Table Extraction
use html_to_markdown_rs::convert;
// hOCR documents (from Tesseract, etc.) are detected automatically.
// Tables and spatial layout are reconstructed without additional options.
let markdown = convert(hocr_html, None)?; // `hocr_html` is a &str holding the hOCR document
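Spatial reconstruction depends on the bounding boxes that hOCR stores in each element's `title` attribute, for example `title="bbox 10 20 110 40; x_wconf 93"`. The sketch below shows one way to read such a box. `parse_bbox` is a hypothetical helper written for illustration and is not part of this crate's API.

```rust
/// Parses the `bbox x0 y0 x1 y1` property from an hOCR `title` attribute value.
fn parse_bbox(title: &str) -> Option<[u32; 4]> {
    // Properties are separated by ';', e.g. "bbox 10 20 110 40; x_wconf 93".
    let prop = title
        .split(';')
        .map(str::trim)
        .find(|p| p.starts_with("bbox "))?;
    // Skip the "bbox" keyword and parse the four coordinates in order.
    let mut nums = prop.split_whitespace().skip(1).map(|n| n.parse::<u32>());
    Some([
        nums.next()?.ok()?,
        nums.next()?.ok()?,
        nums.next()?.ok()?,
        nums.next()?.ok()?,
    ])
}

fn main() {
    let title = "bbox 10 20 110 40; x_wconf 93";
    if let Some([x0, y0, x1, y1]) = parse_bbox(title) {
        println!("word spans x {}..{}, y {}..{}", x0, x1, y0, y1);
    }
}
```

Words whose boxes share a vertical band can then be grouped into rows, and rows with aligned horizontal positions into table columns.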
Inline Image Extraction
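use html_to_markdown_rs::{convert_with_inline_images, InlineImageConfig};
let config = InlineImageConfig::new(5 * 1024 * 1024) // 5MB max
    .with_infer_dimensions(true)
    .with_filename_prefix("img_".to_string()); // prefix value is illustrative
let extraction = convert_with_inline_images(html, None, config)?;
println!("{}", extraction.markdown);
for (index, _image) in extraction.inline_images.iter().enumerate() {
    println!("extracted inline image #{}", index);
}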
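Inline images usually arrive as base64 `data:` URIs in `src` attributes, and a size cap like the 5MB one above can be checked before decoding. This is an assumption about the input shape, not a description of the library's internals. The helpers `split_data_uri` and `decoded_len` are hypothetical names used only in this sketch.

```rust
/// Size cap matching the 5MB limit used above.
const MAX_BYTES: usize = 5 * 1024 * 1024;

/// Splits a base64 `data:` URI into (MIME type, base64 payload).
fn split_data_uri(uri: &str) -> Option<(&str, &str)> {
    let rest = uri.strip_prefix("data:")?;
    let (header, payload) = rest.split_once(',')?;
    let mime = header.strip_suffix(";base64")?;
    Some((mime, payload))
}

/// Decoded byte length of a padded base64 payload, computed without decoding it.
fn decoded_len(payload: &str) -> usize {
    let padding = payload.bytes().rev().take_while(|&b| b == b'=').count();
    (payload.len() / 4 * 3).saturating_sub(padding.min(2))
}

fn main() {
    // Placeholder payload ("hello"), not real image data.
    let src = "data:image/png;base64,aGVsbG8=";
    if let Some((mime, payload)) = split_data_uri(src) {
        let size = decoded_len(payload);
        println!("{} payload, {} bytes, within limit: {}", mime, size, size <= MAX_BYTES);
    }
}
```

Checking the length first avoids allocating a buffer for an oversized image that would be rejected anyway.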
Other Language Bindings
This is the core Rust library. For other languages:
- JavaScript/TypeScript: @html-to-markdown/node (NAPI-RS) or @html-to-markdown/wasm (WebAssembly)
- Python: html-to-markdown (PyO3)
- CLI: html-to-markdown-cli
Documentation
Performance
10-30x faster than pure Python/JavaScript implementations, delivering 150-210 MB/s throughput.
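Throughput figures like these depend on input and hardware, so it is worth measuring on your own documents. The harness below is a minimal sketch: `time_conversions` and `throughput_mb_per_s` are hypothetical helpers, and the closure passed in `main` is a stand-in to be replaced with a call to the real converter.

```rust
use std::time::{Duration, Instant};

/// Converts a byte count and elapsed time into MB/s (1 MB = 1,000,000 bytes).
fn throughput_mb_per_s(bytes: usize, elapsed: Duration) -> f64 {
    (bytes as f64 / 1_000_000.0) / elapsed.as_secs_f64()
}

/// Runs `convert` over `input` `iterations` times and returns the total elapsed time.
fn time_conversions<F>(input: &str, iterations: usize, mut convert: F) -> Duration
where
    F: FnMut(&str) -> String,
{
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the unused result.
        std::hint::black_box(convert(input));
    }
    start.elapsed()
}

fn main() {
    let html = "<p>Hello, <strong>world</strong>!</p>".repeat(10_000);
    let iterations = 20;
    // Stand-in workload: swap this closure for the converter under test.
    let elapsed = time_conversions(&html, iterations, |s| s.to_uppercase());
    let mb_s = throughput_mb_per_s(html.len() * iterations, elapsed);
    println!("{:.1} MB/s", mb_s);
}
```

Build with `--release` before comparing numbers, since debug builds can be many times slower.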
License
MIT