html-to-markdown-rs
High-performance HTML to Markdown converter built with Rust.
This crate is the core engine compiled into the Python wheels, Ruby gem, Node.js NAPI bindings, WebAssembly package, and CLI, ensuring identical Markdown output across every language.
Fast, reliable HTML to Markdown conversion with full CommonMark compliance. Built with html5ever for correctness and a DOM-based filter for safe preprocessing.
Installation
[dependencies]
html-to-markdown-rs = "3.0"
Basic Usage
convert() returns a structured ConversionResult with the converted text, metadata, tables, and more:
use html_to_markdown_rs::convert;
Error Handling
Conversion returns a Result<ConversionResult, ConversionError>. Inputs that look like binary data are rejected with
ConversionError::InvalidInput to prevent runaway allocations. Table colspan/rowspan values are also clamped
internally to keep output sizes bounded.
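To make these two safeguards concrete, here is a standalone sketch of what a binary-input check and a span clamp could look like. It is not the crate's implementation: `looks_binary`, `clamp_span`, `MAX_SPAN`, the 8 KiB sample size, and the 10% threshold are all assumptions chosen for illustration.

```rust
/// Hypothetical cap on colspan/rowspan; the crate's real limit is not stated here.
const MAX_SPAN: usize = 1000;

/// Hypothetical heuristic: treat input as binary if the sampled prefix contains
/// a NUL byte, or if more than 10% of it is control characters other than
/// tab, newline, and carriage return.
fn looks_binary(input: &[u8]) -> bool {
    let sample = &input[..input.len().min(8192)];
    if sample.is_empty() {
        return false;
    }
    if sample.contains(&0) {
        return true;
    }
    let control = sample
        .iter()
        .filter(|&&b| b < 0x20 && !matches!(b, b'\t' | b'\n' | b'\r'))
        .count();
    control * 10 > sample.len()
}

/// Parse a colspan/rowspan attribute value. Unparseable values fall back to 1,
/// and parsed values are clamped to the range 1..=MAX_SPAN.
fn clamp_span(raw: &str) -> usize {
    raw.trim().parse::<usize>().unwrap_or(1).clamp(1, MAX_SPAN)
}

fn main() {
    println!("html is binary: {}", looks_binary(b"<p>Hello</p>"));
    println!("png is binary: {}", looks_binary(b"\x89PNG\r\n\x1a\n\0\0\0\rIHDR"));
    println!("colspan=3 -> {}", clamp_span("3"));
    println!("colspan=999999999 -> {}", clamp_span("999999999"));
}
```

Clamping rather than rejecting keeps a malformed table convertible while still bounding how many cells a single attribute can generate.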
Configuration
Builder Pattern
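use html_to_markdown_rs::{convert, ConversionOptions, HeadingStyle};
// Example values; adjust each option to suit your output.
let options = ConversionOptions::builder()
    .heading_style(HeadingStyle::Atx)
    .list_indent_width(2)
    .bullets("-")
    .autolinks(true)
    .wrap(true)
    .wrap_width(80)
    .build();
let result = convert(html, Some(options))?;
println!("{:?}", result.content);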
Struct Literal
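use html_to_markdown_rs::{convert, ConversionOptions, HeadingStyle};
let options = ConversionOptions {
    heading_style: HeadingStyle::Atx, // example value
    ..Default::default()
};
let result = convert(html, Some(options))?;
println!("{:?}", result.content);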
Preserving HTML Tags
The preserve_tags option allows you to keep specific HTML tags in their original form instead of converting them to Markdown:
use html_to_markdown_rs::{convert, ConversionOptions};
let html = r#"
<p>Before table</p>
<table class="data">
<tr><th>Name</th><th>Value</th></tr>
<tr><td>Item 1</td><td>100</td></tr>
</table>
<p>After table</p>
"#;
let options = ConversionOptions {
    preserve_tags: vec!["table".to_string()],
    ..Default::default()
};
let result = convert(html, Some(options))?;
// result.content => "Before table\n\n<table class=\"data\">...</table>\n\nAfter table\n"
Web Scraping with Preprocessing
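use html_to_markdown_rs::{convert, ConversionOptions, PreprocessingPreset};
let mut options = ConversionOptions::default();
options.preprocessing.enabled = true;
options.preprocessing.preset = PreprocessingPreset::Aggressive;
options.preprocessing.remove_navigation = true;
options.preprocessing.remove_forms = true;
let result = convert(html, Some(options))?;
println!("{:?}", result.content);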
Metadata Extraction
Metadata is automatically included in the result. Configure which fields to extract via MetadataConfig:
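use html_to_markdown_rs::{convert, ConversionOptions, MetadataConfig};
let options = ConversionOptions::builder()
    .metadata_config(MetadataConfig::default()) // assumed default; set fields as needed
    .build();
let result = convert(html, Some(options))?;
if let Some(metadata) = &result.metadata {
    // inspect the extracted metadata here
}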
Image Extraction
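use html_to_markdown_rs::{convert, ConversionOptions};
let options = ConversionOptions::builder()
    .extract_images(true)
    .max_image_size(5 * 1024 * 1024) // 5 MB max
    .infer_dimensions(true)
    .build();
let result = convert(html, Some(options))?;
println!("Found {} images", result.images.len());
for img in &result.images {
    // inspect each extracted image here
}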
Table Extraction
Structured table data is always included in ConversionResult.tables:
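use html_to_markdown_rs::convert;
let html = r#"
<table>
<tr><th>Name</th><th>Age</th></tr>
<tr><td>Alice</td><td>30</td></tr>
<tr><td>Bob</td><td>25</td></tr>
</table>
"#;
let result = convert(html, None)?;
println!("Found {} tables", result.tables.len());
for table in &result.tables {
    // inspect each extracted table here
}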
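One thing structured table data makes easy is re-exporting a table in another format. The sketch below is a standalone illustration, not the crate's API: `TableSketch`, `csv_field`, and `to_csv` are hypothetical names, and the rows-of-cells layout is an assumption about how an extracted table might be shaped.

```rust
/// Hypothetical stand-in for one extracted table: header row first, then body rows.
struct TableSketch {
    rows: Vec<Vec<String>>,
}

/// Quote a cell if it contains a comma, quote, or line break, doubling inner quotes.
fn csv_field(cell: &str) -> String {
    if cell.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", cell.replace('"', "\"\""))
    } else {
        cell.to_string()
    }
}

/// Serialize the table as CSV, one line per row.
fn to_csv(table: &TableSketch) -> String {
    table
        .rows
        .iter()
        .map(|row| {
            row.iter()
                .map(|cell| csv_field(cell))
                .collect::<Vec<_>>()
                .join(",")
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    // Same data as the HTML table in the example above.
    let rows = [["Name", "Age"], ["Alice", "30"], ["Bob", "25"]];
    let table = TableSketch {
        rows: rows
            .iter()
            .map(|r| r.iter().map(|c| c.to_string()).collect())
            .collect(),
    };
    println!("{}", to_csv(&table));
}
```

With the crate itself, the same idea would apply to whatever cell accessors its table type actually exposes.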
Custom Visitors
use ;
use ;
;
let options = builder
.visitor
.build;
let result = convert?;
println!;
Other Language Bindings
This is the core Rust library. For other languages:
- JavaScript/TypeScript: html-to-markdown-node (NAPI-RS) or html-to-markdown-wasm (WebAssembly)
- Python: html-to-markdown (PyO3)
- PHP: html-to-markdown (PIE + Composer helpers)
- Ruby: html-to-markdown (Magnus + rb-sys)
- CLI: html-to-markdown-cli
Documentation
Performance
Conversion runs 10-30x faster than pure Python or JavaScript implementations, with throughput of 150-280 MB/s.
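As a rough way to read the throughput figure, the sketch below turns it into an estimated conversion time for a document of a given size. `estimated_ms` is a hypothetical helper, the 10 MB input is an arbitrary example, and it assumes 1 MB = 1,000,000 bytes; real timings depend on the markup and the options in use.

```rust
/// Estimated conversion time in milliseconds for `bytes` of HTML at `mb_per_s`.
/// Assumes 1 MB = 1_000_000 bytes.
fn estimated_ms(bytes: u64, mb_per_s: f64) -> f64 {
    (bytes as f64 / 1_000_000.0) / mb_per_s * 1000.0
}

fn main() {
    let size: u64 = 10_000_000; // a 10 MB HTML document
    println!("at 150 MB/s: {:.1} ms", estimated_ms(size, 150.0));
    println!("at 280 MB/s: {:.1} ms", estimated_ms(size, 280.0));
}
```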
License
MIT