
Module html

Available on crate feature html only.

HTML extraction via html2md.

Lightweight, pure-Rust HTML→markdown conversion. The output quality is good enough for indexing and AI-grounding use cases. It is not as polished as Pandoc's HTML reader, which preserves more edge-case structure, but it runs in-process, has few dependencies, and is fast.

For consumers who want the highest-quality HTML conversion, the pandoc backend also handles HTML, but Engine::with_defaults() registers it after this one, so this extractor wins by default. Register PandocExtractor first if you want it to win for HTML files.
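The ordering rule above amounts to first-match dispatch: the first registered backend that accepts a file type handles it. The sketch below illustrates that rule in std-only Rust. Registry, Extractor, Html2md, Pandoc, register, and pick are hypothetical stand-ins, not this crate's Engine API. The extensions each stand-in accepts are likewise assumptions.

```rust
/// Hypothetical stand-in for an extractor backend (not the crate's real trait).
trait Extractor {
    fn name(&self) -> &'static str;
    fn handles(&self, extension: &str) -> bool;
}

/// Hypothetical stand-in for the html2md-backed extractor: HTML only.
struct Html2md;
/// Hypothetical stand-in for the pandoc backend: HTML plus other formats.
struct Pandoc;

impl Extractor for Html2md {
    fn name(&self) -> &'static str {
        "html2md"
    }
    fn handles(&self, extension: &str) -> bool {
        matches!(extension, "html" | "htm")
    }
}

impl Extractor for Pandoc {
    fn name(&self) -> &'static str {
        "pandoc"
    }
    // Assumed extension list, for illustration only.
    fn handles(&self, extension: &str) -> bool {
        matches!(extension, "html" | "htm" | "docx")
    }
}

/// Hypothetical registry that keeps backends in registration order.
#[derive(Default)]
struct Registry {
    backends: Vec<Box<dyn Extractor>>,
}

impl Registry {
    /// Appends a backend; earlier registrations take priority.
    fn register(mut self, extractor: impl Extractor + 'static) -> Self {
        self.backends.push(Box::new(extractor));
        self
    }

    /// Returns the name of the first registered backend that handles
    /// `extension`, or `None` if no backend accepts it.
    fn pick(&self, extension: &str) -> Option<&'static str> {
        self.backends
            .iter()
            .find(|backend| backend.handles(extension))
            .map(|backend| backend.name())
    }
}
```

With Registry::default().register(Html2md).register(Pandoc), pick("html") yields "html2md" while pick("docx") still reaches "pandoc". Swapping the two register calls makes pandoc win for HTML as well. That mirrors the advice to register PandocExtractor first.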

Structs

Html2mdExtractor
HTML extractor backed by html2md. Construct it via Html2mdExtractor::new; there's no per-instance state.