Available on crate feature `html` only.
HTML extraction via html2md.
Lightweight, pure-Rust HTML→markdown conversion. The output is good enough for indexing and AI-grounding use cases. It is not as polished as Pandoc's HTML reader, which preserves more edge-case structure, but it runs in-process, pulls in few dependencies, and is fast.
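To make "HTML→markdown conversion" concrete, here is a toy, std-only converter that maps a handful of tags to markdown markers and drops everything else. This is a hypothetical illustration of the kind of mapping involved, not how html2md works internally. It ignores attributes, does not decode entities such as `&amp;`, and does not handle nesting, lists, links, or tables.

```rust
/// Toy HTML -> markdown conversion for a few tags (hypothetical sketch,
/// NOT html2md's algorithm). Unknown tags are dropped; their text is kept.
fn to_markdown(html: &str) -> String {
    let mut out = String::new();
    let mut chars = html.chars();
    while let Some(c) = chars.next() {
        if c != '<' {
            // Plain text passes through unchanged.
            out.push(c);
            continue;
        }
        // Collect the tag body up to the closing '>'.
        let mut tag = String::new();
        for t in chars.by_ref() {
            if t == '>' {
                break;
            }
            tag.push(t);
        }
        // Keep only the tag name (attributes are discarded), lower-cased.
        let name = tag
            .split_whitespace()
            .next()
            .unwrap_or("")
            .to_ascii_lowercase();
        let replacement = match name.as_str() {
            "h1" => "# ",
            "h2" => "## ",
            "/h1" | "/h2" | "/p" => "\n\n",
            "strong" | "/strong" | "b" | "/b" => "**",
            "em" | "/em" | "i" | "/i" => "*",
            "br" | "br/" => "\n",
            _ => "",
        };
        out.push_str(replacement);
    }
    // Drop the trailing blank lines left by the last block element.
    out.trim_end().to_string()
}

fn main() {
    let md = to_markdown("<h1>Title</h1><p>Hello <strong>world</strong></p>");
    println!("{md}");
}
```

A real converter has to cope with malformed markup, nested inline formatting, and block structure, which is why a dedicated crate (or Pandoc) is used instead of ad-hoc string handling like this.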
For consumers who want the highest HTML conversion quality, the `pandoc` backend also handles HTML, but it registers after this one in `Engine::with_defaults()`. Register `PandocExtractor` first if you want it to win for HTML files.
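The advice above implies first-match dispatch: when several extractors accept a file, the one registered earliest is used. The sketch below models that rule under that assumption. The `Extractor` trait, `Engine`, `register`, and `pick` here are hypothetical stand-ins, not this crate's real API.

```rust
// Hypothetical stand-ins (NOT the crate's API): they only model
// "the first registered extractor that accepts the file wins".
trait Extractor {
    fn name(&self) -> &'static str;
    fn handles(&self, ext: &str) -> bool;
}

struct Html2md;
struct Pandoc;

impl Extractor for Html2md {
    fn name(&self) -> &'static str { "html2md" }
    fn handles(&self, ext: &str) -> bool { matches!(ext, "html" | "htm") }
}

impl Extractor for Pandoc {
    fn name(&self) -> &'static str { "pandoc" }
    // Pandoc reads many input formats, HTML included.
    fn handles(&self, ext: &str) -> bool { matches!(ext, "html" | "htm" | "docx" | "odt" | "epub") }
}

#[derive(Default)]
struct Engine {
    // Registration order is preserved; earlier entries are tried first.
    extractors: Vec<Box<dyn Extractor>>,
}

impl Engine {
    fn register(mut self, e: impl Extractor + 'static) -> Self {
        self.extractors.push(Box::new(e));
        self
    }

    /// Name of the first registered extractor that handles `ext`, if any.
    fn pick(&self, ext: &str) -> Option<&'static str> {
        self.extractors.iter().find(|e| e.handles(ext)).map(|e| e.name())
    }
}

fn main() {
    // Default-like order: html2md before pandoc, so html2md wins for HTML.
    let defaults = Engine::default().register(Html2md).register(Pandoc);
    // Registering pandoc first makes it win for HTML instead.
    let pandoc_first = Engine::default().register(Pandoc).register(Html2md);
    println!("{:?} {:?}", defaults.pick("html"), pandoc_first.pick("html"));
}
```

Formats only one backend accepts (such as `docx` in this model) are unaffected by the order; only overlapping formats like HTML depend on it.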
Structs
- `Html2mdExtractor` - HTML extractor backed by html2md. Construct via `Html2mdExtractor::new`; there's no per-instance state.
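"No per-instance state" means an extractor like this can be modelled as a unit struct: construction is free, the value is zero-sized, and any two instances are interchangeable. The type below, `StatelessExtractor`, is a hypothetical stand-in illustrating that shape, not the crate's actual definition of `Html2mdExtractor`.

```rust
// Hypothetical stand-in for a stateless extractor (not the crate's definition).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct StatelessExtractor;

impl StatelessExtractor {
    /// Constructing a unit struct allocates nothing and cannot fail,
    /// so it can even be a `const fn`.
    pub const fn new() -> Self {
        StatelessExtractor
    }
}

// Because there is no state, a single shared constant is as good as a fresh value.
const SHARED: StatelessExtractor = StatelessExtractor::new();

fn main() {
    let fresh = StatelessExtractor::new();
    // A unit struct is zero-sized, and every instance equals every other.
    println!(
        "size = {}, equal = {}",
        std::mem::size_of::<StatelessExtractor>(),
        fresh == SHARED
    );
}
```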