Module html

HTML text extractor.

Strips HTML tags with a simple state-machine parser and preserves visible text content. Block-level elements (p, div, h1–h6, li, td, th, br) produce paragraph boundaries. <h1>–<h6> headings populate heading_path.
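
The state-machine approach described above can be illustrated with a minimal sketch. This is not the crate's actual implementation: the names `State`, `BLOCK_TAGS`, `strip_tags`, and `flush` are hypothetical, and the sketch assumes that a block-level tag (opening or closing) simply ends the current paragraph. It does not model `heading_path`, entity decoding, comments, or the contents of `<script>` and `<style>` elements.

```rust
// Hypothetical sketch of a two-state tag stripper; all names are illustrative.
#[derive(Clone, Copy)]
enum State {
    Text, // outside any tag: characters are visible text
    Tag,  // between '<' and '>': characters belong to the tag
}

// Block-level elements that produce a paragraph boundary.
const BLOCK_TAGS: [&str; 12] = [
    "p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "td", "th", "br",
];

// Push the accumulated text as a paragraph if it is non-empty, then reset it.
fn flush(current: &mut String, paragraphs: &mut Vec<String>) {
    let trimmed = current.trim();
    if !trimmed.is_empty() {
        paragraphs.push(trimmed.to_string());
    }
    current.clear();
}

// Strip tags from `html` and return the visible text split into paragraphs.
fn strip_tags(html: &str) -> Vec<String> {
    let mut paragraphs = Vec::new();
    let mut current = String::new();
    let mut tag = String::new();
    let mut state = State::Text;

    for ch in html.chars() {
        match state {
            State::Text => {
                if ch == '<' {
                    tag.clear();
                    state = State::Tag;
                } else {
                    current.push(ch);
                }
            }
            State::Tag => {
                if ch == '>' {
                    // Tag name: skip a leading '/', stop at the first
                    // non-alphanumeric character (space, '/', attributes).
                    let name = tag
                        .trim_start_matches('/')
                        .chars()
                        .take_while(|c| c.is_ascii_alphanumeric())
                        .collect::<String>()
                        .to_ascii_lowercase();
                    if BLOCK_TAGS.iter().any(|t| *t == name.as_str()) {
                        flush(&mut current, &mut paragraphs);
                    }
                    state = State::Text;
                } else {
                    tag.push(ch);
                }
            }
        }
    }
    flush(&mut current, &mut paragraphs);
    paragraphs
}

fn main() {
    for paragraph in strip_tags("<h1>Title</h1><p>Hello <b>world</b></p>") {
        println!("{paragraph}");
    }
}
```

Inline tags such as `<b>` are dropped without splitting the paragraph, while block-level tags end it. Nothing in the loop executes or fetches anything; input is only ever copied into text buffers.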

Security: no JavaScript execution, no external resource loading, no DOM construction. Pure text extraction only (RFC-015 §15).

Structs

HtmlExtractor
