Module html

HTML text extractor (RFC-005 §5; RFC-044 §16.3 resource limits).

Strips HTML tags with a simple state-machine parser and preserves visible text content. Block-level elements produce paragraph boundaries. <h1> through <h6> headings populate heading_path.

Security: no JavaScript execution, no external resource loading, no DOM construction. Pure text extraction only (RFC-015 §15).
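The approach described above can be illustrated with a minimal sketch. It is not the module's actual implementation: the `Paragraph` type, the `extract` function, the set of block-level tag names, and the rule that a heading's own text goes into `heading_path` rather than into a paragraph are all assumptions made for illustration. The sketch uses a two-state machine (inside text, inside a tag), builds no DOM, and touches nothing outside the input string. It does not handle comments, entities, or `<script>`/`<style>` bodies, which a real extractor would need to.

```rust
/// Hypothetical output type: one paragraph of visible text plus the
/// titles of the headings it sits under.
#[derive(Debug, PartialEq)]
pub struct Paragraph {
    pub heading_path: Vec<String>,
    pub text: String,
}

#[derive(Clone, Copy)]
enum State {
    Text,
    Tag,
}

/// Returns 1..=6 for "h1".."h6", otherwise None.
fn heading_level(name: &str) -> Option<usize> {
    let b = name.as_bytes();
    if b.len() == 2 && b[0] == b'h' && (b'1'..=b'6').contains(&b[1]) {
        Some((b[1] - b'0') as usize)
    } else {
        None
    }
}

/// Assumed (incomplete) list of tags that end the current paragraph.
fn is_block(name: &str) -> bool {
    matches!(name, "p" | "div" | "li" | "br" | "section" | "blockquote" | "tr")
        || heading_level(name).is_some()
}

/// Collapses runs of whitespace and trims both ends.
fn normalize(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn push_paragraph(out: &mut Vec<Paragraph>, path: &[(usize, String)], text: String) {
    if text.is_empty() {
        return;
    }
    let heading_path = path.iter().map(|(_, title)| title.clone()).collect();
    out.push(Paragraph { heading_path, text });
}

pub fn extract(html: &str) -> Vec<Paragraph> {
    let mut out = Vec::new();
    let mut path: Vec<(usize, String)> = Vec::new(); // (level, title)
    let mut state = State::Text;
    let mut tag = String::new();
    let mut text = String::new();

    for c in html.chars() {
        match state {
            State::Text if c == '<' => {
                state = State::Tag;
                tag.clear();
            }
            State::Text => text.push(c),
            State::Tag if c == '>' => {
                state = State::Text;
                let closing = tag.starts_with('/');
                let name = tag
                    .trim_start_matches('/')
                    .chars()
                    .take_while(|ch| ch.is_ascii_alphanumeric())
                    .collect::<String>()
                    .to_ascii_lowercase();
                if !is_block(&name) {
                    continue; // inline tag: drop it, keep accumulating text
                }
                let chunk = normalize(&text);
                text.clear();
                match heading_level(&name) {
                    // Closing </hN>: the chunk is the heading title.
                    Some(level) if closing => {
                        path.retain(|(l, _)| *l < level);
                        if !chunk.is_empty() {
                            path.push((level, chunk));
                        }
                    }
                    // Any other block boundary ends a paragraph.
                    _ => push_paragraph(&mut out, &path, chunk),
                }
            }
            State::Tag => tag.push(c),
        }
    }
    push_paragraph(&mut out, &path, normalize(&text));
    out
}
```

Under these assumptions, `extract("<h1>Guide</h1><p>Hello <b>world</b></p>")` yields a single paragraph with text `Hello world` and a `heading_path` of `["Guide"]`. Because the scanner only ever copies characters into a buffer, script content is never interpreted and no URL is ever fetched.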

Structs

HtmlExtractor