HTML text extractor.
Strips HTML tags with a simple state-machine parser and preserves
visible text content. Block-level elements (<p>, <div>, <h1>–<h6>,
<li>, <td>, <th>) and <br> line breaks produce paragraph boundaries.
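The tag-stripping behaviour described above can be sketched as a two-state machine. This is a minimal illustration, not the extractor's actual code: the names `State`, `BLOCK_TAGS`, `extract_paragraphs`, `is_block_tag`, and `flush` are hypothetical, and the sketch assumes well-formed input with no comments, no `<script>` or `<style>` content, and no entity decoding.

```rust
/// Hypothetical parser states; the real extractor's states are not documented here.
#[derive(Clone, Copy)]
enum State {
    /// Reading visible text.
    Text,
    /// Between '<' and '>', collecting the tag contents.
    Tag,
}

/// Tags that end the current paragraph, taken from the list in the description.
const BLOCK_TAGS: [&str; 12] = [
    "p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "td", "th", "br",
];

/// Returns the visible text of `html`, split at block-level tags.
fn extract_paragraphs(html: &str) -> Vec<String> {
    let mut state = State::Text;
    let mut tag = String::new();
    let mut current = String::new();
    let mut paragraphs = Vec::new();

    for ch in html.chars() {
        match state {
            State::Text => {
                if ch == '<' {
                    tag.clear();
                    state = State::Tag;
                } else {
                    current.push(ch);
                }
            }
            State::Tag => {
                if ch == '>' {
                    // Opening and closing block tags both end the paragraph.
                    if is_block_tag(&tag) {
                        flush(&mut current, &mut paragraphs);
                    }
                    state = State::Text;
                } else {
                    tag.push(ch);
                }
            }
        }
    }
    flush(&mut current, &mut paragraphs);
    paragraphs
}

/// Extracts the tag name (ignoring a leading '/' and any attributes)
/// and checks it against `BLOCK_TAGS`, case-insensitively.
fn is_block_tag(raw: &str) -> bool {
    let name = raw
        .trim_start_matches('/')
        .chars()
        .take_while(|c| c.is_ascii_alphanumeric())
        .collect::<String>()
        .to_ascii_lowercase();
    BLOCK_TAGS.iter().any(|t| *t == name.as_str())
}

/// Collapses whitespace in `current` and stores it if anything is left.
fn flush(current: &mut String, out: &mut Vec<String>) {
    let text = current.split_whitespace().collect::<Vec<_>>().join(" ");
    if !text.is_empty() {
        out.push(text);
    }
    current.clear();
}
```

Under these assumptions, `extract_paragraphs("<p>Hello <b>world</b></p><p>Second<br>line</p>")` yields three paragraphs: `Hello world`, `Second`, and `line`. Inline tags such as `<b>` are dropped without splitting the text.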
<h1>–<h6> headings populate heading_path.
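The documentation does not say how heading_path is updated, so the following is only one plausible reading: the path holds the chain of headings enclosing the current position, and a new heading replaces any entry at the same or a deeper level. The function names and the `(level, title)` representation are assumptions made for illustration.

```rust
/// Hypothetical update rule for a heading path.
/// `level` is 1 for <h1> through 6 for <h6>.
fn update_heading_path(path: &mut Vec<(u8, String)>, level: u8, title: &str) {
    // Drop headings at the same or a deeper level; they are no longer ancestors.
    while path.last().map_or(false, |(l, _)| *l >= level) {
        path.pop();
    }
    path.push((level, title.to_string()));
}

/// Returns just the titles, outermost heading first.
fn heading_titles(path: &[(u8, String)]) -> Vec<&str> {
    path.iter().map(|(_, title)| title.as_str()).collect()
}
```

With this rule, seeing an `<h1>` "Guide", an `<h2>` "Install", and an `<h3>` "Linux" gives the path Guide, Install, Linux; a following `<h2>` "Usage" shortens it to Guide, Usage. Storing the level alongside the title keeps the rule correct when a document skips levels, for example an `<h3>` directly under an `<h1>`.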
Security: no JavaScript execution, no external resource loading, no DOM construction. Pure text extraction only (RFC-015 §15).