Module extract

Token extraction: stage 2 of the two-stage tokenizer pipeline, implemented as a scalar state machine.

Uses the structural index produced by stage 1 to locate < and > boundaries, then scans the raw input bytes between those boundaries to extract tag names, attributes, comments, and text content. This hybrid approach pairs SIMD-accelerated delimiter finding (stage 1) with scalar content parsing (stage 2).
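
The flow described above can be sketched roughly as follows. This is a minimal illustration and not this module's actual API: the `Token` enum and the `structural_index` and `extract` functions are hypothetical names, stage 1 is replaced by a plain scalar scan standing in for the SIMD pass, and attributes are left as one unparsed string.

```rust
/// Hypothetical token type, for illustration only.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Text(&'a str),
    Comment(&'a str),
    OpenTag { name: &'a str, attrs: &'a str },
    CloseTag(&'a str),
}

/// Stand-in for stage 1: record the byte offset of every '<' and '>'.
/// (Assumption: the real stage 1 builds this index with SIMD.)
fn structural_index(input: &str) -> Vec<usize> {
    input
        .bytes()
        .enumerate()
        .filter(|&(_, b)| b == b'<' || b == b'>')
        .map(|(i, _)| i)
        .collect()
}

/// Stage 2 sketch: walk the index and scan the bytes between delimiters.
fn extract<'a>(input: &'a str, index: &[usize]) -> Vec<Token<'a>> {
    let bytes = input.as_bytes();
    let mut tokens = Vec::new();
    let mut text_start = 0; // start of the pending text run
    let mut i = 0;
    while i < index.len() {
        let open = index[i];
        i += 1;
        if bytes[open] != b'<' {
            continue; // a stray '>' stays part of the surrounding text
        }
        let is_comment = input[open..].starts_with("<!--");
        // Find the closing '>'; inside a comment only "-->" closes it.
        let mut close = None;
        while i < index.len() {
            let p = index[i];
            i += 1;
            if bytes[p] != b'>' {
                continue;
            }
            if !is_comment || (p >= open + 6 && input[..p].ends_with("--")) {
                close = Some(p);
                break;
            }
        }
        let close = match close {
            Some(c) => c,
            None => break, // unterminated markup: the rest is emitted as text
        };
        if open > text_start {
            tokens.push(Token::Text(&input[text_start..open]));
        }
        if is_comment {
            tokens.push(Token::Comment(&input[open + 4..close - 2]));
        } else {
            let inner = &input[open + 1..close];
            if let Some(name) = inner.strip_prefix('/') {
                tokens.push(Token::CloseTag(name.trim()));
            } else {
                let inner = inner.trim();
                let (name, attrs) = match inner.split_once(char::is_whitespace) {
                    Some((n, a)) => (n, a.trim()),
                    None => (inner, ""),
                };
                tokens.push(Token::OpenTag { name, attrs });
            }
        }
        text_start = close + 1;
    }
    if text_start < input.len() {
        tokens.push(Token::Text(&input[text_start..]));
    }
    tokens
}
```

Calling `extract(src, &structural_index(src))` yields the tokens in document order. Note that a `>` inside a comment appears in the index but is skipped by the comment state, which is one reason the second stage has to inspect the bytes rather than trust the index alone.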