Expand description
Token extraction — stage 2 (scalar state machine). Token extraction — stage 2 of the two-stage tokenizer pipeline.
Uses the structural index from stage 1 to locate < and > boundaries,
then scans the actual input bytes between them to extract tag names,
attributes, comments, and text content. This hybrid approach combines
SIMD-accelerated delimiter finding with scalar content parsing.
Functions§
- extract_
tokens - Extract tokens from pre-indexed UTF-8 input.
- extract_
tokens_ bytes - Extract tokens from pre-indexed raw bytes after UTF-8 validation.