Expand description
Branchless state machine for token extraction. Branchless state machine for HTML token extraction.
The core idea: a 2D lookup table STATE_TABLE[state][byte_class] maps
every (state, input-byte-class) pair to a (new_state, action) without
any conditional branches. This eliminates branch misprediction costs
that dominate traditional HTML tokenizers.
Structs§
- Transition
- A state transition entry: new state + action to perform.
Enums§
- Action
- Actions to perform during state transitions.
- Byte
Class - Byte classification — maps raw bytes to a small enum for table indexing.
- State
- Tokenizer states — models the HTML5 tokenizer states relevant for our structural-index-driven approach.
Constants§
- BYTE_
CLASS_ COUNT - Number of byte classes — used for table dimensions.
- STATE_
COUNT - Number of states — used for table dimensions.
Statics§
- STATE_
TABLE - The master state transition table.