Module state_machine

Branchless state machine for HTML token extraction.

The core idea: a 2D lookup table, STATE_TABLE[state][byte_class], maps every (state, input-byte-class) pair to a (new_state, action) transition, so each input byte is handled by a table load instead of a chain of conditional branches. This avoids the branch-misprediction cost that can dominate traditional HTML tokenizers.
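The table-driven step described above can be sketched roughly as follows. This is an illustration, not the module's actual definitions: the enum variants (`Data`, `TagOpen`, `TagName`, `Lt`, `Gt`, `Other`, `StartTag`, `EmitTag`), the `BYTE_CLASSES` table, and the `step` and `count_tags` helpers are all hypothetical, and the three-state table is far smaller than a real HTML tokenizer would need. Only the item names `State`, `ByteClass`, `Action`, `Transition`, `STATE_COUNT`, `BYTE_CLASS_COUNT`, and `STATE_TABLE` come from this page.

```rust
// Hypothetical, heavily reduced variants; the real enums are larger.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum State { Data = 0, TagOpen = 1, TagName = 2 }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ByteClass { Lt = 0, Gt = 1, Other = 2 }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Action { None, StartTag, EmitTag }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Transition { new_state: State, action: Action }

const STATE_COUNT: usize = 3;
const BYTE_CLASS_COUNT: usize = 3;

// Small helper so the table rows stay readable.
const fn t(new_state: State, action: Action) -> Transition {
    Transition { new_state, action }
}

// Rows are indexed by state, columns by byte class (Lt, Gt, Other).
static STATE_TABLE: [[Transition; BYTE_CLASS_COUNT]; STATE_COUNT] = [
    // Data
    [t(State::TagOpen, Action::None), t(State::Data, Action::None), t(State::Data, Action::None)],
    // TagOpen
    [t(State::TagOpen, Action::None), t(State::Data, Action::None), t(State::TagName, Action::StartTag)],
    // TagName
    [t(State::TagOpen, Action::None), t(State::Data, Action::EmitTag), t(State::TagName, Action::None)],
];

// Assumed classification scheme: a 256-entry table built at compile time,
// so classifying a byte is also a load rather than a match or if-chain.
const fn build_byte_classes() -> [ByteClass; 256] {
    let mut table = [ByteClass::Other; 256];
    table[b'<' as usize] = ByteClass::Lt;
    table[b'>' as usize] = ByteClass::Gt;
    table
}

static BYTE_CLASSES: [ByteClass; 256] = build_byte_classes();

// One tokenizer step: two table loads, no branching on the input byte.
fn step(state: State, byte: u8) -> Transition {
    let class = BYTE_CLASSES[byte as usize];
    STATE_TABLE[state as usize][class as usize]
}

// Drives the machine over a buffer and counts EmitTag actions.
// The comparison result is added as an integer instead of being tested with `if`.
fn count_tags(input: &[u8]) -> usize {
    let mut state = State::Data;
    let mut tags = 0usize;
    for &byte in input {
        let tr = step(state, byte);
        tags += (tr.action == Action::EmitTag) as usize;
        state = tr.new_state;
    }
    tags
}
```

In this sketch the only data-dependent work per byte is indexing. Whether the compiled code is fully branch-free also depends on the compiler eliding bounds checks, which is likely here because both indices have small, statically known ranges, but it is not guaranteed by the language.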

Structs§

Transition
A state transition entry: new state + action to perform.

Enums§

Action
Actions to perform during state transitions.
ByteClass
Byte classification — maps raw bytes to a small enum for table indexing.
State
Tokenizer states — models the HTML5 tokenizer states relevant for our structural-index-driven approach.

Constants§

BYTE_CLASS_COUNT
Number of byte classes — used for table dimensions.
STATE_COUNT
Number of states — used for table dimensions.

Statics§

STATE_TABLE
The master state transition table.