Function classify_char 

pub fn classify_char(c: char) -> TokenKind

Classify a single Unicode scalar value into a TokenKind.

Classification depends only on the code point; no surrounding context is consulted. The rules are applied in priority order, so a sub-range overrides its enclosing block (for example, Thai digits are checked before the broader Thai block).
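
The effect of priority ordering can be sketched with a two-rule classifier. The `Kind` enum and `classify_thai` function below are hypothetical stand-ins for illustration, not this crate's `TokenKind` or its actual implementation. The sketch relies only on the fact that Rust tries `match` arms from top to bottom, so the narrower range has to be listed first.

```rust
/// Hypothetical stand-in for `TokenKind`, reduced to three variants.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Kind {
    Number,
    Thai,
    Unknown,
}

/// Illustrative two-rule classifier (an assumption, not the crate's code).
/// The digit sub-range is listed before the enclosing Thai block, so it wins.
fn classify_thai(c: char) -> Kind {
    match c {
        '\u{0E50}'..='\u{0E59}' => Kind::Number, // Thai digits
        '\u{0E00}'..='\u{0E7F}' => Kind::Thai,   // rest of the Thai block
        _ => Kind::Unknown,
    }
}
```

With this ordering, `classify_thai('\u{0E55}')` (Thai digit five) yields `Kind::Number`, while `classify_thai('\u{0E01}')` (ก) yields `Kind::Thai`. Swapping the two arms would make the digit arm unreachable in practice, because the block range would match first.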

§Classification table

| Range / set | Kind |
| --- | --- |
| U+0E50–U+0E59 (Thai digits ๐–๙) | Number |
| U+0E00–U+0E7F (Thai block) | Thai |
| `0`–`9` (ASCII digits) | Number |
| U+FF10–U+FF19 (fullwidth digits) | Number |
| `A`–`Z`, `a`–`z` (ASCII letters) | Latin |
| U+FF21–U+FF5A (fullwidth Latin) | Latin |
| Space, tab, newline, CR, NBSP, ideographic space | Whitespace |
| Major emoji blocks (U+1F300–U+1FAFF, U+2600–U+27BF, …) | Emoji |
| ASCII punctuation (`!`–`/`, `:`–`@`, …) | Punctuation |
| U+2000–U+206F (Unicode general punctuation) | Punctuation |
| Everything else | Unknown |
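
As a rough illustration, the table can be transcribed row by row into a single `match`. This is a sketch under stated assumptions, not the crate's source: the `TokenKind` enum is redeclared here with guessed variants, `classify_char_sketch` is a hypothetical name, the emoji arm covers only the two ranges the table spells out (the elided ones are omitted), and the standard library's `char::is_ascii_punctuation` is assumed to match the ASCII punctuation set the table abbreviates.

```rust
/// Hypothetical mirror of `TokenKind`; the real enum may differ.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum TokenKind {
    Thai,
    Number,
    Latin,
    Whitespace,
    Emoji,
    Punctuation,
    Unknown,
}

/// Sketch of a classifier that follows the table top to bottom.
pub fn classify_char_sketch(c: char) -> TokenKind {
    match c {
        // Thai digits come before the Thai block that contains them.
        '\u{0E50}'..='\u{0E59}' => TokenKind::Number,
        '\u{0E00}'..='\u{0E7F}' => TokenKind::Thai,
        // ASCII and fullwidth digits.
        '0'..='9' | '\u{FF10}'..='\u{FF19}' => TokenKind::Number,
        // ASCII letters and the fullwidth Latin range given in the table.
        'A'..='Z' | 'a'..='z' | '\u{FF21}'..='\u{FF5A}' => TokenKind::Latin,
        // Space, tab, newline, CR, NBSP (U+00A0), ideographic space (U+3000).
        ' ' | '\t' | '\n' | '\r' | '\u{00A0}' | '\u{3000}' => TokenKind::Whitespace,
        // Only the two emoji ranges named explicitly in the table.
        '\u{1F300}'..='\u{1FAFF}' | '\u{2600}'..='\u{27BF}' => TokenKind::Emoji,
        // Assumption: the std predicate stands in for the ASCII punctuation row.
        p if p.is_ascii_punctuation() => TokenKind::Punctuation,
        // Unicode General Punctuation block.
        '\u{2000}'..='\u{206F}' => TokenKind::Punctuation,
        _ => TokenKind::Unknown,
    }
}
```

Because every arm depends only on `c`, the function is stateless, and calling it on each `char` of a string gives a per-character kind sequence that a tokenizer could then group into runs.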