pub fn classify_char(c: char) -> TokenKind
Classify a single Unicode scalar value into a TokenKind.
Classification is purely codepoint-based — no context is used. The rules are applied in priority order so that sub-ranges override their parent block (e.g. Thai digits are checked before the broader Thai block).
§Classification table
| Range / set | Kind |
|---|---|
| U+0E50–U+0E59 (Thai digits ๐–๙) | Number |
| U+0E00–U+0E7F (Thai block) | Thai |
| 0–9 (ASCII digits) | Number |
| U+FF10–U+FF19 (fullwidth digits) | Number |
| A–Z, a–z (ASCII letters) | Latin |
| U+FF21–U+FF5A (fullwidth Latin) | Latin |
| Space, tab, newline, CR, NBSP, ideographic space | Whitespace |
| Major emoji blocks (U+1F300–U+1FAFF, U+2600–U+27BF, …) | Emoji |
| ASCII punctuation (!–/, :–@, …) | Punctuation |
| U+2000–U+206F (Unicode general punctuation) | Punctuation |
| Everything else | Unknown |
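
The priority order in the table can be sketched as a single `match` whose arms are tried top to bottom. This is an illustrative reconstruction, not the crate's actual source. The `TokenKind` variants are inferred from the table's "Kind" column, and the function is named `classify_char_sketch` to keep it distinct from the real one. The sketch covers only the two emoji ranges the table names explicitly, and it assumes `char::is_ascii_punctuation` matches the elided ASCII punctuation set.

```rust
/// Hypothetical mirror of the crate's `TokenKind`; variant names are taken
/// from the "Kind" column of the table and may not match the real enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Thai,
    Number,
    Latin,
    Whitespace,
    Emoji,
    Punctuation,
    Unknown,
}

/// Sketch of codepoint-only classification. Match arms are evaluated in
/// order, so the Thai-digit sub-range wins over the broader Thai block.
pub fn classify_char_sketch(c: char) -> TokenKind {
    match c {
        // Thai digits must come before the enclosing Thai block.
        '\u{0E50}'..='\u{0E59}' => TokenKind::Number,
        '\u{0E00}'..='\u{0E7F}' => TokenKind::Thai,
        // ASCII and fullwidth digits.
        '0'..='9' | '\u{FF10}'..='\u{FF19}' => TokenKind::Number,
        // ASCII and fullwidth Latin (range as listed in the table).
        'A'..='Z' | 'a'..='z' | '\u{FF21}'..='\u{FF5A}' => TokenKind::Latin,
        // Space, tab, newline, CR, NBSP, ideographic space.
        ' ' | '\t' | '\n' | '\r' | '\u{00A0}' | '\u{3000}' => TokenKind::Whitespace,
        // Only the two emoji ranges named in the table; the real function
        // covers further blocks that the table elides.
        '\u{1F300}'..='\u{1FAFF}' | '\u{2600}'..='\u{27BF}' => TokenKind::Emoji,
        // Assumption: the standard ASCII punctuation set.
        c if c.is_ascii_punctuation() => TokenKind::Punctuation,
        // Unicode General Punctuation block.
        '\u{2000}'..='\u{206F}' => TokenKind::Punctuation,
        _ => TokenKind::Unknown,
    }
}
```

Under these assumptions, `classify_char_sketch('\u{0E55}')` (the Thai digit ๕) yields `TokenKind::Number`, while `classify_char_sketch('\u{0E01}')` (ก) yields `TokenKind::Thai`. Swapping the first two arms would classify every Thai digit as `Thai`, which is why the order of the rules matters.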