Function classify_char

pub fn classify_char(c: char) -> TokenKind

Classify a single Unicode scalar value into a TokenKind.

Classification is purely codepoint-based: no context is used. The rules are applied in priority order, so a sub-range overrides its parent block (e.g. Thai digits are checked before the broader Thai block).
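
To illustrate why the order of the rules matters, here is a minimal sketch that is not taken from the crate's source. The `Kind` enum and both function names are hypothetical, and only the two Thai rules are modelled. A Rust `match` tries its arms top to bottom, so the narrower digit range has to come before the block that contains it.

```rust
// Hypothetical three-variant enum used only for this illustration.
#[derive(Debug, PartialEq)]
enum Kind {
    Number,
    Thai,
    Other,
}

// Sub-range first: Thai digits are claimed before the Thai block arm runs.
fn specific_first(c: char) -> Kind {
    match c {
        '\u{0E50}'..='\u{0E59}' => Kind::Number,
        '\u{0E00}'..='\u{0E7F}' => Kind::Thai,
        _ => Kind::Other,
    }
}

// Block first: the digit arm is shadowed and can never match, which is
// why the compiler's unreachable-pattern lint is silenced here.
#[allow(unreachable_patterns)]
fn block_first(c: char) -> Kind {
    match c {
        '\u{0E00}'..='\u{0E7F}' => Kind::Thai,
        '\u{0E50}'..='\u{0E59}' => Kind::Number,
        _ => Kind::Other,
    }
}
```

With these definitions, `specific_first('\u{0E53}')` yields `Kind::Number`, while `block_first('\u{0E53}')` yields `Kind::Thai`.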

§Classification table

| Range / set | Kind |
|---|---|
| U+0E50–U+0E59 (Thai digits ๐–๙) | `Number` |
| U+0E00–U+0E7F (Thai block) | `Thai` |
| `0`–`9` (ASCII digits) | `Number` |
| U+FF10–U+FF19 (fullwidth digits) | `Number` |
| `A`–`Z`, `a`–`z` (ASCII letters) | `Latin` |
| U+FF21–U+FF5A (fullwidth Latin) | `Latin` |
| Space, tab, newline, CR, NBSP, ideographic space | `Whitespace` |
| Major emoji blocks (U+1F300–U+1FAFF, U+2600–U+27BF, …) | `Emoji` |
| ASCII punctuation (`!`–`/`, `:`–`@`, …) | `Punctuation` |
| U+2000–U+206F (Unicode general punctuation) | `Punctuation` |
| Everything else | `Unknown` |
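
The table can be read as a single `match` with one arm per row. The sketch below is an assumption about how such a function could look, not the crate's actual implementation. The `TokenKind` enum here is a stand-in with one variant per kind named in the table, and `classify_char_sketch` is a hypothetical name. Ranges that the table elides with "…" are deliberately left out, so this version returns `Unknown` for some characters that the real function may classify.

```rust
/// Stand-in for the crate's `TokenKind`; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Thai,
    Number,
    Latin,
    Whitespace,
    Emoji,
    Punctuation,
    Unknown,
}

/// Sketch of the classification table; arms are tried top to bottom.
pub fn classify_char_sketch(c: char) -> TokenKind {
    match c {
        // Thai digits come first so the Thai block arm cannot claim them.
        '\u{0E50}'..='\u{0E59}' => TokenKind::Number,
        '\u{0E00}'..='\u{0E7F}' => TokenKind::Thai,
        // ASCII and fullwidth digits.
        '0'..='9' | '\u{FF10}'..='\u{FF19}' => TokenKind::Number,
        // ASCII letters, then the fullwidth range exactly as the table gives it.
        'A'..='Z' | 'a'..='z' => TokenKind::Latin,
        '\u{FF21}'..='\u{FF5A}' => TokenKind::Latin,
        // Space, tab, newline, CR, NBSP (U+00A0), ideographic space (U+3000).
        ' ' | '\t' | '\n' | '\r' | '\u{00A0}' | '\u{3000}' => TokenKind::Whitespace,
        // Only the two emoji ranges the table names; its "…" is not filled in.
        '\u{1F300}'..='\u{1FAFF}' | '\u{2600}'..='\u{27BF}' => TokenKind::Emoji,
        // Only the two ASCII punctuation ranges the table names.
        '!'..='/' | ':'..='@' => TokenKind::Punctuation,
        // Unicode general punctuation block.
        '\u{2000}'..='\u{206F}' => TokenKind::Punctuation,
        _ => TokenKind::Unknown,
    }
}
```

Under these assumptions, `classify_char_sketch('\u{0E55}')` returns `TokenKind::Number` and `classify_char_sketch('\u{0E01}')` returns `TokenKind::Thai`, matching the first two rows of the table.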
