pub enum Token {
}
Token types for LaTeX lexical analysis.
This lexer recognizes LaTeX tokens based on character category codes (catcodes). It provides a simplified view in which special characters and control sequences are identified, while enough information is preserved for parsing.
Variants
ControlSeq(String)
Control sequence: \command
- catcode 0 (Escape): backslash triggers control sequence scanning
- Matches: <letters> (control word) or <single-char> (control symbol)
- Returns the command name without the backslash
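The scanning rule above can be sketched by hand with the standard library alone. This is an illustration, not the pattern the lexer actually uses: `scan_control_seq` is a hypothetical helper, and it assumes that "letters" means ASCII a-z and A-Z, as in the catcode 11 entry under `Char`.

```rust
/// Hypothetical helper: scans a control sequence at the start of `input`.
/// Returns the command name (without the backslash) and the remaining input,
/// or `None` if `input` does not start with a backslash followed by a character.
fn scan_control_seq(input: &str) -> Option<(String, &str)> {
    let rest = input.strip_prefix('\\')?;
    let first = rest.chars().next()?;
    if first.is_ascii_alphabetic() {
        // Control word: the longest run of letters (assumed ASCII here).
        let end = rest
            .find(|c: char| !c.is_ascii_alphabetic())
            .unwrap_or(rest.len());
        Some((rest[..end].to_string(), &rest[end..]))
    } else {
        // Control symbol: exactly one non-letter character.
        let end = first.len_utf8();
        Some((rest[..end].to_string(), &rest[end..]))
    }
}
```

Under these assumptions, `\section*{A}` yields the name `section` with `*{A}` left over, and `\%x` yields the name `%` with `x` left over.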
ActiveChar
Active character: ~
- catcode 13: Active Character
- Treated like a command, but written without an escape character
- In LaTeX, ~ produces a non-breaking space
LBrace
Left brace: {
- catcode 1: Begin Group
- Used for grouping and delimiting arguments
RBrace
Right brace: }
- catcode 2: End Group
- Closes groups started by LBrace
MathShift
Dollar sign: $
- catcode 3: Math Shift
- Toggles inline math mode; $$ indicates display math
Alignment
Ampersand: &
- catcode 4: Alignment Tab
- Used in tables and alignment environments
Parameter
Hash/pound sign: #
- catcode 6: Parameter
- Used in macro definitions and arguments
Superscript
Caret: ^
- catcode 7: Superscript
- Indicates superscript in math mode
Subscript
Underscore: _
- catcode 8: Subscript
- Indicates subscript in math mode
Star
Star/Asterisk: *
- catcode 12: Other
- Used for starred command variants (e.g., \section*)
- Must be checked immediately after command names
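A parser could apply the "immediately after" rule roughly as follows. This is a sketch under stated assumptions: the enum is a trimmed stand-in that mirrors only four of the documented variants, and `read_command` is a hypothetical helper that is not part of this crate.

```rust
/// Trimmed stand-in for the documented enum; only the variants needed here.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    ControlSeq(String),
    Star,
    Whitespaces,
    Char(char),
}

/// Hypothetical parser helper: reads a command at `pos` and reports whether
/// it is starred. Returns (name, starred, index of the next unread token).
fn read_command(tokens: &[Token], pos: usize) -> Option<(String, bool, usize)> {
    let name = match tokens.get(pos)? {
        Token::ControlSeq(name) => name.clone(),
        _ => return None,
    };
    // The star counts only when it is the very next token.
    if tokens.get(pos + 1) == Some(&Token::Star) {
        Some((name, true, pos + 2))
    } else {
        Some((name, false, pos + 1))
    }
}
```

In this sketch a `Whitespaces` token between the command and the star means the command is reported as unstarred, and the star is left for the caller to handle as ordinary input.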
LBracket
Left bracket: [
- catcode 12: Other
- Often used for optional arguments
RBracket
Right bracket: ]
- catcode 12: Other
- Closes optional arguments
Prime(usize)
Prime mark(s): one or more ' (ASCII apostrophe) or ’ (U+2019)
- In math mode, represents derivative notation (f' = f^\prime)
- Multiple primes are common: f'', f'''
- We store the count to simplify parser handling
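Storing a count means a run of primes collapses into a single token. A minimal sketch of that counting step, assuming a prime is either the ASCII apostrophe or U+2019 (`scan_primes` is a hypothetical helper, not the crate's callback):

```rust
/// Hypothetical helper: counts the prime marks at the start of `input`.
/// Returns the count and the remaining input.
fn scan_primes(input: &str) -> (usize, &str) {
    let mut count = 0;
    let mut rest = input;
    while let Some(c) = rest.chars().next() {
        if c == '\'' || c == '\u{2019}' {
            count += 1;
            // Advance by the character's byte length (1 for ', 3 for U+2019).
            rest = &rest[c.len_utf8()..];
        } else {
            break;
        }
    }
    (count, rest)
}
```

A parser receiving `Prime(3)` can then expand it to three `\prime` symbols in one step instead of tracking three separate tokens.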
Whitespaces
Whitespace: spaces, tabs, newlines, form feeds, non-breaking space
- catcode 10: Spacer
- Multiple consecutive whitespace characters are merged
- Includes U+00A0 (non-breaking space) for copy-paste behavior
Comment
Comment: % to end of line
- catcode 14: Comment
- Lexer consumes everything from % to line end (inclusive)
- Comments are discarded and do not produce tokens
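The comment rule can be sketched as a small skipping function. This assumes that "inclusive" means the terminating newline is consumed together with the comment text; `skip_comment` is a hypothetical name used only for illustration.

```rust
/// Hypothetical helper: if `input` starts with `%`, skips the comment,
/// including its line terminator, and returns the remaining input.
/// Returns `None` when `input` does not start with a comment.
fn skip_comment(input: &str) -> Option<&str> {
    if !input.starts_with('%') {
        return None;
    }
    match input.find('\n') {
        // Resume right after the newline that ends the comment.
        Some(i) => Some(&input[i + 1..]),
        // A comment on the last line runs to the end of input.
        None => Some(""),
    }
}
```

An escaped percent sign (`\%`) never reaches this step in such a design, because the backslash would already have been consumed as part of a control symbol.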
Char(char)
Regular character: letters, digits, punctuation, Unicode (excluding invalid chars)
- catcode 11: Letter (a-z, A-Z)
- catcode 12: Other (digits, punctuation, etc.)
- Matches any single printable character not covered by above patterns
- Has lowest priority (1) to act as fallback
Note: Control characters (catcode 9, 15) are NOT matched by any pattern and will cause lexing errors automatically:
- catcode 9 (Ignore): \x00-\x08, \x0B-\x1F (control chars except \t, \n, \f)
- catcode 15 (Invalid): \x7F (DEL character)
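For reference, the category codes listed on this page can be collected into one lookup function. This is a summary sketch, not code from the crate: `catcode` is a hypothetical helper, form feed is placed with the whitespace characters as the `Whitespaces` entry describes, and carriage return is placed with the ignored range as the note above lists it.

```rust
/// Hypothetical helper: maps a character to the category code listed above.
fn catcode(c: char) -> u8 {
    match c {
        '\\' => 0,  // Escape
        '{' => 1,   // Begin Group
        '}' => 2,   // End Group
        '$' => 3,   // Math Shift
        '&' => 4,   // Alignment Tab
        '#' => 6,   // Parameter
        '^' => 7,   // Superscript
        '_' => 8,   // Subscript
        // Ignore: control characters other than \t, \n, \f.
        '\u{00}'..='\u{08}' | '\u{0B}' | '\u{0D}'..='\u{1F}' => 9,
        // Spacer: space, tab, newline, form feed, non-breaking space.
        ' ' | '\t' | '\n' | '\u{0C}' | '\u{A0}' => 10,
        'a'..='z' | 'A'..='Z' => 11, // Letter
        '~' => 13,                   // Active Character
        '%' => 14,                   // Comment
        '\u{7F}' => 15,              // Invalid (DEL)
        _ => 12,                     // Other: digits, punctuation, etc.
    }
}
```

Characters that map to 9 or 15 are the ones the note describes as unmatched, so a lexer built on this table would report an error for them rather than emit a token.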
Trait Implementations
impl<'s> Logos<'s> for Token
type Error = ()
Error type returned by the lexer. This can be set using #[logos(error = MyError)]. Defaults to () if not set.
type Extras = ()
Associated type Extras for the particular lexer. This can be set using #[logos(extras = MyExtras)] and accessed inside callbacks.
type Source = str
Source type this token can be lexed from. This will default to str, unless one of the defined patterns explicitly uses non-unicode byte values or byte slices, in which case that implementation will use [u8].
fn lex(lex: &mut Lexer<'s, Self>)
The heart of Logos. Called by the Lexer. The implementation for this function is generated by the logos-derive crate.