Enum Token

pub enum Token {
    ControlSeq(String),
    ActiveChar,
    LBrace,
    RBrace,
    MathShift,
    Alignment,
    Parameter,
    Superscript,
    Subscript,
    Star,
    LBracket,
    RBracket,
    Prime(usize),
    Whitespaces,
    Comment,
    Char(char),
}

Token types for LaTeX lexical analysis.

This lexer recognizes LaTeX tokens based on TeX category codes (catcodes). It provides a simplified view that identifies special characters and control sequences while preserving enough information for parsing.

Variants

ControlSeq(String)

Control sequence: \command

catcode 0 (Escape): backslash triggers control sequence scanning
Matches: <letters> (control word) or <single-char> (control symbol)
Returns the command name without the backslash
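
The matching rule above can be sketched by hand. This is a minimal illustration, not the crate's implementation: `scan_control_seq` is a hypothetical helper, and it assumes that "letters" means ASCII a-z and A-Z.

```rust
/// Splits a control sequence off the front of `input`.
///
/// Returns `(name, rest)`, where `name` excludes the backslash, or `None`
/// if `input` does not start with a complete control sequence.
/// Hypothetical helper: not part of the documented API.
fn scan_control_seq(input: &str) -> Option<(&str, &str)> {
    let rest = input.strip_prefix('\\')?;
    let first = rest.chars().next()?; // a lone trailing backslash is rejected
    let len = if first.is_ascii_alphabetic() {
        // Control word: the longest run of letters (assumed ASCII a-z, A-Z).
        rest.find(|c: char| !c.is_ascii_alphabetic())
            .unwrap_or(rest.len())
    } else {
        // Control symbol: exactly one non-letter character.
        first.len_utf8()
    };
    Some((&rest[..len], &rest[len..]))
}
```

Under these assumptions, `scan_control_seq("\\frac{1}{2}")` yields the name `frac` with `{1}{2}` left over, and `scan_control_seq("\\%x")` yields the control symbol `%`.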

ActiveChar

Active character: ~

catcode 13: Active Character
Treated as a command but without escape character
In LaTeX, ~ produces a non-breaking space

LBrace

Left brace: {

catcode 1: Begin Group
Used for grouping and delimiting arguments

RBrace

Right brace: }

catcode 2: End Group
Closes groups started by LBrace
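
A parser consuming these tokens typically tracks group depth. The following sketch shows one way to check that every RBrace closes an earlier LBrace. The local `Token` enum is a cut-down stand-in for the real one, and `braces_balanced` is a hypothetical helper.

```rust
/// Cut-down stand-in for the documented enum (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    LBrace,
    RBrace,
    Char(char),
}

/// Returns true when every `RBrace` closes an earlier `LBrace`
/// and no group is left open at the end of the stream.
fn braces_balanced(tokens: &[Token]) -> bool {
    let mut depth: usize = 0;
    for token in tokens {
        match token {
            Token::LBrace => depth += 1,
            Token::RBrace => match depth.checked_sub(1) {
                Some(d) => depth = d,
                None => return false, // closing brace without an open group
            },
            _ => {}
        }
    }
    depth == 0
}
```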

MathShift

Dollar sign: $

catcode 3: Math Shift
Toggles inline math mode; $$ indicates display math
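
Distinguishing `$` from `$$` can be left to the parser. The sketch below assumes the lexer emits one MathShift token per `$`, which this page does not state explicitly. The local `Token` and `MathDelim` types and the `math_delim` function are hypothetical.

```rust
/// Cut-down stand-in for the documented enum (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    MathShift,
    Char(char),
}

/// Hypothetical classification of a math delimiter.
#[derive(Debug, PartialEq)]
enum MathDelim {
    Inline,  // a single `$`
    Display, // two adjacent `$`
}

/// Inspects the front of `tokens` and returns the delimiter kind together
/// with the number of tokens it spans, or `None` if no delimiter starts here.
fn math_delim(tokens: &[Token]) -> Option<(MathDelim, usize)> {
    match tokens {
        [Token::MathShift, Token::MathShift, ..] => Some((MathDelim::Display, 2)),
        [Token::MathShift, ..] => Some((MathDelim::Inline, 1)),
        _ => None,
    }
}
```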

Alignment

Ampersand: &

catcode 4: Alignment Tab
Used in tables and alignment environments

Parameter

Hash/pound sign: #

catcode 6: Parameter
Used in macro definitions and arguments

Superscript

Caret: ^

catcode 7: Superscript
Indicates superscript in math mode

Subscript

Underscore: _

catcode 8: Subscript
Indicates subscript in math mode

Star

Star/Asterisk: *

catcode 12: Other
Used for starred command variants (e.g., \section*)
Must be checked immediately after command names
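
One possible reading of "immediately after" is that Star must be the very next token, with no Whitespaces in between. The sketch below encodes that reading as an assumption. The local `Token` enum and `starred_command` are hypothetical and only illustrate the check.

```rust
/// Cut-down stand-in for the documented enum (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    ControlSeq(String),
    Star,
    Whitespaces,
}

/// If `tokens` starts with a control sequence, returns its name, whether it
/// is directly followed by `Star`, and how many tokens were consumed.
fn starred_command(tokens: &[Token]) -> Option<(&str, bool, usize)> {
    match tokens {
        [Token::ControlSeq(name), Token::Star, ..] => Some((name.as_str(), true, 2)),
        [Token::ControlSeq(name), ..] => Some((name.as_str(), false, 1)),
        _ => None,
    }
}
```

With this reading, the stream `ControlSeq("section"), Star` is reported as starred, while `ControlSeq("section"), Whitespaces, Star` is not.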

LBracket

Left bracket: [

catcode 12: Other
Often used for optional arguments

RBracket

Right bracket: ]

catcode 12: Other
Closes optional arguments

Prime(usize)

Prime mark(s): one or more ' (U+0027) or ’ (U+2019)

In math mode, represents derivative notation (f' = f^\prime)
Multiple primes are common: f'', f'''
We store the count to simplify parser handling
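
Counting a run of prime marks can be sketched as follows. `scan_primes` is a hypothetical helper, and the sketch assumes that the two accepted characters may be mixed freely within one run.

```rust
/// Counts the prime marks at the front of `input`.
///
/// Returns `(count, rest)`, or `None` if `input` does not start with a prime.
/// Hypothetical helper: not part of the documented API.
fn scan_primes(input: &str) -> Option<(usize, &str)> {
    // Accepted marks: ASCII apostrophe and U+2019 (right single quotation mark).
    let is_prime = |c: char| c == '\'' || c == '\u{2019}';
    // Byte offset of the first character that is not a prime mark.
    let end = input.find(|c: char| !is_prime(c)).unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    // Count characters, not bytes: U+2019 is three bytes in UTF-8.
    let count = input[..end].chars().count();
    Some((count, &input[end..]))
}
```

The returned count corresponds to the `usize` payload of `Prime`, so `f'''` would carry a count of 3 after the `f`.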

Whitespaces

Whitespace: spaces, tabs, newlines, form feeds, non-breaking space

catcode 10: Spacer
Multiple consecutive whitespace characters are merged
Includes U+00A0 (non-breaking space) for copy-paste behavior
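
Merging a whitespace run into one token amounts to taking the longest prefix made of the listed characters. In the sketch below, `scan_whitespace` is a hypothetical helper, and the character set (space, tab, `\n`, form feed, U+00A0) is an assumption taken literally from the list above.

```rust
/// Splits the leading whitespace run off `input`.
///
/// Returns `(run, rest)`, or `None` if `input` does not start with whitespace.
/// Hypothetical helper: not part of the documented API.
fn scan_whitespace(input: &str) -> Option<(&str, &str)> {
    // Assumed set: space, tab, newline, form feed, non-breaking space.
    let is_ws = |c: char| matches!(c, ' ' | '\t' | '\n' | '\u{000C}' | '\u{00A0}');
    // Byte offset of the first non-whitespace character.
    let end = input.find(|c: char| !is_ws(c)).unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        // The whole run would become a single Whitespaces token.
        Some(input.split_at(end))
    }
}
```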

Comment

Comment: % to end of line

catcode 14: Comment
Lexer consumes everything from % to line end (inclusive)
Comments are discarded and do not produce tokens
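
The "inclusive" rule means the terminating newline is consumed together with the comment text. A minimal sketch follows, assuming the line end is `\n`. `skip_comment` is a hypothetical helper.

```rust
/// Skips a comment at the front of `input`, including its line terminator.
///
/// Returns the remaining input, or `None` if `input` does not start with `%`.
/// Hypothetical helper: not part of the documented API.
fn skip_comment(input: &str) -> Option<&str> {
    let rest = input.strip_prefix('%')?;
    match rest.find('\n') {
        // Drop everything up to and including the newline.
        Some(i) => Some(&rest[i + 1..]),
        // A comment on the last line runs to the end of the input.
        None => Some(""),
    }
}
```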

Char(char)

Regular character: letters, digits, punctuation, Unicode (excluding invalid chars)

catcode 11: Letter (a-z, A-Z)
catcode 12: Other (digits, punctuation, etc.)
Matches any single printable character not covered by above patterns
Has lowest priority (1) to act as fallback

Note: Control characters (catcodes 9 and 15) are not matched by any pattern, so the lexer reports an error when it encounters them:

catcode 9 (Ignore): \x00-\x08, \x0B-\x1F (control chars except \t, \n, \f)
catcode 15 (Invalid): \x7F (DEL character)
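
The two ranges can be written as a small classifier. This is a sketch only: `Rejected` and `rejected_catcode` are hypothetical names. The listed range \x0B-\x1F overlaps form feed (\x0C), so the sketch gives the stated exceptions (\t, \n, \f) precedence over the range. That ordering is an assumption.

```rust
/// Hypothetical label for characters that no token pattern accepts.
#[derive(Debug, PartialEq)]
enum Rejected {
    Ignore,  // catcode 9
    Invalid, // catcode 15
}

/// Returns the rejection class of `c`, or `None` if `c` is not one of the
/// control characters listed above.
fn rejected_catcode(c: char) -> Option<Rejected> {
    match c {
        // Stated exceptions: tab, newline, form feed (checked first).
        '\t' | '\n' | '\u{000C}' => None,
        // Remaining C0 control characters, as listed for catcode 9.
        '\u{0000}'..='\u{0008}' | '\u{000B}'..='\u{001F}' => Some(Rejected::Ignore),
        // DEL, as listed for catcode 15.
        '\u{007F}' => Some(Rejected::Invalid),
        _ => None,
    }
}
```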

Trait Implementations

impl Clone for Token

fn clone(&self) -> Token

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Token

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Display for Token

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'s> Logos<'s> for Token

type Error = ()

Error type returned by the lexer. This can be set using #[logos(error = MyError)]. Defaults to () if not set.

type Extras = ()

Associated type Extras for the particular lexer. This can be set using #[logos(extras = MyExtras)] and accessed inside callbacks.

type Source = str

Source type this token can be lexed from. This will default to str, unless one of the defined patterns explicitly uses non-unicode byte values or byte slices, in which case that implementation will use [u8].