Struct Parser

pub struct Parser<'a> {
    pub mode: Mode,
    pub gullet: MacroExpander<'a>,
    pub settings: &'a Settings,
    pub leftright_depth: f64,
    pub next_token: Option<Token>,
    pub ctx: &'a KatexContext,
}

The core parser for KaTeX, responsible for converting LaTeX mathematical expressions into an abstract syntax tree (AST) of parse nodes.

§Parsing Strategy

The parser employs a recursive descent approach with lookahead tokens:

Main parsing functions (e.g., parse, parse_expression) consume tokens sequentially
The lexer (MacroExpander) provides tokens on demand, supporting arbitrary position access
Mode switching between “math” and “text” contexts restricts available commands
Functions return ParseNode objects representing parsed structures

§LaTeX Command Handling

Supports comprehensive LaTeX math commands including:

Superscripts/subscripts (^, _)
Fractions (\frac, \over, \choose)
Delimiters (\left, \right)
Symbols and operators from the symbol table
Functions with argument parsing
Unicode superscript/subscript handling

§TeX Parsing Strategies

Token lookahead: Maintains a single lookahead token for efficient parsing
Mode enforcement: Validates command availability in current context
Infix operator rewriting: Converts \over, \choose into structured fractions
Ligature formation: Combines ASCII sequences in text mode
Error reporting: Provides detailed error messages with token locations
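
As an illustration of the ligature step, here is a minimal sketch. It assumes the ligatures in question are the usual TeX ones (`--`, `---`, paired backticks and paired apostrophes), as in upstream KaTeX. The function name and the use of plain strings instead of parse nodes are hypothetical; this is not the crate's `form_ligatures`.

```rust
/// Hypothetical helper: merges ASCII ligature sequences in a list of
/// single-character text symbols. Real code would operate on parse nodes.
fn form_ligatures_sketch(symbols: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < symbols.len() {
        let cur = symbols[i];
        let next = symbols.get(i + 1).copied();
        let after = symbols.get(i + 2).copied();
        if cur == "-" && next == Some("-") {
            if after == Some("-") {
                out.push("---".to_string()); // three hyphens merge into one symbol
                i += 3;
            } else {
                out.push("--".to_string()); // two hyphens merge into one symbol
                i += 2;
            }
        } else if (cur == "'" || cur == "`") && next == Some(cur) {
            out.push(format!("{cur}{cur}")); // paired quote characters merge
            i += 2;
        } else {
            out.push(cur.to_string()); // anything else passes through unchanged
            i += 1;
        }
    }
    out
}
```

The merged strings would then be looked up as single symbols by a later stage; that lookup is outside the scope of this sketch.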

§Error Handling

Returns ParseError for syntax errors, undefined commands, and mode violations. Errors include token location information for precise error reporting.
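
To show how location information can feed error reporting, the following sketch pairs a message with a byte span and renders a caret line under the offending token. `Span`, `SketchError`, and `underline` are hypothetical names for illustration; the crate's actual `ParseError` layout may differ. The caret rendering assumes ASCII input, where byte offsets equal columns.

```rust
use std::fmt;

/// Hypothetical source span in bytes; the crate's real location type may differ.
#[derive(Debug, Clone, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Hypothetical error type carrying an optional location.
#[derive(Debug)]
struct SketchError {
    message: String,
    span: Option<Span>,
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.span {
            Some(s) => write!(f, "{} at position {}..{}", self.message, s.start, s.end),
            None => write!(f, "{}", self.message),
        }
    }
}

/// Renders the input followed by a line of carets under the span.
fn underline(input: &str, span: &Span) -> String {
    let carets = "^".repeat(span.end.saturating_sub(span.start).max(1));
    format!("{input}\n{}{carets}", " ".repeat(span.start))
}
```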

§Cross-references

parse_node - AST node types
MacroExpander - Token stream and macro handling
Mode - Parsing context modes
ParseError - Error types

Fields§

§mode: Mode

Current parsing mode (Mode::Math or Mode::Text)

§gullet: MacroExpander<'a>

Token stream provider and macro expander

§settings: &'a Settings

Global parsing configuration

§leftright_depth: f64

Nesting depth for \left/\right pairs

§next_token: Option<Token>

Cached lookahead token

§ctx: &'a KatexContext

Shared context containing functions and symbols

Implementations§

impl<'a> Parser<'a>

pub fn new(input: &'a str, settings: &'a Settings, ctx: &'a KatexContext) -> Self

Creates a new parser instance initialized with the provided input string, settings, and context. This is the primary constructor for parsing LaTeX mathematical expressions.

The parser starts in math mode, with a fresh macro expander and no lookahead token cached. The input string is tokenized on demand through the lexer component.

§Parameters

input - The LaTeX source string to parse (e.g., "x^2 + \\sqrt{y}")
settings - Configuration options affecting parsing behavior, such as color handling and global grouping
ctx - Shared context containing function definitions, symbol tables, and other parsing resources

§Return Value

Returns a fully initialized Parser ready to parse the input string.

§Error Handling

This constructor cannot fail, but subsequent parsing operations may return ParseError for invalid input or configuration issues.

§Cross-references

Settings - Configuration structure
KatexContext - Shared parsing context

pub fn expect(&mut self, text: &str, consume: bool) -> Result<(), ParseError>

Checks that the text of the next token equals text, and returns a ParseError otherwise. When consume is true, the matched token is also consumed.
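
A minimal sketch of an expect-style check follows. It assumes the method compares the lookahead token's text against `text`, as upstream KaTeX does. The `Lookahead` type, the `"EOF"` sentinel, and the error message format are hypothetical stand-ins, not the crate's API.

```rust
/// Hypothetical stand-in for the parser's lookahead state.
struct Lookahead {
    tokens: Vec<String>,
    pos: usize,
}

impl Lookahead {
    /// Returns the current token text, or "EOF" past the end.
    fn peek(&self) -> &str {
        self.tokens.get(self.pos).map_or("EOF", String::as_str)
    }

    /// Succeeds only if the current token equals `text`; advances past it
    /// when `consume` is true. On a mismatch nothing is consumed.
    fn expect(&mut self, text: &str, consume: bool) -> Result<(), String> {
        if self.peek() != text {
            return Err(format!("Expected '{}', got '{}'", text, self.peek()));
        }
        if consume {
            self.pos += 1;
        }
        Ok(())
    }
}
```

Passing `consume = false` is useful when the caller only wants to assert what comes next and leave the token for another routine.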

pub fn consume(&mut self)

Consumes the current lookahead token, advancing the parser state.

This method discards the cached lookahead token (if any) and marks it as processed. The next call to fetch will retrieve a new token from the input stream. This is essential for progressing through the token sequence during parsing.

pub fn fetch(&mut self) -> Result<&Token, ParseError>

Retrieves the current lookahead token, fetching a new one if necessary.

This method implements the parser’s lookahead mechanism. If a token is already cached in the lookahead buffer, it returns that token. Otherwise, it requests the next token from the macro expander and caches it for future use.

§Return Value

Returns a reference to the current Token if available, or an error if tokenization fails (e.g., due to macro expansion errors or end of input).

§Behavior

Returns cached token if next_token is Some
Fetches new token from MacroExpander if cache is empty
The returned token remains cached until consume is called
Multiple calls without consuming return the same token
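
The caching behavior listed above can be sketched with a one-slot buffer over any token source. `Buffered` and its `pulled` counter are hypothetical and exist only to make the caching observable. The real method returns `Result<&Token, ParseError>`, whereas this sketch uses `Option` and plain strings.

```rust
/// Hypothetical one-token lookahead buffer over any token source.
struct Buffered<I: Iterator<Item = String>> {
    source: I,
    next_token: Option<String>, // cached lookahead, like the `next_token` field
    pulled: usize,              // how many tokens were pulled from the source
}

impl<I: Iterator<Item = String>> Buffered<I> {
    fn new(source: I) -> Self {
        Self { source, next_token: None, pulled: 0 }
    }

    /// Returns the cached token, pulling one from the source only if the
    /// cache is empty. `None` stands in for end of input.
    fn fetch(&mut self) -> Option<&String> {
        if self.next_token.is_none() {
            self.next_token = self.source.next();
            if self.next_token.is_some() {
                self.pulled += 1;
            }
        }
        self.next_token.as_ref()
    }

    /// Drops the cached token so the next `fetch` pulls a fresh one.
    fn consume(&mut self) {
        self.next_token = None;
    }
}
```

Calling `fetch` twice without `consume` in between pulls from the source only once, which is the property the parser relies on for lookahead.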

§Error Handling

Returns ParseError for:

Macro expansion failures
Lexer errors during tokenization
Unexpected end of input in certain contexts

§Cross-references

consume - Consumes the current lookahead token
Token - Token structure
MacroExpander - Token source

pub const fn switch_mode(&mut self, new_mode: Mode)

Changes the parser’s current parsing mode, affecting available commands and behavior.

LaTeX has two primary modes: mathematical mode for equations and symbols, and text mode for regular text content. This method switches between them, updating both the parser’s internal state and the macro expander’s mode.

§Parameters

new_mode - The target mode to switch to (Mode::Math or Mode::Text)

§Mode Differences

Math Mode (Mode::Math):

Allows mathematical symbols, operators, and commands
Spaces are ignored between tokens
Superscripts/subscripts are permitted
Functions like \sqrt, \frac are available

Text Mode (Mode::Text):

Supports text formatting and regular characters
Spaces are preserved
Limited mathematical commands (mainly \text and similar)
Enables ligature formation for typography
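
The following sketch shows the two ideas in this section: a mode switch updates the parser and the expander together, and a command can be rejected when the current mode is not one it supports. All types here are hypothetical miniatures, including the local `Mode` enum and the error text. They are not the crate's definitions.

```rust
/// Local stand-in for the crate's `Mode` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Math,
    Text,
}

/// Hypothetical stand-in for the macro expander, which tracks its own mode.
struct Expander {
    mode: Mode,
}

struct ModalParser {
    mode: Mode,
    gullet: Expander,
}

impl ModalParser {
    /// Updates both the parser and the expander so they never disagree.
    fn switch_mode(&mut self, new_mode: Mode) {
        self.mode = new_mode;
        self.gullet.mode = new_mode;
    }

    /// Illustrative gate: `allowed` lists the modes a command supports.
    fn check_command(&self, name: &str, allowed: &[Mode]) -> Result<(), String> {
        if allowed.contains(&self.mode) {
            Ok(())
        } else {
            Err(format!("Can't use function '{}' in {:?} mode", name, self.mode))
        }
    }
}
```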

§Cross-references

Mode - Enumeration of parsing modes
parse_expression - Expression parsing affected by mode
MacroExpander::switch_mode - Underlying mode switching

pub fn parse(&mut self) -> Result<Vec<ParseNode>, ParseError>

Parses the entire input string into an abstract syntax tree (AST).

This is the primary entry point for parsing LaTeX mathematical expressions. It processes the complete input from start to finish, handling macro expansion, expression parsing, and AST construction. The result is a vector of parse nodes wrapped in an OrdGroup to match KaTeX’s top-level structure.

§Processing Steps

Group Setup: Creates a namespace group for the expression (unless global_group is enabled)
Color Handling: Applies \color behavior settings
Expression Parsing: Calls parse_expression to parse the content
Validation: Ensures the entire input is consumed (ends with EOF)
Cleanup: Closes any open groups and wraps result in OrdGroup
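
The processing steps above can be sketched roughly as follows. `Mini` is a hypothetical miniature that tracks group depth with a counter and uses strings for tokens. It omits the color handling step and the OrdGroup wrapping, and it assumes the group is closed even when validation fails.

```rust
/// Hypothetical miniature of the top-level flow: open a group, parse the
/// body, require end of input, then close the group.
struct Mini {
    tokens: Vec<String>,
    pos: usize,
    group_depth: usize,
    global_group: bool, // mirrors the idea of `Settings::global_group`
}

impl Mini {
    fn peek(&self) -> &str {
        self.tokens.get(self.pos).map_or("EOF", String::as_str)
    }

    /// Collects tokens until EOF or a closing brace (stand-in for
    /// `parse_expression`).
    fn parse_body(&mut self) -> Vec<String> {
        let mut body = Vec::new();
        while self.peek() != "EOF" && self.peek() != "}" {
            body.push(self.peek().to_string());
            self.pos += 1;
        }
        body
    }

    fn parse(&mut self) -> Result<Vec<String>, String> {
        if !self.global_group {
            self.group_depth += 1; // step 1: open a namespace group
        }
        let body = self.parse_body(); // step 3: parse the content
        // Step 4: anything left before EOF means the input was not fully consumed.
        let leftover = (self.peek() != "EOF").then(|| self.peek().to_string());
        if !self.global_group {
            self.group_depth -= 1; // step 5: close the group, even on error
        }
        match leftover {
            Some(tok) => Err(format!("Expected 'EOF', got '{}'", tok)),
            None => Ok(body),
        }
    }
}
```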

§Return Value

Returns a vector of ParseNode representing the AST on success, or a ParseError if parsing fails at any stage. The vector is typically wrapped in an OrdGroup for top-level expressions.

§Error Handling

Common error scenarios:

Syntax errors in LaTeX commands or expressions
Unmatched delimiters (\left without \right)
Undefined macros or functions
Mode violations (e.g., math commands in text mode)
Unexpected end of input

§Examples

Basic mathematical expression:

use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("E = mc^2", &settings, &ctx);
let ast = parser.parse().unwrap();
// ast is a vector containing an OrdGroup with the parsed expression

Complex expression with fractions:

use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("\\frac{a}{b} + \\sqrt{x}", &settings, &ctx);
match parser.parse() {
    Ok(nodes) => println!("Parsed successfully: {} nodes", nodes.len()),
    Err(e) => println!("Parse error: {}", e),
}

§Cross-references

parse_expression - Core expression parsing logic
ParseNode - AST node types
ParseError - Error types
Settings::global_group - Affects group creation behavior

pub fn parse_expression(&mut self, break_on_infix: bool, break_on_token_text: Option<&BreakToken>) -> Result<Vec<ParseNode>, ParseError>

Parses a sequence of atoms into an expression list.

An expression in LaTeX parsing context is a sequence of atomic elements (symbols, functions, groups) that form a mathematical or textual unit. This method continues parsing until it encounters an end condition or reaches the end of input.

§Parameters

break_on_infix - If true, stops parsing when encountering infix operators (like \over, \choose) to allow higher-precedence functions to handle them. Used for operator precedence in nested expressions.
break_on_token_text - Optional token text that terminates the expression. Common terminators include "}", "\endgroup", "\end", "\right", "&".

§Return Value

Returns a vector of ParseNode representing the parsed atoms. The result may be empty if no atoms are found before an end condition.

§Parsing Behavior

Space Handling: Consumes spaces in math mode, preserves in text mode
End Conditions: Stops at EOF, end-of-expression tokens, or break tokens
Infix Detection: Checks for infix operators when break_on_infix is true
Atom Parsing: Calls parse_atom for each atomic element
Ligature Formation: Applies typographic ligatures in text mode
Infix Rewriting: Converts infix operators to structured forms
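
A token-level sketch of the loop described above, limited to space handling and the stop conditions. The function name and signature are hypothetical, atoms are plain strings rather than parse nodes, and the infix set is restricted to the two operators named in this section. Ligature formation and infix rewriting are left out.

```rust
/// Hypothetical token-level version of the loop; real atoms are parse
/// nodes, not strings.
fn parse_expression_sketch(
    tokens: &[&str],
    pos: &mut usize,
    math_mode: bool,
    break_on_infix: bool,
    break_on: Option<&str>,
) -> Vec<String> {
    const END_TOKENS: [&str; 5] = ["}", "\\endgroup", "\\end", "\\right", "&"];
    const INFIX: [&str; 2] = ["\\over", "\\choose"];
    let mut body = Vec::new();
    while let Some(&tok) = tokens.get(*pos) {
        if math_mode && tok == " " {
            *pos += 1; // spaces carry no meaning in math mode
            continue;
        }
        if END_TOKENS.contains(&tok) || break_on == Some(tok) {
            break; // leave the terminator for the caller
        }
        if break_on_infix && INFIX.contains(&tok) {
            break; // let the caller deal with the infix operator
        }
        body.push(tok.to_string()); // stand-in for `parse_atom`
        *pos += 1;
    }
    body
}
```

In both exit paths the terminating token is left unconsumed, so the caller can inspect it and decide what to do next.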

§Infix Operator Handling

When break_on_infix is false, infix operators like \over are rewritten into Genfrac nodes with appropriate delimiters:

\over → fraction with bar line
\choose → fraction with parentheses
\above → fraction with bar line (size parsing not yet implemented)
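
The rewriting can be pictured as splitting the node list at the infix operator: everything before it becomes the numerator and everything after it the denominator. The sketch below uses a hypothetical `Node` enum rather than the crate's `ParseNode`. It assumes, as in upstream KaTeX, that a group may contain at most one infix operator and that `\choose` draws no bar.

```rust
/// Hypothetical simplified node type; the crate's `ParseNode` is richer.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Sym(String),
    Infix(String), // e.g. "\\over"
    Frac {
        numer: Vec<Node>,
        denom: Vec<Node>,
        has_bar: bool,
        delims: Option<(char, char)>,
    },
}

/// Splits `body` at its single infix operator and builds a fraction.
/// A body without an infix operator is returned unchanged.
fn rewrite_infix(body: Vec<Node>) -> Result<Vec<Node>, String> {
    let mut split = None;
    for (i, node) in body.iter().enumerate() {
        if let Node::Infix(_) = node {
            if split.is_some() {
                return Err("only one infix operator per group".to_string());
            }
            split = Some(i);
        }
    }
    let Some(idx) = split else { return Ok(body) };
    let (has_bar, delims) = match &body[idx] {
        Node::Infix(op) if op == "\\choose" => (false, Some(('(', ')'))),
        _ => (true, None), // "\\over" and "\\above" draw a bar
    };
    let numer = body[..idx].to_vec();
    let denom = body[idx + 1..].to_vec();
    Ok(vec![Node::Frac { numer, denom, has_bar, delims }])
}
```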

§Examples

Basic expression parsing:

use katex::parser::Parser;
use katex::{KatexContext, Settings};

let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("a + b \\cdot c", &settings, &ctx);
let expr = parser.parse_expression(false, None).unwrap();
// Returns one node per atom: "a", "+", "b", "\cdot", "c".
// Letters parse as MathOrd; "+" and "\cdot" are binary operator atoms.

Breaking on infix operators:

use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("a \\over b + c", &settings, &ctx);
let expr = parser.parse_expression(true, None).unwrap();
// Stops at \over, allowing parent function to handle precedence

Parsing until specific token:

use katex::parser::Parser;
use katex::{KatexContext, Settings, types::BreakToken};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("x + y }", &settings, &ctx);
let expr = parser
    .parse_expression(false, Some(&BreakToken::RightBrace))
    .unwrap();
// Stops at "}", leaving it for outer parser

§Cross-references

parse_atom - Parses individual atomic elements
handle_infix_nodes - Rewrites infix operators
form_ligatures - Applies text ligatures
BreakToken - Expression termination tokens

pub fn consume_spaces(&mut self) -> Result<(), ParseError>

Consumes consecutive space tokens, advancing to the next non-space token.

In LaTeX mathematical mode, spaces between tokens are typically ignored and don’t affect the output. This method efficiently skips over any whitespace tokens, positioning the parser at the next meaningful token.

§Behavior

Repeatedly fetches and consumes tokens that are TokenType::Space
Stops when a non-space token is encountered (becomes the new lookahead)
Does nothing if the current lookahead is already non-space
Safe to call at any point during parsing
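
Stripped of token fetching and error handling, the behavior above reduces to a short loop. `Tok` and `consume_spaces_sketch` are hypothetical; the real method works through `fetch` and `consume` and can fail with a `ParseError`.

```rust
/// Hypothetical token type with a dedicated space variant.
#[derive(Debug, PartialEq)]
enum Tok {
    Space,
    Text(&'static str),
}

/// Advances `pos` past any run of space tokens; a no-op when the current
/// token is not a space or the input is exhausted.
fn consume_spaces_sketch(tokens: &[Tok], pos: &mut usize) {
    while matches!(tokens.get(*pos), Some(Tok::Space)) {
        *pos += 1;
    }
}
```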

§Return Value

Returns Ok(()) on success, or ParseError if token fetching fails (e.g., due to macro expansion errors).

§Examples

Basic space consumption:

use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("a   +   b", &settings, &ctx);
let token = parser.fetch().unwrap();
assert_eq!(token.text, "a");
parser.consume(); // consume "a"
parser.consume_spaces().unwrap(); // skip spaces
let next = parser.fetch().unwrap();
assert_eq!(next.text, "+");

In expression parsing (automatic):

// Spaces are automatically consumed in math mode
use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("x   ^   2", &settings, &ctx);
let expr = parser.parse_expression(false, None).unwrap();
// Spaces between tokens are ignored