Parser

Struct Parser 

pub struct Parser<'a> {
    pub mode: Mode,
    pub gullet: MacroExpander<'a>,
    pub settings: &'a Settings,
    pub leftright_depth: f64,
    pub next_token: Option<Token>,
    pub ctx: &'a KatexContext,
}

The core parser for KaTeX, responsible for converting LaTeX mathematical expressions into an abstract syntax tree (AST) of parse nodes.

§Parsing Strategy

The parser employs a recursive descent approach with lookahead tokens:

  • Main parsing functions (e.g., parse, parse_expression) consume tokens sequentially
  • The lexer (MacroExpander) provides tokens on demand, supporting arbitrary position access
  • Mode switching between “math” and “text” contexts restricts available commands
  • Functions return ParseNode objects representing parsed structures
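The strategy above can be illustrated with a small self-contained toy. This is a minimal sketch, not the crate's implementation: `ToyParser`, `Node`, and the character-level tokens are assumptions made for the example. Only the method names `fetch`, `consume`, `parse`, and `parse_expression` mirror the methods documented on this page.

```rust
/// Toy AST node; the real `ParseNode` has many more variants.
#[derive(Debug, PartialEq)]
enum Node {
    Sym(char),
    Group(Vec<Node>),
}

/// Toy parser: an input cursor plus one cached lookahead token.
struct ToyParser<'a> {
    chars: std::str::Chars<'a>,
    next_token: Option<char>,
}

impl<'a> ToyParser<'a> {
    fn new(input: &'a str) -> Self {
        ToyParser { chars: input.chars(), next_token: None }
    }

    /// Return the lookahead token, pulling from the input only if the cache is empty.
    fn fetch(&mut self) -> Option<char> {
        if self.next_token.is_none() {
            self.next_token = self.chars.next();
        }
        self.next_token
    }

    /// Discard the cached lookahead token.
    fn consume(&mut self) {
        self.next_token = None;
    }

    /// expression := atom*  (stops at end of input or at '}')
    fn parse_expression(&mut self) -> Result<Vec<Node>, String> {
        let mut body = Vec::new();
        while let Some(c) = self.fetch() {
            if c == '}' {
                break;
            }
            if c == ' ' {
                // Spaces carry no meaning in this toy grammar.
                self.consume();
                continue;
            }
            body.push(self.parse_atom()?);
        }
        Ok(body)
    }

    /// atom := '{' expression '}' | symbol
    fn parse_atom(&mut self) -> Result<Node, String> {
        match self.fetch() {
            Some('{') => {
                self.consume();
                let inner = self.parse_expression()?;
                if self.fetch() != Some('}') {
                    return Err("expected '}'".to_string());
                }
                self.consume();
                Ok(Node::Group(inner))
            }
            Some(c) => {
                self.consume();
                Ok(Node::Sym(c))
            }
            None => Err("unexpected end of input".to_string()),
        }
    }

    /// Parse the whole input and require that nothing is left over.
    fn parse(&mut self) -> Result<Vec<Node>, String> {
        let nodes = self.parse_expression()?;
        match self.fetch() {
            None => Ok(nodes),
            Some(c) => Err(format!("unexpected '{}'", c)),
        }
    }
}
```

Each grammar rule maps to one method, and every decision is made by inspecting the single cached token, which is the pattern the real parser follows with a much richer token and node model.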

§LaTeX Command Handling

Supports comprehensive LaTeX math commands including:

  • Superscripts/subscripts (^, _)
  • Fractions (\frac, \over, \choose)
  • Delimiters (\left, \right)
  • Symbols and operators from the symbol table
  • Functions with argument parsing
  • Unicode superscript/subscript handling

§TeX Parsing Strategies

  • Token lookahead: Maintains a single lookahead token for efficient parsing
  • Mode enforcement: Validates command availability in current context
  • Infix operator rewriting: Converts \over, \choose into structured fractions
  • Ligature formation: Combines ASCII sequences in text mode
  • Error reporting: Provides detailed error messages with token locations
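As an illustration of the ligature step listed above, the following sketch merges adjacent ASCII tokens the way a text-mode ligature pass might. It is an assumption-laden toy: the free function `form_ligatures` over string slices, and the exact set of ligatures handled (`--`, `---`, two backticks, two apostrophes), are chosen for the example and are not taken from the crate's source.

```rust
/// Merge adjacent ASCII tokens into ligature tokens (toy version).
/// Handles "--", "---", "``" and "''"; everything else passes through.
fn form_ligatures(tokens: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        let t = tokens[i];
        let next_same = i + 1 < tokens.len() && tokens[i + 1] == t;
        if t == "-" && next_same {
            // Prefer the longest match: "---" before "--".
            if i + 2 < tokens.len() && tokens[i + 2] == "-" {
                out.push("---".to_string());
                i += 3;
            } else {
                out.push("--".to_string());
                i += 2;
            }
        } else if (t == "'" || t == "`") && next_same {
            // Doubled quote characters become one opening/closing quote token.
            out.push(format!("{t}{t}"));
            i += 2;
        } else {
            out.push(t.to_string());
            i += 1;
        }
    }
    out
}
```

A later rendering stage can then map the merged tokens to single glyphs such as an en dash or a curly quote.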

§Error Handling

Returns ParseError for syntax errors, undefined commands, and mode violations. Errors include token location information for precise error reporting.

Fields§

§mode: Mode

Current parsing mode (Mode::Math or Mode::Text)

§gullet: MacroExpander<'a>

Token stream provider and macro expander

§settings: &'a Settings

Global parsing configuration

§leftright_depth: f64

Nesting depth for \left/\right pairs

§next_token: Option<Token>

Cached lookahead token

§ctx: &'a KatexContext

Shared context containing functions and symbols

Implementations§


impl<'a> Parser<'a>


pub fn new( input: &'a str, settings: &'a Settings, ctx: &'a KatexContext, ) -> Self

Creates a new parser instance initialized with the provided input string, settings, and context. This is the primary constructor for parsing LaTeX mathematical expressions.

The parser starts in mathematical mode by default, with a fresh macro expander and no lookahead token cached. The input string is tokenized on-demand through the lexer component.

§Parameters
  • input - The LaTeX source string to parse (e.g., "x^2 + \\sqrt{y}")
  • settings - Configuration options affecting parsing behavior, such as color handling and global grouping
  • ctx - Shared context containing function definitions, symbol tables, and other parsing resources
§Return Value

Returns a fully initialized Parser ready to parse the input string.

§Error Handling

This constructor cannot fail, but subsequent parsing operations may return ParseError for invalid input or configuration issues.


pub fn expect(&mut self, text: &str, consume: bool) -> Result<(), ParseError>

Checks that the current lookahead token has the expected text and returns a ParseError otherwise. If consume is true, the matching token is also consumed.
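The following sketch shows how an `expect`-style check can sit on top of a one-token lookahead. It is a toy under stated assumptions: `Cursor`, its string tokens, the `"EOF"` sentinel, and the error message format are all hypothetical and only illustrate the contract suggested by the signature above.

```rust
/// Toy token cursor with a single cached lookahead token.
struct Cursor {
    /// Remaining tokens in reverse order, so `pop` yields the next one.
    tokens: Vec<String>,
    next_token: Option<String>,
}

impl Cursor {
    fn new(texts: &[&str]) -> Self {
        let tokens = texts.iter().rev().map(|t| t.to_string()).collect();
        Cursor { tokens, next_token: None }
    }

    /// Return the lookahead token text, or "EOF" once the input is exhausted.
    fn fetch(&mut self) -> &str {
        if self.next_token.is_none() {
            let tok = self.tokens.pop().unwrap_or_else(|| "EOF".to_string());
            self.next_token = Some(tok);
        }
        self.next_token.as_deref().unwrap_or("EOF")
    }

    /// Discard the cached lookahead token.
    fn consume(&mut self) {
        self.next_token = None;
    }

    /// Fail unless the lookahead equals `text`; optionally consume it on success.
    fn expect(&mut self, text: &str, consume: bool) -> Result<(), String> {
        let found = self.fetch();
        if found != text {
            return Err(format!("Expected '{}', got '{}'", text, found));
        }
        if consume {
            self.consume();
        }
        Ok(())
    }
}
```

Passing `consume = false` lets a caller assert what comes next while leaving the token in place for another routine to handle.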


pub fn consume(&mut self)

Consumes the current lookahead token, advancing the parser state.

This method discards the cached lookahead token (if any) and marks it as processed. The next call to fetch will retrieve a new token from the input stream. This is essential for progressing through the token sequence during parsing.


pub fn fetch(&mut self) -> Result<&Token, ParseError>

Retrieves the current lookahead token, fetching a new one if necessary.

This method implements the parser’s lookahead mechanism. If a token is already cached in the lookahead buffer, it returns that token. Otherwise, it requests the next token from the macro expander and caches it for future use.

§Return Value

Returns a reference to the current Token if available, or an error if tokenization fails (e.g., due to macro expansion errors or end of input).

§Behavior
  • Returns cached token if next_token is Some
  • Fetches new token from MacroExpander if cache is empty
  • The returned token remains cached until consume is called
  • Multiple calls without consuming return the same token
§Error Handling

Returns ParseError for:

  • Macro expansion failures
  • Lexer errors during tokenization
  • Unexpected end of input in certain contexts
§Cross-references
  • consume - Consumes the current lookahead token
  • Token - Token structure
  • MacroExpander - Token source

pub const fn switch_mode(&mut self, new_mode: Mode)

Changes the parser’s current parsing mode, affecting available commands and behavior.

LaTeX has two primary modes: mathematical mode for equations and symbols, and text mode for regular text content. This method switches between them, updating both the parser’s internal state and the macro expander’s mode.

§Parameters
  • new_mode - The mode to switch to (Mode::Math or Mode::Text)
§Mode Differences

Math Mode (Mode::Math):

  • Allows mathematical symbols, operators, and commands
  • Spaces are ignored between tokens
  • Superscripts/subscripts are permitted
  • Functions like \sqrt, \frac are available

Text Mode (Mode::Text):

  • Supports text formatting and regular characters
  • Spaces are preserved
  • Limited mathematical commands (mainly \text and similar)
  • Enables ligature formation for typography
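A common way to use a mode switch is to save the outer mode, switch, parse a nested argument, and then restore. The sketch below shows that pattern on toy types. `ToyParser`, `ToyExpander`, and the `with_mode` helper are hypothetical names introduced for illustration; only `switch_mode` and `Mode` correspond to items on this page.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Math,
    Text,
}

/// Stand-in for the macro expander, which tracks its own mode.
struct ToyExpander {
    mode: Mode,
}

struct ToyParser {
    mode: Mode,
    gullet: ToyExpander,
}

impl ToyParser {
    /// Update the parser and the expander together so they never disagree.
    fn switch_mode(&mut self, new_mode: Mode) {
        self.mode = new_mode;
        self.gullet.mode = new_mode;
    }

    /// Hypothetical helper: run `body` in `mode`, then restore the previous mode.
    fn with_mode<T>(&mut self, mode: Mode, body: impl FnOnce(&mut Self) -> T) -> T {
        let outer = self.mode;
        self.switch_mode(mode);
        let result = body(self);
        self.switch_mode(outer);
        result
    }
}
```

Restoring the outer mode after the nested parse keeps a text-mode argument from leaking text-mode rules into the surrounding math.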

pub fn parse(&mut self) -> Result<Vec<ParseNode>, ParseError>

Parses the entire input string into an abstract syntax tree (AST).

This is the primary entry point for parsing LaTeX mathematical expressions. It processes the complete input from start to finish, handling macro expansion, expression parsing, and AST construction. The result is a vector of parse nodes wrapped in an OrdGroup to match KaTeX’s top-level structure.

§Processing Steps
  1. Group Setup: Creates a namespace group for the expression (unless global_group is enabled)
  2. Color Handling: Applies \color behavior settings
  3. Expression Parsing: Calls parse_expression to parse the content
  4. Validation: Ensures the entire input is consumed (ends with EOF)
  5. Cleanup: Closes any open groups and wraps result in OrdGroup
§Return Value

Returns a vector of ParseNode representing the AST on success, or a ParseError if parsing fails at any stage. The vector is typically wrapped in an OrdGroup for top-level expressions.

§Error Handling

Common error scenarios:

  • Syntax errors in LaTeX commands or expressions
  • Unmatched delimiters (\left without \right)
  • Undefined macros or functions
  • Mode violations (e.g., math commands in text mode)
  • Unexpected end of input
§Examples

Basic mathematical expression:

use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("E = mc^2", &settings, &ctx);
let ast = parser.parse().unwrap();
// ast is a vector containing an OrdGroup with the parsed expression

Complex expression with fractions:

use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("\\frac{a}{b} + \\sqrt{x}", &settings, &ctx);
match parser.parse() {
    Ok(nodes) => println!("Parsed successfully: {} nodes", nodes.len()),
    Err(e) => println!("Parse error: {}", e),
}
§Cross-references
  • parse_expression - Core expression parsing logic
  • ParseNode - AST node types
  • ParseError - Error types
  • Settings::global_group - Affects group creation behavior

pub fn parse_expression( &mut self, break_on_infix: bool, break_on_token_text: Option<&BreakToken>, ) -> Result<Vec<ParseNode>, ParseError>

Parses a sequence of atoms into an expression list.

An expression in LaTeX parsing context is a sequence of atomic elements (symbols, functions, groups) that form a mathematical or textual unit. This method continues parsing until it encounters an end condition or reaches the end of input.

§Parameters
  • break_on_infix - If true, stops parsing when encountering infix operators (like \over, \choose) to allow higher-precedence functions to handle them. Used for operator precedence in nested expressions.

  • break_on_token_text - Optional token text that terminates the expression. Common terminators include "}", "\endgroup", "\end", "\right", "&".

§Return Value

Returns a vector of ParseNode representing the parsed atoms. The result may be empty if no atoms are found before an end condition.

§Parsing Behavior
  • Space Handling: Consumes spaces in math mode, preserves in text mode
  • End Conditions: Stops at EOF, end-of-expression tokens, or break tokens
  • Infix Detection: Checks for infix operators when break_on_infix is true
  • Atom Parsing: Calls parse_atom for each atomic element
  • Ligature Formation: Applies typographic ligatures in text mode
  • Infix Rewriting: Converts infix operators to structured forms
§Infix Operator Handling

When break_on_infix is false, infix operators like \over are rewritten into Genfrac nodes with appropriate delimiters:

  • \over → fraction with bar line
  • \choose → fraction with parentheses
  • \above → fraction with bar line (size parsing not yet implemented)
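The rewriting described above can be sketched on a toy node type. This is an illustrative assumption, not the crate's `handle_infix_nodes`: the `Node` enum, its `Frac` fields, and the error strings are invented for the example, and `\choose` is modeled without a bar line because that is how TeX typesets binomials.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Sym(String),
    /// Placeholder left in the list by an infix command such as `\over`.
    Infix(String),
    /// Structured fraction built from the nodes on either side of the infix.
    Frac {
        has_bar: bool,
        delims: Option<(char, char)>,
        numer: Vec<Node>,
        denom: Vec<Node>,
    },
}

/// Rewrite `a \over b` style lists into a single fraction node (toy version).
fn handle_infix_nodes(body: Vec<Node>) -> Result<Vec<Node>, String> {
    let mut over_index = None;
    let mut name = String::new();
    for (i, node) in body.iter().enumerate() {
        if let Node::Infix(n) = node {
            if over_index.is_some() {
                return Err("only one infix operator per group".to_string());
            }
            over_index = Some(i);
            name = n.clone();
        }
    }
    // No infix marker: the list is returned unchanged.
    let Some(idx) = over_index else {
        return Ok(body);
    };
    let mut numer = body;
    let mut denom = numer.split_off(idx); // denom[0] is the infix marker
    denom.remove(0);
    let (has_bar, delims) = match name.as_str() {
        "\\over" => (true, None),
        "\\choose" => (false, Some(('(', ')'))),
        other => return Err(format!("unsupported infix operator {other}")),
    };
    Ok(vec![Node::Frac { has_bar, delims, numer, denom }])
}
```

Everything before the marker becomes the numerator and everything after it the denominator, which is why a group may contain at most one infix operator.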
§Examples

Basic expression parsing:

use katex::parser::Parser;
use katex::{KatexContext, Settings};

let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("a + b \\cdot c", &settings, &ctx);
let expr = parser.parse_expression(false, None).unwrap();
// Returns one parse node per atom: a, +, b, \cdot, c

Breaking on infix operators:

use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("a \\over b + c", &settings, &ctx);
let expr = parser.parse_expression(true, None).unwrap();
// Stops at \over, allowing parent function to handle precedence

Parsing until specific token:

use katex::parser::Parser;
use katex::{KatexContext, Settings, types::BreakToken};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("x + y }", &settings, &ctx);
let expr = parser
    .parse_expression(false, Some(&BreakToken::RightBrace))
    .unwrap();
// Stops at "}", leaving it for outer parser
§Cross-references
  • parse_atom - Parses individual atomic elements
  • handle_infix_nodes - Rewrites infix operators
  • form_ligatures - Applies text ligatures
  • BreakToken - Expression termination tokens

pub fn consume_spaces(&mut self) -> Result<(), ParseError>

Consumes consecutive space tokens, advancing to the next non-space token.

In LaTeX mathematical mode, spaces between tokens are typically ignored and don’t affect the output. This method efficiently skips over any whitespace tokens, positioning the parser at the next meaningful token.

§Behavior
  • Repeatedly fetches and consumes tokens that are TokenType::Space
  • Stops when a non-space token is encountered (becomes the new lookahead)
  • Does nothing if the current lookahead is already non-space
  • Safe to call at any point during parsing
§Return Value

Returns Ok(()) on success, or ParseError if token fetching fails (e.g., due to macro expansion errors).

§Examples

Basic space consumption:

use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("a   +   b", &settings, &ctx);
let token = parser.fetch().unwrap();
assert_eq!(token.text, "a");
parser.consume(); // consume "a"
parser.consume_spaces().unwrap(); // skip spaces
let next = parser.fetch().unwrap();
assert_eq!(next.text, "+");

In expression parsing (automatic):

// Spaces are automatically consumed in math mode
use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("x   ^   2", &settings, &ctx);
let expr = parser.parse_expression(false, None).unwrap();
// Spaces between tokens are ignored
§Cross-references
  • fetch - Retrieves the current lookahead token
  • consume - Consumes a single token
  • TokenType::Space - Space token type
  • parse_expression - Uses this method in math mode

pub fn parse_size_group( &mut self, optional: bool, ) -> Result<Option<ParseNodeSize>, ParseError>

Parse a size specification.
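As a rough illustration of what a size specification looks like, the sketch below splits a string such as `1.5em` into a number and a unit. It is a hypothetical standalone helper: `Size`, `parse_size`, the accepted unit list, and the error messages are assumptions for the example and do not describe how `parse_size_group` or `ParseNodeSize` are implemented.

```rust
/// Toy size value: a number plus a TeX-style unit.
#[derive(Debug, PartialEq)]
struct Size {
    number: f64,
    unit: String,
}

/// Parse strings like "1.5em" or "-2 pt" (toy version).
fn parse_size(text: &str) -> Result<Size, String> {
    let trimmed = text.trim();
    // Split at the first ASCII letter: sign, digits and '.' come before, the unit after.
    let split = trimmed
        .find(|c: char| c.is_ascii_alphabetic())
        .unwrap_or(trimmed.len());
    let (num_part, unit_part) = trimmed.split_at(split);
    let number: f64 = num_part
        .trim()
        .parse()
        .map_err(|_| format!("Invalid size: '{text}'"))?;
    let unit = unit_part.trim();
    // Assumed unit list for the example.
    const UNITS: [&str; 8] = ["pt", "mm", "cm", "in", "px", "em", "ex", "mu"];
    if !UNITS.contains(&unit) {
        return Err(format!("Invalid unit: '{unit}'"));
    }
    Ok(Size { number, unit: unit.to_string() })
}
```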


pub fn format_unsupported_cmd(&self, text: &str) -> ParseNodeColor

Converts the text of an unsupported command into a color node containing a text node.


pub fn parse_function( &mut self, break_on_token_text: Option<&BreakToken>, name: Option<&str>, ) -> Result<Option<ParseNode>, ParseError>

Parses a function if one is present at the current token.


pub fn subparse( &mut self, tokens: Vec<Token>, ) -> Result<Vec<ParseNode>, ParseError>

Parses a separate sequence of tokens as an independent job. Tokens must be supplied in reverse order, as in a MacroDefinition.
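The reverse-order convention makes sense when tokens are kept on a stack whose top is the next token to read. The sketch below shows that idea with plain strings. `TokenStack`, `push_reversed`, and `reversed` are hypothetical names for illustration and are not part of the crate's API.

```rust
/// Toy token stack: the last element is the next token to be read.
struct TokenStack {
    stack: Vec<String>,
}

impl TokenStack {
    /// Push tokens that are already in reverse order (last-to-read first).
    fn push_reversed(&mut self, tokens: Vec<String>) {
        self.stack.extend(tokens);
    }

    /// Pop the next token in reading order.
    fn pop(&mut self) -> Option<String> {
        self.stack.pop()
    }
}

/// Turn reading-order text into the reversed form the stack expects.
fn reversed(tokens: &[&str]) -> Vec<String> {
    tokens.iter().rev().map(|t| t.to_string()).collect()
}
```

Because the pushed tokens sit on top of whatever was already pending, they are read first and in their natural order, while the earlier tokens stay untouched underneath.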


pub fn handle_sup_subscript( &mut self, name: &str, ) -> Result<ParseNode, ParseError>

Handles a subscript or superscript, reporting descriptive errors.


pub fn call_function( &mut self, name: &str, args: Vec<ParseNode>, opt_args: Vec<Option<ParseNode>>, token: Option<&Token>, break_on_token_text: Option<&BreakToken>, ) -> Result<ParseNode, ParseError>

Calls a function handler with a suitable context and arguments.


pub fn parse_arguments( &mut self, func: &str, func_data: &dyn Spec, ) -> Result<(Vec<ParseNode>, Vec<Option<ParseNode>>), ParseError>

Parses the arguments of a function or environment.

Auto Trait Implementations§

impl<'a> Freeze for Parser<'a>

impl<'a> !RefUnwindSafe for Parser<'a>

impl<'a> !Send for Parser<'a>

impl<'a> !Sync for Parser<'a>

impl<'a> Unpin for Parser<'a>

impl<'a> !UnwindSafe for Parser<'a>

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.