pub struct Parser<'a> {
    pub mode: Mode,
    pub gullet: MacroExpander<'a>,
    pub settings: &'a Settings,
    pub leftright_depth: f64,
    pub next_token: Option<Token>,
    pub ctx: &'a KatexContext,
}
The core parser for KaTeX, responsible for converting LaTeX mathematical expressions into an abstract syntax tree (AST) of parse nodes.
§Parsing Strategy
The parser employs a recursive descent approach with lookahead tokens:
- Main parsing functions (e.g., parse, parse_expression) consume tokens sequentially
- The lexer (MacroExpander) provides tokens on demand, supporting arbitrary position access
- Mode switching between “math” and “text” contexts restricts available commands
- Functions return ParseNode objects representing parsed structures
§LaTeX Command Handling
Supports comprehensive LaTeX math commands including:
- Superscripts/subscripts (^, _)
- Fractions (\frac, \over, \choose)
- Delimiters (\left, \right)
- Symbols and operators from the symbol table
- Functions with argument parsing
- Unicode superscript/subscript handling
§TeX Parsing Strategies
- Token lookahead: Maintains a single lookahead token for efficient parsing
- Mode enforcement: Validates command availability in current context
- Infix operator rewriting: Converts \over, \choose into structured fractions
- Ligature formation: Combines ASCII sequences in text mode
- Error reporting: Provides detailed error messages with token locations
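To make the ligature step concrete, here is a minimal standalone sketch. It assumes the ligature set used by upstream KaTeX in text mode (two or three hyphens, doubled backticks, doubled apostrophes). The helper form_ligatures_sketch is hypothetical and works on plain characters, whereas the parser itself operates on parse nodes.

```rust
/// Hypothetical helper (not part of the crate): merges ASCII ligature
/// sequences the way a text-mode pass might, using chars instead of nodes.
fn form_ligatures_sketch(input: &str) -> String {
    let chars: Vec<char> = input.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        let next = chars.get(i + 1).copied();
        let after = chars.get(i + 2).copied();
        match (c, next, after) {
            // Longest match first: three hyphens become an em dash.
            ('-', Some('-'), Some('-')) => {
                out.push('\u{2014}');
                i += 3;
            }
            // Two hyphens become an en dash.
            ('-', Some('-'), _) => {
                out.push('\u{2013}');
                i += 2;
            }
            // Doubled backticks become a left double quotation mark.
            ('`', Some('`'), _) => {
                out.push('\u{201C}');
                i += 2;
            }
            // Doubled apostrophes become a right double quotation mark.
            ('\'', Some('\''), _) => {
                out.push('\u{201D}');
                i += 2;
            }
            // Anything else is copied through unchanged.
            _ => {
                out.push(c);
                i += 1;
            }
        }
    }
    out
}
```

Matching the three-hyphen case before the two-hyphen case is what keeps the longer ligature from being split into a shorter one plus a stray hyphen.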
§Error Handling
Returns ParseError for syntax errors, undefined commands, and mode violations. Errors include token location information for precise error reporting.
§Cross-references
- parse_node - AST node types
- MacroExpander - Token stream and macro handling
- Mode - Parsing context modes
- ParseError - Error types
§Fields
mode: Mode
Current parsing mode (Mode::Math or Mode::Text)
gullet: MacroExpander<'a>
Token stream provider and macro expander
settings: &'a Settings
Global parsing configuration
leftright_depth: f64
Nesting depth for \left/\right pairs
next_token: Option<Token>
Cached lookahead token
ctx: &'a KatexContext
Shared context containing functions and symbols
§Implementations
impl<'a> Parser<'a>
pub fn new(
    input: &'a str,
    settings: &'a Settings,
    ctx: &'a KatexContext,
) -> Self
Creates a new parser instance initialized with the provided input string, settings, and context. This is the primary constructor for parsing LaTeX mathematical expressions.
The parser starts in mathematical mode by default, with a fresh macro expander and no lookahead token cached. The input string is tokenized on-demand through the lexer component.
§Parameters
- input - The LaTeX source string to parse (e.g., "x^2 + \\sqrt{y}")
- settings - Configuration options affecting parsing behavior, such as color handling and global grouping
- ctx - Shared context containing function definitions, symbol tables, and other parsing resources
§Return Value
Returns a fully initialized Parser ready to parse the input string.
§Error Handling
This constructor cannot fail, but subsequent parsing operations may return ParseError for invalid input or configuration issues.
§Cross-references
- Settings - Configuration structure
- KatexContext - Shared parsing context
pub fn expect(&mut self, text: &str, consume: bool) -> Result<(), ParseError>
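Checks that the current lookahead token has the expected text, and returns an appropriate ParseError otherwise. Judging by the signature, the consume flag controls whether the matched token is also consumed.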
pub fn consume(&mut self)
Consumes the current lookahead token, advancing the parser state.
This method discards the cached lookahead token (if any) and marks it as processed. The next call to fetch will retrieve a new token from the input stream. This is essential for progressing through the token sequence during parsing.
pub fn fetch(&mut self) -> Result<&Token, ParseError>
Retrieves the current lookahead token, fetching a new one if necessary.
This method implements the parser’s lookahead mechanism. If a token is already cached in the lookahead buffer, it returns that token. Otherwise, it requests the next token from the macro expander and caches it for future use.
§Return Value
Returns a reference to the current Token if available, or an error if tokenization fails (e.g., due to macro expansion errors or end of input).
§Behavior
- Returns cached token if next_token is Some
- Fetches new token from MacroExpander if cache is empty
- The returned token remains cached until consume is called
- Multiple calls without consuming return the same token
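The caching contract above can be sketched in isolation. Everything in this block is a hypothetical stand-in: Tok, Source, and Lookahead are not crate types, errors are plain strings instead of ParseError, and the expect method only mirrors the documented signature.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Tok(String);

/// Hypothetical token source, playing the role MacroExpander has in the parser.
struct Source {
    rest: Vec<Tok>, // stored reversed so that `pop` yields the next token
}

struct Lookahead {
    source: Source,
    next_token: Option<Tok>, // the single cached lookahead token
}

impl Lookahead {
    fn new(words: &[&str]) -> Self {
        let rest = words.iter().rev().map(|w| Tok(w.to_string())).collect();
        Lookahead { source: Source { rest }, next_token: None }
    }

    /// Returns the cached token, pulling from the source only if the cache is empty.
    fn fetch(&mut self) -> Result<&Tok, String> {
        if self.next_token.is_none() {
            // Assumption: an exhausted source yields a synthetic "EOF" token.
            let tok = self.source.rest.pop().unwrap_or_else(|| Tok("EOF".to_string()));
            self.next_token = Some(tok);
        }
        self.next_token.as_ref().ok_or_else(|| "no token available".to_string())
    }

    /// Drops the cached token so that the next `fetch` advances.
    fn consume(&mut self) {
        self.next_token = None;
    }

    /// Checks the lookahead text and optionally consumes the token on a match.
    fn expect(&mut self, text: &str, consume: bool) -> Result<(), String> {
        let got = self.fetch()?.0.clone();
        if got != text {
            return Err(format!("Expected '{}', got '{}'", text, got));
        }
        if consume {
            self.consume();
        }
        Ok(())
    }
}
```

Because fetch never advances on its own, callers can peek as often as they like and commit with consume only once they know how to handle the token.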
§Error Handling
Returns ParseError for:
- Macro expansion failures
- Lexer errors during tokenization
- Unexpected end of input in certain contexts
§Cross-references
- consume - Consumes the current lookahead token
- Token - Token structure
- MacroExpander - Token source
pub const fn switch_mode(&mut self, new_mode: Mode)
Changes the parser’s current parsing mode, affecting available commands and behavior.
LaTeX has two primary modes: mathematical mode for equations and symbols, and text mode for regular text content. This method switches between them, updating both the parser’s internal state and the macro expander’s mode.
§Parameters
- new_mode - The target mode to switch to (Mode::Math or Mode::Text)
§Mode Differences
Math Mode (Mode::Math):
- Allows mathematical symbols, operators, and commands
- Spaces are ignored between tokens
- Superscripts/subscripts are permitted
- Functions like \sqrt, \frac are available
Text Mode (Mode::Text):
- Supports text formatting and regular characters
- Spaces are preserved
- Limited mathematical commands (mainly \text and similar)
- Enables ligature formation for typography
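As a rough illustration of how a mode switch and mode enforcement fit together, the sketch below keeps the parser and expander modes in sync and rejects math-only commands in text mode. MiniParser, Expander, FuncSpec, the FUNCS table, and the allowed_in_text flag are all hypothetical. The flag is modeled on upstream KaTeX and is not confirmed for this crate.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Math,
    Text,
}

/// Hypothetical function-table entry; only the mode flag matters here.
struct FuncSpec {
    name: &'static str,
    allowed_in_text: bool,
}

// Assumed flags for illustration, not taken from the crate's function table.
const FUNCS: &[FuncSpec] = &[
    FuncSpec { name: "\\frac", allowed_in_text: false },
    FuncSpec { name: "\\sqrt", allowed_in_text: false },
    FuncSpec { name: "\\text", allowed_in_text: true },
];

struct Expander {
    mode: Mode,
}

struct MiniParser {
    mode: Mode,
    gullet: Expander,
}

impl MiniParser {
    /// Updates the parser's mode and the expander's mode together.
    fn switch_mode(&mut self, new_mode: Mode) {
        self.mode = new_mode;
        self.gullet.mode = new_mode;
    }
}

/// Rejects unknown commands and commands that are not allowed in text mode.
fn check_command(mode: Mode, name: &str) -> Result<(), String> {
    let spec = FUNCS
        .iter()
        .find(|f| f.name == name)
        .ok_or_else(|| format!("Undefined control sequence: {}", name))?;
    if mode == Mode::Text && !spec.allowed_in_text {
        return Err(format!("Can't use function '{}' in text mode", name));
    }
    Ok(())
}
```

Keeping the two mode fields in one method avoids a state where the parser believes it is in text mode while the expander still tokenizes for math mode.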
§Cross-references
- Mode - Enumeration of parsing modes
- parse_expression - Expression parsing affected by mode
- MacroExpander::switch_mode - Underlying mode switching
pub fn parse(&mut self) -> Result<Vec<ParseNode>, ParseError>
Parses the entire input string into an abstract syntax tree (AST).
This is the primary entry point for parsing LaTeX mathematical expressions. It processes the complete input from start to finish, handling macro expansion, expression parsing, and AST construction. The result is a vector of parse nodes wrapped in an OrdGroup to match KaTeX’s top-level structure.
§Processing Steps
- Group Setup: Creates a namespace group for the expression (unless global_group is enabled)
- Color Handling: Applies \color behavior settings
- Expression Parsing: Calls parse_expression to parse the content
- Validation: Ensures the entire input is consumed (ends with EOF)
- Cleanup: Closes any open groups and wraps result in OrdGroup
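The ordering of these steps can be sketched with a toy parser. This is only an outline under stated assumptions: MiniParser and MiniSettings are hypothetical, the color_is_text_color field is borrowed from upstream KaTeX's option of that name, tokens are plain strings, and the final OrdGroup wrapping is left out.

```rust
/// Hypothetical settings subset; field names are assumptions for this sketch.
struct MiniSettings {
    global_group: bool,
    color_is_text_color: bool,
}

struct MiniParser {
    tokens: Vec<String>, // reversed: the last element is the next token
    open_groups: usize,
    color_alias: Option<&'static str>,
    settings: MiniSettings,
}

impl MiniParser {
    fn new(src: &[&str], settings: MiniSettings) -> Self {
        MiniParser {
            tokens: src.iter().rev().map(|s| s.to_string()).collect(),
            open_groups: 0,
            color_alias: None,
            settings,
        }
    }

    /// Collects tokens until input runs out or a "}" (end-of-expression) appears.
    fn parse_expression(&mut self) -> Vec<String> {
        let mut body = Vec::new();
        while self.tokens.last().map_or(false, |t| t.as_str() != "}") {
            body.push(self.tokens.pop().unwrap());
        }
        body
    }

    fn parse(&mut self) -> Result<Vec<String>, String> {
        // 1. Group setup: open a namespace group unless global grouping is on.
        if !self.settings.global_group {
            self.open_groups += 1;
        }
        // 2. Color handling: alias \color according to the settings.
        if self.settings.color_is_text_color {
            self.color_alias = Some("\\textcolor");
        }
        // 3. Expression parsing.
        let body = self.parse_expression();
        // 4. Validation: anything left over means the input did not end at EOF.
        let result = match self.tokens.last() {
            None => Ok(body),
            Some(tok) => Err(format!("Expected 'EOF', got '{}'", tok)),
        };
        // 5. Cleanup: close every group still open, on success or failure.
        self.open_groups = 0;
        result
    }
}
```

Running the cleanup step after the result is computed means a failed parse does not leave groups open for a later use of the same state.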
§Return Value
Returns a vector of ParseNode representing the AST on success, or a ParseError if parsing fails at any stage. The vector is typically wrapped in an OrdGroup for top-level expressions.
§Error Handling
Common error scenarios:
- Syntax errors in LaTeX commands or expressions
- Unmatched delimiters (\left without \right)
- Undefined macros or functions
- Mode violations (e.g., math commands in text mode)
- Unexpected end of input
§Examples
Basic mathematical expression:
use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("E = mc^2", &settings, &ctx);
let ast = parser.parse().unwrap();
// ast is a vector containing an OrdGroup with the parsed expression
Complex expression with fractions:
use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("\\frac{a}{b} + \\sqrt{x}", &settings, &ctx);
match parser.parse() {
    Ok(nodes) => println!("Parsed successfully: {} nodes", nodes.len()),
    Err(e) => println!("Parse error: {}", e),
}
§Cross-references
- parse_expression - Core expression parsing logic
- ParseNode - AST node types
- ParseError - Error types
- Settings::global_group - Affects group creation behavior
pub fn parse_expression(
    &mut self,
    break_on_infix: bool,
    break_on_token_text: Option<&BreakToken>,
) -> Result<Vec<ParseNode>, ParseError>
Parses a sequence of atoms into an expression list.
An expression in LaTeX parsing context is a sequence of atomic elements (symbols, functions, groups) that form a mathematical or textual unit. This method continues parsing until it encounters an end condition or reaches the end of input.
§Parameters
- break_on_infix - If true, stops parsing when encountering infix operators (like \over, \choose) to allow higher-precedence functions to handle them. Used for operator precedence in nested expressions.
- break_on_token_text - Optional token text that terminates the expression. Common terminators include "}", "\endgroup", "\end", "\right", "&".
§Return Value
Returns a vector of ParseNode representing the parsed atoms. The result may be empty if no atoms are found before an end condition.
§Parsing Behavior
- Space Handling: Consumes spaces in math mode, preserves in text mode
- End Conditions: Stops at EOF, end-of-expression tokens, or break tokens
- Infix Detection: Checks for infix operators when break_on_infix is true
- Atom Parsing: Calls parse_atom for each atomic element
- Ligature Formation: Applies typographic ligatures in text mode
- Infix Rewriting: Converts infix operators to structured forms
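The end conditions listed above can be sketched as a loop over string tokens. This is a simplified model, not the crate's implementation: parse_expression_sketch is hypothetical, atoms are bare strings, the break token is a string and not a BreakToken, and ligature and infix rewriting are omitted.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Mode {
    Math,
    Text,
}

// Infix operators checked when break_on_infix is set (subset, for illustration).
const INFIX: [&str; 2] = ["\\over", "\\choose"];
// Tokens that always end an expression, as listed in the parameter docs.
const END_OF_EXPRESSION: [&str; 5] = ["}", "\\endgroup", "\\end", "\\right", "&"];

/// Returns the atoms collected and the index of the token that stopped the loop
/// (equal to `tokens.len()` when the input ran out).
fn parse_expression_sketch<'t>(
    tokens: &[&'t str],
    mode: Mode,
    break_on_infix: bool,
    break_on: Option<&str>,
) -> (Vec<&'t str>, usize) {
    let mut body = Vec::new();
    let mut pos = 0;
    while pos < tokens.len() {
        let tok = tokens[pos];
        // Space handling: skipped in math mode, kept as an atom in text mode.
        if mode == Mode::Math && tok == " " {
            pos += 1;
            continue;
        }
        // End conditions: fixed terminators, then the caller's break token.
        if END_OF_EXPRESSION.contains(&tok) {
            break;
        }
        if break_on == Some(tok) {
            break;
        }
        // Infix detection: leave the operator for the caller to handle.
        if break_on_infix && INFIX.contains(&tok) {
            break;
        }
        // Atom parsing: here every remaining token counts as one atom.
        body.push(tok);
        pos += 1;
    }
    (body, pos)
}
```

The stopping token is deliberately left unconsumed, which matches the documented behavior of leaving terminators such as "}" for the outer parser.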
§Infix Operator Handling
When break_on_infix is false, infix operators like \over are rewritten into Genfrac nodes with appropriate delimiters:
- \over → fraction with bar line
- \choose → fraction with parentheses
- \above → fraction with bar line (size parsing not yet implemented)
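One way to picture the rewriting is to split the node list at the infix marker and wrap both halves. The sketch below assumes a toy Node enum and a hypothetical handle_infix_sketch function. The single-infix restriction and the bar-less \choose are taken from upstream KaTeX and are assumptions about this crate.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Sym(String),
    Infix(String), // placeholder left in the list by \over, \choose, ...
    OrdGroup(Vec<Node>),
    Genfrac {
        numer: Box<Node>,
        denom: Box<Node>,
        has_bar: bool,
        delims: Option<(char, char)>,
    },
}

/// Hypothetical rewrite: everything before the infix marker becomes the
/// numerator, everything after it becomes the denominator.
fn handle_infix_sketch(mut body: Vec<Node>) -> Result<Vec<Node>, String> {
    let positions: Vec<usize> = body
        .iter()
        .enumerate()
        .filter(|(_, n)| matches!(n, Node::Infix(_)))
        .map(|(i, _)| i)
        .collect();
    let idx = match positions.as_slice() {
        [] => return Ok(body), // nothing to rewrite
        [i] => *i,
        _ => return Err("only one infix operator per group".to_string()),
    };
    // Assumption: \choose draws parentheses and no bar; other operators draw a bar.
    let (has_bar, delims) = match &body[idx] {
        Node::Infix(name) if name.as_str() == "\\choose" => (false, Some(('(', ')'))),
        _ => (true, None),
    };
    let denom = body.split_off(idx + 1); // nodes after the marker
    body.pop(); // drop the marker itself; `body` is now the numerator
    Ok(vec![Node::Genfrac {
        numer: Box::new(Node::OrdGroup(body)),
        denom: Box::new(Node::OrdGroup(denom)),
        has_bar,
        delims,
    }])
}
```

With this shape, the tokens a, \over, b, c turn into a single fraction whose numerator holds a and whose denominator holds b and c.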
§Examples
Basic expression parsing:
use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("a + b \\cdot c", &settings, &ctx);
let expr = parser.parse_expression(false, None).unwrap();
// Returns a vector of parse nodes for "a", "+", "b", and so on (e.g., MathOrd for the letters)
Breaking on infix operators:
use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("a \\over b + c", &settings, &ctx);
let expr = parser.parse_expression(true, None).unwrap();
// Stops at \over, allowing parent function to handle precedence
Parsing until specific token:
use katex::parser::Parser;
use katex::{KatexContext, Settings, types::BreakToken};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("x + y }", &settings, &ctx);
let expr = parser
    .parse_expression(false, Some(&BreakToken::RightBrace))
    .unwrap();
// Stops at "}", leaving it for outer parser
§Cross-references
- parse_atom - Parses individual atomic elements
- handle_infix_nodes - Rewrites infix operators
- form_ligatures - Applies text ligatures
- BreakToken - Expression termination tokens
pub fn consume_spaces(&mut self) -> Result<(), ParseError>
Consumes consecutive space tokens, advancing to the next non-space token.
In LaTeX mathematical mode, spaces between tokens are typically ignored and don’t affect the output. This method efficiently skips over any whitespace tokens, positioning the parser at the next meaningful token.
§Behavior
- Repeatedly fetches and consumes tokens that are TokenType::Space
- Stops when a non-space token is encountered (becomes the new lookahead)
- Does nothing if the current lookahead is already non-space
- Safe to call at any point during parsing
§Return Value
Returns Ok(()) on success, or ParseError if token fetching fails (e.g., due to macro expansion errors).
§Examples
Basic space consumption:
use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("a + b", &settings, &ctx);
let token = parser.fetch().unwrap();
assert_eq!(token.text, "a");
parser.consume(); // consume "a"
parser.consume_spaces().unwrap(); // skip spaces
let next = parser.fetch().unwrap();
assert_eq!(next.text, "+");
In expression parsing (automatic):
// Spaces are automatically consumed in math mode
use katex::parser::Parser;
use katex::{KatexContext, Settings};
let settings = Settings::default();
let ctx = KatexContext::default();
let mut parser = Parser::new("x ^ 2", &settings, &ctx);
let expr = parser.parse_expression(false, None).unwrap();
// Spaces between tokens are ignored
§Cross-references
- fetch - Retrieves the current lookahead token
- consume - Consumes a single token
- TokenType::Space - Space token type
- parse_expression - Uses this method in math mode
pub fn parse_size_group(
    &mut self,
    optional: bool,
) -> Result<Option<ParseNodeSize>, ParseError>
Parse a size specification.
pub fn format_unsupported_cmd(&self, text: &str) -> ParseNodeColor
Converts the textual input of an unsupported command into a color node containing a text node.
pub fn parse_function(
    &mut self,
    break_on_token_text: Option<&BreakToken>,
    name: Option<&str>,
) -> Result<Option<ParseNode>, ParseError>
Parses a function if one is present at the current token.
pub fn subparse(
    &mut self,
    tokens: Vec<Token>,
) -> Result<Vec<ParseNode>, ParseError>
Parses a separate sequence of tokens as a separate job. Tokens should be specified in reverse order, as in a MacroDefinition.
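The reverse-order requirement makes sense if the token source is a stack that is read from its end, which is how upstream KaTeX's macro expander behaves and is assumed here. TokenStack and to_reversed below are hypothetical helpers for illustration only.

```rust
/// Hypothetical token stack: the last element is the next token to be read.
struct TokenStack {
    stack: Vec<String>,
}

impl TokenStack {
    /// Pushes tokens that are already in reversed (stack) order.
    fn push_tokens(&mut self, reversed: Vec<String>) {
        self.stack.extend(reversed);
    }

    /// Pops the next token to read, if any.
    fn pop_token(&mut self) -> Option<String> {
        self.stack.pop()
    }
}

/// Converts tokens from reading order into the reversed order a stack expects.
fn to_reversed(reading_order: &[&str]) -> Vec<String> {
    reading_order.iter().rev().map(|t| t.to_string()).collect()
}
```

Pushing the reversed form of a, +, b onto a stack that already holds other input yields a, +, b on successive pops, followed by whatever was on the stack before.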
pub fn handle_sup_subscript(
    &mut self,
    name: &str,
) -> Result<ParseNode, ParseError>
Handle a subscript or superscript with nice errors.
pub fn call_function(
    &mut self,
    name: &str,
    args: Vec<ParseNode>,
    opt_args: Vec<Option<ParseNode>>,
    token: Option<&Token>,
    break_on_token_text: Option<&BreakToken>,
) -> Result<ParseNode, ParseError>
Call a function handler with a suitable context and arguments.