
Module parser


SQL Parser – recursive-descent parser that converts a token stream into an AST.

The central type is Parser, which consumes tokens produced by the Tokenizer and builds a tree of Expression nodes covering the full SQL grammar: queries, DML, DDL, set operations, window functions, CTEs, and dialect-specific extensions for 30+ databases.

The simplest entry point is Parser::parse_sql, which tokenizes and parses a SQL string in one call.
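A minimal usage sketch of that entry point. The crate import path (sqlglot_rs), the exact return shape (a Result holding parsed Expression nodes), and the error type are assumptions not confirmed by this page; consult the Parser item docs for the real signature.

```rust
// Sketch only: the crate name `sqlglot_rs` and the Result<Vec<Expression>, _>
// return shape are assumptions, not confirmed by this module page.
use sqlglot_rs::parser::Parser;

fn main() {
    let sql = "SELECT a, COUNT(*) FROM t GROUP BY a";
    // Parser::parse_sql tokenizes and parses the string in one call.
    match Parser::parse_sql(sql) {
        Ok(expressions) => println!("parsed {} statement(s)", expressions.len()),
        Err(e) => eprintln!("parse error: {e}"),
    }
}
```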

Static configuration maps

This module also exports several LazyLock<HashSet<TokenType>> constants (ported from Python sqlglot's parser.py) that classify token types; each is documented under Statics below.
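The pattern behind these constants can be sketched with the standard library alone. This is a toy illustration, not the crate's code: the TokenType enum here is a tiny stand-in for the real one, and the set contents mirror only the fragment of NESTED_TYPE_TOKENS quoted below.

```rust
use std::collections::HashSet;
use std::sync::LazyLock;

// Toy stand-in for the crate's TokenType enum (the real one is far larger).
#[derive(Debug, PartialEq, Eq, Hash, Clone, Copy)]
enum TokenType {
    Array,
    List,
    Select,
}

// Same pattern as the exported constants: the HashSet is built once,
// lazily, on first access, then shared for all subsequent lookups.
static NESTED_TYPE_TOKENS_DEMO: LazyLock<HashSet<TokenType>> =
    LazyLock::new(|| HashSet::from([TokenType::Array, TokenType::List]));

fn main() {
    // The parser branches on set membership like this.
    assert!(NESTED_TYPE_TOKENS_DEMO.contains(&TokenType::Array));
    assert!(!NESTED_TYPE_TOKENS_DEMO.contains(&TokenType::Select));
    println!("ok");
}
```

LazyLock (stable since Rust 1.80) is the idiomatic replacement for the lazy_static/once_cell crates for exactly this kind of module-level lookup table.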

Structs

Parser
Recursive-descent SQL parser that converts a token stream into an AST.
ParserConfig
Configuration for the SQL Parser.

Statics

AGGREGATE_TYPE_TOKENS
Tokens for aggregate function types (ClickHouse). Python: AGGREGATE_TYPE_TOKENS = {TokenType.AGGREGATEFUNCTION, …}
DB_CREATABLES
Object types that can be created with CREATE. Python: DB_CREATABLES = {TokenType.DATABASE, TokenType.SCHEMA, …}
ENUM_TYPE_TOKENS
Tokens that represent enum types. Python: ENUM_TYPE_TOKENS = {TokenType.DYNAMIC, TokenType.ENUM, …}
NESTED_TYPE_TOKENS
Tokens that can have nested type parameters. Python: NESTED_TYPE_TOKENS = {TokenType.ARRAY, TokenType.LIST, …}
NO_PAREN_FUNCTIONS
Functions that can be called without parentheses; maps each TokenType to the expression node used for generation. Python: NO_PAREN_FUNCTIONS = {TokenType.CURRENT_DATE: exp.CurrentDate, …}
NO_PAREN_FUNCTION_NAMES
String names that can act as no-paren functions; these are often tokenized as Var/Identifier rather than as specific TokenTypes.
RESERVED_TOKENS
Tokens that cannot be used as identifiers without quoting; typically structural keywords that affect query parsing.
SIGNED_TO_UNSIGNED_TYPE_TOKEN
Maps signed types to their unsigned counterparts. Python: SIGNED_TO_UNSIGNED_TYPE_TOKEN = {TokenType.BIGINT: TokenType.UBIGINT, …}
STRUCT_TYPE_TOKENS
Tokens that represent struct-like types. Python: STRUCT_TYPE_TOKENS = {TokenType.FILE, TokenType.NESTED, TokenType.OBJECT, …}
SUBQUERY_PREDICATES
Tokens that introduce subquery predicates, mapped to their expression nodes. Python: SUBQUERY_PREDICATES = {TokenType.ANY: exp.Any, …}
TYPE_TOKENS
All tokens that represent data types. Python: TYPE_TOKENS = {TokenType.BIT, TokenType.BOOLEAN, …}