SQL Parser – recursive-descent parser that converts a token stream into an AST.
The central type is Parser, which consumes tokens produced by the
Tokenizer and builds a tree of Expression
nodes covering the full SQL grammar: queries, DML, DDL, set operations,
window functions, CTEs, and dialect-specific extensions for 30+ databases.
The simplest entry point is Parser::parse_sql, which tokenizes and
parses a SQL string in one call.
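The tokenize-then-parse flow can be sketched with a toy recursive-descent parser. The `Token`, `Expression`, and `Parser` types below are illustrative stand-ins for a tiny `SELECT`-only grammar, not this crate's actual API:

```rust
// Hypothetical miniature of the tokenize -> parse pipeline; the real
// Parser/Expression types cover the full SQL grammar.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Select,
    Number(i64),
    Comma,
}

#[derive(Debug, PartialEq)]
enum Expression {
    Select { projections: Vec<Expression> },
    Literal(i64),
}

// Toy tokenizer: the real Tokenizer handles quoting, comments, dialects, ...
fn tokenize(sql: &str) -> Result<Vec<Token>, String> {
    sql.replace(',', " , ")
        .split_whitespace()
        .map(|w| match w {
            "SELECT" => Ok(Token::Select),
            "," => Ok(Token::Comma),
            n => n
                .parse::<i64>()
                .map(Token::Number)
                .map_err(|_| format!("unexpected token: {n}")),
        })
        .collect()
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    // One-call entry point, mirroring the tokenize-and-parse convenience.
    fn parse_sql(sql: &str) -> Result<Expression, String> {
        let mut p = Parser { tokens: tokenize(sql)?, pos: 0 };
        p.parse_select()
    }

    fn next(&mut self) -> Option<Token> {
        let t = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        t
    }

    fn parse_select(&mut self) -> Result<Expression, String> {
        if self.next() != Some(Token::Select) {
            return Err("expected SELECT".into());
        }
        // Recursive descent: each grammar rule is a method that consumes tokens.
        let mut projections = vec![self.parse_literal()?];
        while self.tokens.get(self.pos) == Some(&Token::Comma) {
            self.pos += 1;
            projections.push(self.parse_literal()?);
        }
        Ok(Expression::Select { projections })
    }

    fn parse_literal(&mut self) -> Result<Expression, String> {
        match self.next() {
            Some(Token::Number(n)) => Ok(Expression::Literal(n)),
            other => Err(format!("expected literal, got {other:?}")),
        }
    }
}

fn main() {
    let ast = Parser::parse_sql("SELECT 1, 2").unwrap();
    println!("{ast:?}");
}
```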
§Static configuration maps
This module also exports several LazyLock<HashSet<TokenType>> constants
(ported from Python sqlglot’s parser.py) that classify token types:
- TYPE_TOKENS – all tokens that represent SQL data types
- NESTED_TYPE_TOKENS – parametric types like ARRAY, MAP, STRUCT
- RESERVED_TOKENS – tokens that cannot be used as unquoted identifiers
- NO_PAREN_FUNCTIONS / NO_PAREN_FUNCTION_NAMES – zero-argument functions that may be written without parentheses (e.g. CURRENT_DATE)
- DB_CREATABLES – object kinds valid after CREATE (TABLE, VIEW, etc.)
- SUBQUERY_PREDICATES – tokens introducing subquery predicates (ANY, ALL, EXISTS)
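The classification pattern behind these constants can be illustrated as follows. The `TokenType` enum here is a small mock (the crate's real enum is far larger); only the `LazyLock<HashSet<_>>` shape matches the exported statics:

```rust
use std::collections::HashSet;
use std::sync::LazyLock;

// Mock token-type enum for illustration; the real TokenType covers the
// full SQL grammar.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum TokenType {
    Bit,
    Boolean,
    Array,
    Map,
    Select,
}

// Same shape as the exported constants: a lazily initialized set the
// parser consults to classify the current token.
static TYPE_TOKENS: LazyLock<HashSet<TokenType>> = LazyLock::new(|| {
    HashSet::from([
        TokenType::Bit,
        TokenType::Boolean,
        TokenType::Array,
        TokenType::Map,
    ])
});

fn is_type_token(t: TokenType) -> bool {
    TYPE_TOKENS.contains(&t)
}

fn main() {
    assert!(is_type_token(TokenType::Array));
    assert!(!is_type_token(TokenType::Select));
    println!("ok");
}
```

`LazyLock` (stable since Rust 1.80) builds each set once, on first access, mirroring the module-level set literals in Python sqlglot's parser.py.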
Structs§
- Parser – Recursive-descent SQL parser that converts a token stream into an AST.
- ParserConfig – Configuration for the SQL Parser.
Statics§
- AGGREGATE_TYPE_TOKENS – Tokens for aggregate function types (ClickHouse). Python: AGGREGATE_TYPE_TOKENS = {TokenType.AGGREGATEFUNCTION, …}
- DB_CREATABLES – Object types that can be created with CREATE. Python: DB_CREATABLES = {TokenType.DATABASE, TokenType.SCHEMA, …}
- ENUM_TYPE_TOKENS – Tokens that represent enum types. Python: ENUM_TYPE_TOKENS = {TokenType.DYNAMIC, TokenType.ENUM, …}
- NESTED_TYPE_TOKENS – Tokens that can have nested type parameters. Python: NESTED_TYPE_TOKENS = {TokenType.ARRAY, TokenType.LIST, …}
- NO_PAREN_FUNCTIONS – Functions that can be called without parentheses; maps a TokenType to the function name used during generation. Python: NO_PAREN_FUNCTIONS = {TokenType.CURRENT_DATE: exp.CurrentDate, …}
- NO_PAREN_FUNCTION_NAMES – String names that can act as no-paren functions; these are often tokenized as Var/Identifier rather than specific TokenTypes.
- RESERVED_TOKENS – Tokens that cannot be used as identifiers without quoting; typically structural keywords that affect query parsing.
- SIGNED_TO_UNSIGNED_TYPE_TOKEN – Maps signed types to their unsigned counterparts. Python: SIGNED_TO_UNSIGNED_TYPE_TOKEN = {TokenType.BIGINT: TokenType.UBIGINT, …}
- STRUCT_TYPE_TOKENS – Tokens that represent struct-like types. Python: STRUCT_TYPE_TOKENS = {TokenType.FILE, TokenType.NESTED, TokenType.OBJECT, …}
- SUBQUERY_PREDICATES – Tokens that introduce subquery predicates. Python: SUBQUERY_PREDICATES = {TokenType.ANY: exp.Any, …}
- TYPE_TOKENS – All tokens that represent data types. Python: TYPE_TOKENS = {TokenType.BIT, TokenType.BOOLEAN, …}
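The map-shaped statics (NO_PAREN_FUNCTIONS, SIGNED_TO_UNSIGNED_TYPE_TOKEN) follow the same lazy-initialization pattern but as lookup tables rather than sets. A sketch with a mock `TokenType` (the variant names and table contents here are illustrative, not the crate's full tables):

```rust
use std::collections::HashMap;
use std::sync::LazyLock;

// Mock token-type enum; the real TokenType has many more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum TokenType {
    BigInt,
    UBigInt,
    Int,
    UInt,
}

// Same shape as SIGNED_TO_UNSIGNED_TYPE_TOKEN: a lazily built lookup
// table consulted when parsing e.g. `BIGINT UNSIGNED`.
static SIGNED_TO_UNSIGNED_TYPE_TOKEN: LazyLock<HashMap<TokenType, TokenType>> =
    LazyLock::new(|| {
        HashMap::from([
            (TokenType::BigInt, TokenType::UBigInt),
            (TokenType::Int, TokenType::UInt),
        ])
    });

// Returns the unsigned counterpart of a signed type token, if one exists.
fn to_unsigned(t: TokenType) -> Option<TokenType> {
    SIGNED_TO_UNSIGNED_TYPE_TOKEN.get(&t).copied()
}

fn main() {
    assert_eq!(to_unsigned(TokenType::BigInt), Some(TokenType::UBigInt));
    assert_eq!(to_unsigned(TokenType::UInt), None);
    println!("ok");
}
```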