Module lex


Shared SQL lexer for preprocess passes.

The lexer splits a SQL string into non-overlapping, classified segments so that preprocess passes scan only plain Text segments. String literals, quoted identifiers, and comments pass through opaquely and are never matched against patterns.

Supported SQL dialect features:

  • Single-quoted strings ('...') with '' escape and E'...' with backslash escapes.
  • Double-quoted identifiers ("...").
  • Line comments (-- ... to end-of-line).
  • Block comments (/* ... */, nestable per PostgreSQL).
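The rules above can be sketched as a small byte scanner. This is a minimal illustration, not the module's implementation: `Kind`, `segment`, `scan_quoted`, and `scan_block_comment` are hypothetical names, and the real `SqlSegment` variants may be shaped differently. It also assumes that a line comment excludes its trailing newline and that unterminated constructs run to the end of the input.

```rust
/// Hypothetical segment kinds; the real `SqlSegment` enum may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Text,
    StringLit,
    QuotedIdent,
    LineComment,
    BlockComment,
}

/// `i` points at the opening quote. Returns the index just past the closing
/// quote. A doubled quote is an escape; backslash escapes are optional.
fn scan_quoted(b: &[u8], mut i: usize, quote: u8, backslash: bool) -> usize {
    i += 1;
    while i < b.len() {
        if backslash && b[i] == b'\\' {
            i += 2; // skip the backslash and the byte it escapes
        } else if b[i] == quote {
            if b.get(i + 1) == Some(&quote) {
                i += 2; // doubled quote stays inside the literal
            } else {
                return i + 1;
            }
        } else {
            i += 1;
        }
    }
    b.len() // unterminated: consume to the end (an assumption)
}

/// `i` points at `/*`. Tracks nesting depth, as PostgreSQL does.
fn scan_block_comment(b: &[u8], mut i: usize) -> usize {
    let mut depth = 0usize;
    while i < b.len() {
        if b[i..].starts_with(b"/*") {
            depth += 1;
            i += 2;
        } else if b[i..].starts_with(b"*/") {
            depth -= 1;
            i += 2;
            if depth == 0 {
                return i;
            }
        } else {
            i += 1;
        }
    }
    b.len()
}

/// Split `sql` into non-overlapping `(kind, slice)` pairs covering the input.
fn segment(sql: &str) -> Vec<(Kind, &str)> {
    let b = sql.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    let mut text_start = 0;
    while i < b.len() {
        // An `E` that continues an identifier does not start an E'...' string.
        let prev_is_ident = i > 0 && (b[i - 1].is_ascii_alphanumeric() || b[i - 1] == b'_');
        let (kind, end) = if b[i..].starts_with(b"--") {
            let end = b[i..].iter().position(|&c| c == b'\n').map_or(b.len(), |p| i + p);
            (Kind::LineComment, end)
        } else if b[i..].starts_with(b"/*") {
            (Kind::BlockComment, scan_block_comment(b, i))
        } else if b[i] == b'\'' {
            (Kind::StringLit, scan_quoted(b, i, b'\'', false))
        } else if (b[i] == b'E' || b[i] == b'e') && b.get(i + 1) == Some(&b'\'') && !prev_is_ident {
            (Kind::StringLit, scan_quoted(b, i + 1, b'\'', true))
        } else if b[i] == b'"' {
            (Kind::QuotedIdent, scan_quoted(b, i, b'"', false))
        } else {
            i += 1; // ordinary byte: stays in the pending Text run
            continue;
        };
        if text_start < i {
            out.push((Kind::Text, &sql[text_start..i]));
        }
        out.push((kind, &sql[i..end]));
        i = end;
        text_start = end;
    }
    if text_start < b.len() {
        out.push((Kind::Text, &sql[text_start..]));
    }
    out
}
```

With this sketch, `segment("SELECT 'a''b' /* x /* y */ z */ + 1 -- c")` yields six pieces: the Text runs `"SELECT "`, `" "`, and `" + 1 "`, interleaved with the string literal, the nested block comment, and the line comment. Every delimiter is ASCII, so slicing at the computed offsets stays on UTF-8 character boundaries.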

Enums

SqlSegment
A classified segment of a SQL string.

Functions

find_operator_positions
Return the byte positions (relative to the start of sql) of every occurrence of op that falls inside a Text segment.
first_sql_word
Return the first SQL keyword/word in sql, skipping leading whitespace, line comments, and block comments. Returns None if the input is empty or contains only whitespace/comments.
has_brace_outside_literals
Return true if { appears inside any Text segment of sql.
has_operator_outside_literals
Return true if op appears verbatim inside any Text segment of sql. The comparison is byte-exact (case-sensitive). Occurrences inside string literals, quoted identifiers, or comments are ignored.
keyword_position_outside_literals
Return the byte position (relative to sql) of the first case-insensitive occurrence of the keyword kw that falls inside a Text segment. Returns None if not found.
second_sql_word
Return the second SQL keyword/word in sql, skipping leading whitespace, line comments, and block comments, then skipping the first word. Returns None if there is no second word.
segments
Segment a SQL string into classified SqlSegments.
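To illustrate how the search helpers can report positions relative to the whole input while looking only at Text segments, here is a minimal sketch. It does not use the module's real types: `Seg`, `operator_positions`, and `keyword_position` are hypothetical stand-ins, and the segments are assumed to carry their starting byte offset. The keyword search here matches any case-insensitive substring; whether the real function also checks word boundaries is not stated in the descriptions above.

```rust
/// Hypothetical stand-in for a classified segment: its byte offset in the
/// original SQL, its text, and whether it is a plain Text segment.
struct Seg<'a> {
    start: usize,
    text: &'a str,
    is_text: bool,
}

/// Byte positions (relative to the original string) of every byte-exact
/// match of `op` that lies inside a Text segment.
fn operator_positions(segs: &[Seg<'_>], op: &str) -> Vec<usize> {
    let mut out = Vec::new();
    if op.is_empty() {
        return out; // an empty pattern would match at every boundary
    }
    for seg in segs.iter().filter(|s| s.is_text) {
        // Shift each in-segment index by the segment's own offset.
        out.extend(seg.text.match_indices(op).map(|(i, _)| seg.start + i));
    }
    out
}

/// Byte position of the first ASCII case-insensitive match of `kw` inside a
/// Text segment, or `None` if there is none.
fn keyword_position(segs: &[Seg<'_>], kw: &str) -> Option<usize> {
    if kw.is_empty() {
        return None; // `windows(0)` would panic
    }
    segs.iter().filter(|s| s.is_text).find_map(|seg| {
        seg.text
            .as_bytes()
            .windows(kw.len())
            .position(|w| w.eq_ignore_ascii_case(kw.as_bytes()))
            .map(|i| seg.start + i)
    })
}
```

For the input `a @> 'x @> y' @> b`, split by hand into a Text segment at offset 0, a string literal at offset 5, and a Text segment at offset 13, `operator_positions(&segs, "@>")` returns `[2, 14]`. The match at byte 8 is skipped because it sits inside the literal.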