Shared SQL lexer for preprocess passes.
The lexer classifies a SQL string into non-overlapping segments, allowing
preprocess passes to scan only plain Text segments — string literals,
quoted identifiers, and comments are passed through opaquely and never
matched against patterns.
Supported SQL dialect features:
- Single-quoted strings (`'...'`) with `''` escape and `E'...'` with backslash escapes.
- Double-quoted identifiers (`"..."`).
- Line comments (`-- ...` to end-of-line).
- Block comments (`/* ... */`, nestable per PostgreSQL).
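The classification described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: the `Seg` enum, its variant names, and the `segment`, `scan_quoted`, and `scan_block` functions are all hypothetical stand-ins. It also assumes that a doubled `""` stays inside a quoted identifier, that unterminated literals and comments run to the end of the input, and that a line comment stops just before the newline.

```rust
/// Hypothetical segment type; the real `SqlSegment` variants may differ.
#[derive(Debug, PartialEq)]
enum Seg<'a> {
    Text(&'a str),
    StringLit(&'a str),
    QuotedIdent(&'a str),
    Comment(&'a str),
}

/// Return the index just past the closing `quote`. A doubled quote stays
/// inside the literal; with `backslash` set, `\` also escapes the next byte.
fn scan_quoted(b: &[u8], mut i: usize, quote: u8, backslash: bool) -> usize {
    while i < b.len() {
        if backslash && b[i] == b'\\' {
            i += 2;
        } else if b[i] == quote {
            if b.get(i + 1) == Some(&quote) {
                i += 2;
            } else {
                return i + 1;
            }
        } else {
            i += 1;
        }
    }
    b.len() // unterminated: consume to end of input (assumption)
}

/// Return the index just past the `*/` that closes a block comment whose
/// body starts at `i`, counting nested `/* ... */` pairs.
fn scan_block(b: &[u8], mut i: usize) -> usize {
    let mut depth = 1;
    while i + 1 < b.len() {
        if b[i] == b'/' && b[i + 1] == b'*' {
            depth += 1;
            i += 2;
        } else if b[i] == b'*' && b[i + 1] == b'/' {
            depth -= 1;
            i += 2;
            if depth == 0 {
                return i;
            }
        } else {
            i += 1;
        }
    }
    b.len() // unterminated: consume to end of input (assumption)
}

/// Split `sql` into non-overlapping segments that together cover the input.
fn segment(sql: &str) -> Vec<Seg<'_>> {
    let b = sql.as_bytes();
    let mut out = Vec::new();
    let (mut i, mut text_start) = (0, 0);
    while i < b.len() {
        let next = b.get(i + 1).copied();
        // An `E` that ends an identifier (e.g. `type'x'`) is not a string prefix.
        let prev_word = i > 0 && (b[i - 1].is_ascii_alphanumeric() || b[i - 1] == b'_');
        let end = match (b[i], next) {
            (b'\'', _) => scan_quoted(b, i + 1, b'\'', false),
            (b'E' | b'e', Some(b'\'')) if !prev_word => scan_quoted(b, i + 2, b'\'', true),
            (b'"', _) => scan_quoted(b, i + 1, b'"', false),
            (b'-', Some(b'-')) => {
                b[i..].iter().position(|&c| c == b'\n').map_or(b.len(), |p| i + p)
            }
            (b'/', Some(b'*')) => scan_block(b, i + 2),
            _ => {
                i += 1; // ordinary byte: stays in the current Text run
                continue;
            }
        };
        if text_start < i {
            out.push(Seg::Text(&sql[text_start..i]));
        }
        let s = &sql[i..end];
        out.push(match b[i] {
            b'-' | b'/' => Seg::Comment(s),
            b'"' => Seg::QuotedIdent(s),
            _ => Seg::StringLit(s),
        });
        i = end;
        text_start = end;
    }
    if text_start < b.len() {
        out.push(Seg::Text(&sql[text_start..]));
    }
    out
}
```

A preprocess pass built on this sketch would iterate over `segment(sql)` and apply its pattern matching only to `Seg::Text` slices, leaving the other variants untouched.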
Enums
- `SqlSegment`: A classified segment of a SQL string.
Functions
- `find_operator_positions`: Return the byte positions (relative to the start of `sql`) of every occurrence of `op` that falls inside a `Text` segment.
- `first_sql_word`: Return the first SQL keyword/word in `sql`, skipping leading whitespace, line comments, and block comments. Returns `None` if the input is empty or contains only whitespace/comments.
- `has_brace_outside_literals`: Return `true` if `{` appears inside any `Text` segment of `sql`.
- `has_operator_outside_literals`: Return `true` if `op` appears verbatim inside any `Text` segment of `sql`. The comparison is byte-exact (case-sensitive). Occurrences inside string literals, quoted identifiers, or comments are ignored.
- `keyword_position_outside_literals`: Return the byte position (relative to `sql`) of the first case-insensitive occurrence of the keyword `kw` that falls inside a `Text` segment. Returns `None` if not found.
- `second_sql_word`: Return the second SQL keyword/word in `sql`, skipping leading whitespace, line comments, and block comments, then skipping the first word. Returns `None` if there is no second word.
- `segments`: Segment a SQL string into classified `SqlSegment`s.
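To make the word-extraction behaviour concrete, here is a rough standalone sketch of how `first_sql_word` and `second_sql_word` might behave. The signatures (`&str -> Option<&str>`), the helpers `skip_trivia` and `split_word`, and the definition of a "word" as a run of ASCII alphanumerics or `_` are assumptions made for illustration; the real functions may differ, for example by building on `segments`.

```rust
/// Hypothetical helper: advance past whitespace, `--` line comments, and
/// nestable `/* ... */` block comments at the start of `sql`.
fn skip_trivia(sql: &str) -> &str {
    let mut rest = sql;
    loop {
        rest = rest.trim_start();
        if rest.starts_with("--") {
            // Drop the comment up to the newline, or everything if there is none.
            rest = match rest.find('\n') {
                Some(p) => &rest[p..],
                None => "",
            };
        } else if rest.starts_with("/*") {
            let b = rest.as_bytes();
            let (mut i, mut depth) = (2, 1);
            while depth > 0 && i < b.len() {
                if b[i..].starts_with(b"/*") {
                    depth += 1;
                    i += 2;
                } else if b[i..].starts_with(b"*/") {
                    depth -= 1;
                    i += 2;
                } else {
                    i += 1;
                }
            }
            // An unterminated comment swallows the rest of the input.
            rest = &rest[i..];
        } else {
            return rest;
        }
    }
}

/// Hypothetical helper: split off the leading word, returning it together
/// with the remainder of the input.
fn split_word(sql: &str) -> Option<(&str, &str)> {
    let rest = skip_trivia(sql);
    let end = rest
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(rest.len());
    if end == 0 {
        None
    } else {
        Some((&rest[..end], &rest[end..]))
    }
}

/// Sketch of `first_sql_word` (assumed signature).
fn first_sql_word(sql: &str) -> Option<&str> {
    split_word(sql).map(|(word, _)| word)
}

/// Sketch of `second_sql_word` (assumed signature).
fn second_sql_word(sql: &str) -> Option<&str> {
    let (_, rest) = split_word(sql)?;
    first_sql_word(rest)
}
```

Under these assumptions, `first_sql_word("-- note\n/* a /* b */ c */ CREATE TABLE t")` yields `Some("CREATE")` and `second_sql_word` on the same input yields `Some("TABLE")`, while an input that holds only whitespace and comments yields `None`.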