orql
A toy SQL parser for a subset of the Oracle dialect.
motivation
The motivation for parser is to parse Oracle SQL statements into as an Abstract Syntax Tree while allowing the complete reconstruction of the orignally parsed source code. Therefore, the parser tries to preserve and expose token location as well as comments.
At the same time, the challenge is to not pay additional costs when not being interested in token location information and/or comments.
example
let sql = "select * from dual";
let stmts = parse.expect;
for stmt in stmts
Advanced examples demo'ing access to locations and comments can be found in the rustdoc of the individual, public modules.
status
- Literals
- Text literals
- Numbers
- Datetimes
- Intervals
- DbLink identifiers
- Data Types
- Operators
- Expressions
-
simple_expression -
compound_expression-
PRIOR ... -
COLLATE ...
-
-
av_meas_expression -
case_expression -
cursor_expression -
datetime_expression -
function_expression -
interval_expression -
JSON_object_access_expression -
model_expression -
object_access_expresion -
scalar_subquery_expression -
type_constructor_expression -
placeholder_expression-
INDICATOR ...
-
-
- Conditions
- Functions
- Named argument syntax, e.g.
f(arg0 => 1, arg1 => 2)- only for generic, user-defined functions
- Wildcard arguments
- only in function calls with explicit support for it, e.g.
count
- only in function calls with explicit support for it, e.g.
- Analytic and Aggregate functions
- "duplicate treatmeant", e.g.
DISTINCT,UNIQUE,ALLkeywords before the first argument -
WITHIN GROUP -
OVERclause -
KEEP FIRST | LAST
- "duplicate treatmeant", e.g.
- Built-in functions (
*marks functions with dedicated / extended syntax clauses)-
ABS -
ACOS -
ADD_MONTHS -
ANY_VALUE -
APPROX_COUNT* -
APPROX_COUNT_DISTINCT -
APPROX_COUNT_DISTINCT_AGG -
APPROX_COUNT_DISTINCT_DETAIL -
APPROX_MEDIAN* -
APPROX_PERCENTILE* -
APPROX_PERCENTILE_AGG -
APPROX_PERCENTILE_DETAIL* -
APPROX_RANK* -
APPROX_SUM* -
ASCII -
ASCIISTR -
ASIN -
ATAN -
ATAN2 -
AVG* -
BFILENAME -
BIN_TO_NUM -
BITAND -
BIT_AND_AGG -
BITMAP_BIT_POSITION -
BITMAP_BUCKET_NUMBER -
BITMAP_CONSTRUCT_AGG -
BITMAP_COUNT -
BITMAP_OR_AGG -
BIT_OR_AGG -
BIT_XOR_AGG -
CARDINALITY -
CAST* -
CEIL -
CHARTOROWID -
checksum* -
CHR* -
CLUSTER_DETAILS* -
CLUSTER_DISTANCE* -
CLUSTER_ID* -
CLUSTER_PROBABILITY* -
CLUSTER_SET* -
COALESCE -
COLLATION -
COLLECT* -
COMPOSE -
CON_DBID_TO_ID -
CON_GUID_TO_ID -
con_id_to_con_name -
con_id_to_dbid -
con_id_to_guid -
con_id_to_uid -
CON_NAME_TO_ID -
CON_UID_TO_ID -
CONCAT -
CONVERT -
CORR* -
CORR_A -
COS -
COSH -
COUNT* -
COVAR_POP* -
COVAR_SAMP* -
CUBE_TABLE -
CUME_DIST* -
CURRENT_DATE -
CURRENT_TIMESTAMP -
CV -
DATAOBJ_TO_MAT_PARTITION -
DATAOBJ_TO_PARTITION -
DBTIMEZONE -
DECODE -
DECOMPOSE -
DENSE_RANK* -
DEPTH -
DEREF -
DUMP -
EMPTY_BLOB-EMPTY_CLOB -
EXISTSNODE -
EXP -
EXTRACT-datetime* -
EXTRACT-XML -
EXTRACTVALUE -
FEATURE_COMPARE* -
FEATURE_DETAILS* -
FEATURE_ID* -
FEATURE_SET -
FEATURE_VALUE* -
FIRST* -
FIRST_VALUE* -
FLOOR -
FROM_TZ -
GREATEST -
GROUP_ID -
GROUPING -
GROUPING_ID -
HEXTORAW -
INITCAP -
INSTR -
ITERATION_NUMBER -
JSON_ARRAY* -
JSON_ARRAYAGG* -
JSON_DATAGUIDE -
JSON_MERGEPATCH* -
JSON_OBJECT* -
JSON_OBJECTAGG* -
JSON_QUERY* -
json_scalar* -
JSON_SERIALIZE* -
JSON_TABLE* -
JSON_TRANSFORM -
JSON_VALUE* -
json-type-constructor -
KURTOSIS_POP -
KURTOSIS_SAMP -
LAG* -
LAST* -
LAST_DAY -
LAST_VALUE* -
LEAD* -
LEAST -
LENGTH -
LISTAGG* -
LN -
LNNVL* -
LOCALTIMESTAMP -
LOG -
LOWER -
LPAD -
LTRIM -
MAKE_REF -
MAX* -
MEDIAN* -
MIN* -
MOD -
MONTHS_BETWEEN -
NANVL -
NCHR -
NEW_TIME -
NEXT_DAY -
NLS_CHARSET_DECL_LEN -
NLS_CHARSET_ID -
NLS_CHARSET_NAME -
NLS_COLLATION_ID -
NLS_COLLATION_NAME -
NLS_INITCAP -
NLS_LOWER -
NLS_UPPER -
NLSSORT -
NTH_VALUE* -
NTILE* -
NULLIF -
NUMTODSINTERVAL -
NUMTOYMINTERVAL -
NVL -
NVL2 -
ORA_DM_PARTITION_NAME* -
ORA_DST_AFFECTED -
ORA_DST_CONVERT -
ORA_DST_ERROR -
ORA_HASH -
ORA_INVOKING_USER -
ORA_INVOKING_USERID -
PATH -
PERCENT_RANK* -
PERCENTILE_CONT* -
PERCENTILE_DISC* -
POWER -
POWERMULTISET -
POWERMULTISET_BY_CARDINALITY -
PREDICTION* -
PREDICTION_BOUNDS* -
PREDICTION_COST* -
PREDICTION_DETAILS* -
PREDICTION_PROBABILITY* -
PREDICTION_SET* -
PRESENTNNV -
PRESENTV -
PREVIOUS -
RANK* -
RATIO_TO_REPORT* -
RAWTOHEX -
RAWTONHEX -
REF -
REFTOHEX -
REGEXP_COUNT -
REGEXP_INSTR -
REGEXP_REPLACE -
REGEXP_SUBSTR -
REGR_-Linear-Regression-Functions* -
REMAINDER -
REPLACE -
ROUND-date -
ROUND-number -
ROUND_TIES_TO_EVEN-number -
ROW_NUMBER* -
ROWIDTOCHAR -
ROWIDTONCHAR -
RPAD -
RTRIM -
SCN_TO_TIMESTAMP -
SESSIONTIMEZONE -
SET -
SIGN -
SIN -
SINH -
SKEWNESS_POP* -
SKEWNESS_SAMP* -
SOUNDEX -
SQRT -
STANDARD_HASH -
STATS_BINOMIAL_TEST -
STATS_CROSSTAB -
STATS_F_TEST -
STATS_KS_TEST -
STATS_MODE -
STATS_MW_TEST -
STATS_ONE_WAY_ANOVA -
STATS_T_TEST_ -
STATS_WSR_TEST -
STDDEV -
STDDEV_POP -
STDDEV_SAMP -
SUBSTR -
SUM -
SYS_CONNECT_BY_PATH -
SYS_CONTEXT -
SYS_DBURIGEN -
SYS_EXTRACT_UTC -
SYS_GUID -
SYS_OP_ZONE_ID -
SYS_TYPEID -
SYS_XMLAGG -
SYS_XMLGEN -
SYSDATE -
SYSTIMESTAMP -
TAN -
TANH -
TIMESTAMP_TO_SCN -
TO_APPROX_COUNT_DISTINCT -
TO_APPROX_PERCENTILE -
TO_BINARY_DOUBLE* -
TO_BINARY_FLOAT* -
TO_BLOB-bfile -
TO_BLOB-raw -
TO_CHAR-bfile-blob -
TO_CHAR-character -
TO_CHAR-datetime -
TO_CHAR-number -
TO_CLOB-bfile-blob -
TO_CLOB-character -
TO_DATE* -
TO_DSINTERVAL* -
TO_LOB -
TO_MULTI_BYTE -
TO_NCHAR-character -
TO_NCHAR-datetime -
TO_NCHAR-number -
TO_NCLOB -
TO_NUMBER* -
TO_SINGLE_BYTE -
TO_TIMESTAMP* -
TO_TIMESTAMP_TZ* -
TO_UTC_TIMESTAMP_TZ -
TO_YMINTERVAL* -
TRANSLATE -
TRANSLATE-USING -
TREAT* -
TRIM* -
TRUNC-date -
TRUNC-number -
TZ_OFFSET* -
UID -
UNISTR -
UPPER -
USER -
USERENV -
VALIDATE_CONVERSION* -
VALUE -
VAR_POP -
VAR_SAMP -
VARIANCE -
VSIZE -
WIDTH_BUCKET -
XMLAGG* -
XMLCAST* -
XMLCDATA -
XMLCOLATTVAL* -
XMLCOMMENT -
XMLCONCAT -
XMLDIFF -
XMLELEMENT* -
XMLEXISTS* -
XMLFOREST* -
XMLISVALID -
XMLPARSE* -
XMLPATCH -
XMLPI* -
XMLQUERY* -
XMLSEQUENCE -
XMLSERIALIZE* -
XMLTABLE* -
XMLTRANSFORM
-
- Named argument syntax, e.g.
- Statements
-
SELECT— mostly covered- Optimizer hints
- merely the raw comment
- unconditionally, even when parsing without comments
-
CONNECT_BY -
model_clause -
window_clause
- Optimizer hints
-
MERGE -
UPDATE -
INSERT -
DELETE
-
not supported syntax
SQL syntax noted next is not supported by design:
multiple ON clauses
Example:
select *
from (select 1 as alpha, 'alpha' as name from dual) a
left join (select 1 as beta, 'beta' as name from dual) b
join (select 1 as gamma, 'gamma' as name from dual) c
on 1=1 -- causes `b` and `c` to be joined first ...
on 1=1 -- ... on then this gets evaluated over of the previous inner join with `a`
;
Explanation: https://www.sqlservercentral.com/forums/topic/multiple-on-clauses-in-one-join
references / related projects
-
datafusion-sqlparser-rs — A mature and complete SQL parser covering the full ANSI/ISO SQL Standard with support for database product specific extensions. Its exposed AST is general purpose — a union of all supported dialect specific features. Since
sqlparser's AST doesn't contain nodes for every token in the parsed SQL, it's difficult to reconstruct the originally parsed SQL with regards to possible comments and indentation.