polyglot-sql
Core SQL parsing and dialect translation library for Rust. Parses, generates, transpiles, and formats SQL across 32 database dialects.
Part of the Polyglot project.
Features
- Parse SQL into a fully-typed AST with 200+ expression types
- Generate SQL from AST nodes for any target dialect
- Transpile between any pair of 32 dialects in one call
- Format / pretty-print SQL
- Fluent builder API for constructing queries programmatically
- AST traversal utilities (DFS/BFS iterators, transform, walk)
- Validation with syntax checking and error location reporting
- Schema module for column resolution and type annotation
Usage
Cargo Features
By default, polyglot-sql enables the full public API. Parser-only consumers can
disable default features and opt into only the dialect parsers they need:
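# Default features disabled.
polyglot-sql = { version = "0.4", default-features = false }
# Default features disabled, ClickHouse dialect parser enabled.
polyglot-sql = { version = "0.4", default-features = false, features = ["dialect-clickhouse"] }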
Optional capability features include generate, transpile, builder,
ast-tools, semantic, openlineage, diff, planner, and time.
Examples:
# Parse and generate SQL for one dialect.
polyglot-sql = { version = "0.4", default-features = false, features = ["generate", "dialect-clickhouse"] }
# Cross-dialect transpilation.
polyglot-sql = { version = "0.4", default-features = false, features = ["transpile", "dialect-clickhouse", "dialect-postgresql"] }
Transpile
use ;
let result = transpile.unwrap;
assert_eq!;
You can also transpile through a Dialect handle directly. This is useful when you
already hold one (e.g., for custom dialects) or need pretty-printed output:
use ;
let mysql = get;
// Built-in target via DialectType
let plain = mysql.transpile.unwrap;
// Pretty-printed output via TranspileOptions
let pretty = mysql
.transpile_with
.unwrap;
// Target a custom (or built-in) Dialect handle directly
let pg = get;
let via_handle = mysql.transpile.unwrap;
Parse + Generate
use ;
let ast = parse.unwrap;
let sql = generate.unwrap;
assert_eq!;
Format With Guard Options
Formatting is protected by guard limits by default:
- max_input_bytes: 16 * 1024 * 1024
- max_tokens: 1_000_000
- max_ast_nodes: 1_000_000
- max_set_op_chain: 256
You can override these limits per call:
use ;
let options = FormatGuardOptions ;
let formatted = format_with_options.unwrap;
assert!;
Guard failures include stable codes in the error message:
- E_GUARD_INPUT_TOO_LARGE
- E_GUARD_TOKEN_BUDGET_EXCEEDED
- E_GUARD_AST_BUDGET_EXCEEDED
- E_GUARD_SET_OP_CHAIN_EXCEEDED
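Because the codes are stable and appear in the error message, callers can branch on them without parsing the rest of the text. The following is a minimal sketch under that assumption: `guard_code` is a hypothetical helper, not part of polyglot-sql, and it only assumes that the message contains one of the four codes listed above somewhere in its text.

```rust
/// The four stable guard codes listed in this section.
const GUARD_CODES: [&str; 4] = [
    "E_GUARD_INPUT_TOO_LARGE",
    "E_GUARD_TOKEN_BUDGET_EXCEEDED",
    "E_GUARD_AST_BUDGET_EXCEEDED",
    "E_GUARD_SET_OP_CHAIN_EXCEEDED",
];

/// Hypothetical helper (not a polyglot-sql API): returns the first guard
/// code found anywhere in an error message, or `None` if there is none.
fn guard_code(message: &str) -> Option<&'static str> {
    GUARD_CODES
        .iter()
        .copied()
        .find(|code| message.contains(*code))
}
```

A caller could pass the formatted error string (for example via `to_string()`) and match on the returned code to decide whether to raise a limit or reject the input.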
Fluent Builder
use *;
// SELECT id, name FROM users WHERE age > 18 ORDER BY name LIMIT 10
let expr = select
.from
.where_
.order_by
.limit
.build;
Expression Helpers
use *;
// Column references (supports dotted names)
let c = col;
// Literals
let s = lit; // 'hello'
let n = lit; // 42
let f = lit; // 3.14
let b = lit; // TRUE
// Operators
let cond = col.gte.and;
// Functions
let f = func;
CASE Expressions
use *;
let expr = case
.when
.when
.else_
.build;
Set Operations
use *;
let expr = union_all
.order_by
.limit
.build;
INSERT, UPDATE, DELETE
use *;
// INSERT INTO users (id, name) VALUES (1, 'Alice')
let ins = insert_into
.columns
.values
.build;
// UPDATE users SET name = 'Bob' WHERE id = 1
let upd = update
.set
.where_
.build;
// DELETE FROM users WHERE id = 1
let del = delete
.where_
.build;
AST Traversal
use ;
let ast = parse.unwrap;
let columns = get_columns;
let tables = get_tables;
Validation
use ;
let result = validate;
// result contains error with line/column location
use ;
let schema = ValidationSchema ;
let opts = SchemaValidationOptions ;
let result = validate_with_schema;
assert!;
Schema-aware validation emits stable codes such as:
- E200/E201 for unknown tables/columns
- E210-E217 and W210-W216 for type checks
- E220, E221, W220, W221, W222 for reference/FK checks
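The code ranges above lend themselves to simple bucketing, for example to decide which diagnostics should fail a build. This sketch is an illustration only: `Severity`, `Check`, and `classify` are hypothetical names, not polyglot-sql types, and it covers only the codes listed here (the library may emit others).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Severity {
    Error,   // codes starting with 'E'
    Warning, // codes starting with 'W'
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Check {
    UnknownName, // E200/E201: unknown tables/columns
    Type,        // E210-E217, W210-W216
    Reference,   // E220, E221, W220-W222: reference/FK checks
}

/// Hypothetical helper: buckets a code such as "E210" by severity and check
/// family. Returns `None` for anything outside the ranges listed above.
fn classify(code: &str) -> Option<(Severity, Check)> {
    let severity = match code.chars().next()? {
        'E' => Severity::Error,
        'W' => Severity::Warning,
        _ => return None,
    };
    // The first character is ASCII here, so slicing at byte 1 is safe.
    let digits = &code[1..];
    if digits.len() != 3 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let number: u32 = digits.parse().ok()?;
    let check = match (severity, number) {
        (Severity::Error, 200..=201) => Check::UnknownName,
        (Severity::Error, 210..=217) | (Severity::Warning, 210..=216) => Check::Type,
        (Severity::Error, 220..=221) | (Severity::Warning, 220..=222) => Check::Reference,
        _ => return None,
    };
    Some((severity, check))
}
```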
Tokenize
Access the raw token stream with full source position spans. Each token carries a Span with byte offsets and line/column numbers.
use ;
let dialect = new;
let tokens = dialect.tokenize.unwrap;
for token in &tokens
The Span struct provides:
| Field | Type | Description |
|---|---|---|
| start | usize | Start byte offset (0-based) |
| end | usize | End byte offset (exclusive) |
| line | usize | Line number (1-based) |
| column | usize | Column number (1-based) |
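To make the four fields concrete, here is a small standalone sketch of how byte offsets relate to 1-based line and column numbers. The `Span` struct below is a local stand-in that mirrors the table, not the library's own type, and `span_at` is a hypothetical helper. It assumes columns count characters from the start of the line; the library may count differently (for example in bytes).

```rust
/// Local stand-in mirroring the fields in the table above.
#[derive(Debug, PartialEq)]
struct Span {
    start: usize,  // start byte offset (0-based)
    end: usize,    // end byte offset (exclusive)
    line: usize,   // line number (1-based)
    column: usize, // column number (1-based)
}

/// Hypothetical helper: derives line and column for the byte range
/// `start..end` of `source`. Panics if `start` is out of bounds or not on
/// a character boundary.
fn span_at(source: &str, start: usize, end: usize) -> Span {
    let prefix = &source[..start];
    // Every newline before `start` moves the position down one line.
    let line = prefix.matches('\n').count() + 1;
    // The current line begins just after the last newline, or at offset 0.
    let line_start = prefix.rfind('\n').map_or(0, |i| i + 1);
    // Assumption: columns are counted in characters, not bytes.
    let column = prefix[line_start..].chars().count() + 1;
    Span { start, end, line, column }
}
```

Because `end` is exclusive, `&source[span.start..span.end]` yields exactly the token text.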
Error Reporting
Parse and tokenize errors include source position information with line/column numbers and byte offset ranges, making it straightforward to provide precise error feedback.
use ;
let result = parse;
if let Err = result
The Error enum provides line(), column(), start(), and end() accessors that return Option<usize> for Parse, Tokenize, and Syntax error variants:
use Error;
let err = parse;
assert_eq!;
assert_eq!;
// Generation errors don't carry position info
let err = generate;
assert_eq!;
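Optional position accessors make it easy to render a caret-style diagnostic when a position is available and fall back to the plain message when it is not. The sketch below uses a local stand-in enum, `SqlError`, so that it compiles on its own; it is not the library's Error type, and the variant fields and the `message` and `render` functions are assumptions made for illustration.

```rust
/// Local stand-in, not polyglot-sql's Error: one variant carries a
/// position, the other does not.
enum SqlError {
    Parse { message: String, line: usize, column: usize },
    Generate { message: String },
}

impl SqlError {
    fn line(&self) -> Option<usize> {
        match self {
            SqlError::Parse { line, .. } => Some(*line),
            SqlError::Generate { .. } => None,
        }
    }

    fn column(&self) -> Option<usize> {
        match self {
            SqlError::Parse { column, .. } => Some(*column),
            SqlError::Generate { .. } => None,
        }
    }

    fn message(&self) -> &str {
        match self {
            SqlError::Parse { message, .. } | SqlError::Generate { message } => message,
        }
    }
}

/// Renders "line:column: message", the offending source line, and a caret
/// under the 1-based column. Falls back to the bare message without a position.
fn render(source: &str, err: &SqlError) -> String {
    match (err.line(), err.column()) {
        (Some(line), Some(column)) => {
            let text = line
                .checked_sub(1)
                .and_then(|i| source.lines().nth(i))
                .unwrap_or("");
            let pad = " ".repeat(column.saturating_sub(1));
            format!("{}:{}: {}\n{}\n{}^", line, column, err.message(), text, pad)
        }
        _ => err.message().to_string(),
    }
}
```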
Supported Dialects
Athena, BigQuery, ClickHouse, CockroachDB, Databricks, Doris, Dremio, Drill, Druid, DuckDB, Dune, Exasol, Fabric, Hive, Materialize, MySQL, Oracle, PostgreSQL, Presto, Redshift, RisingWave, SingleStore, Snowflake, Solr, Spark, SQLite, StarRocks, Tableau, Teradata, TiDB, Trino, TSQL
Feature Flags
| Flag | Description |
|---|---|
| generate | Enable SQL generation and formatting from AST nodes |
| transpile | Enable cross-dialect transpilation; implies generate |
| builder | Enable the fluent query builder API; implies generate |
| ast-tools | Enable AST inspection and transform helper APIs |
| semantic | Enable schema, resolver, lineage, optimizer, and validation APIs |
| openlineage | Enable OpenLineage payload generation; implies semantic |
| diff | Enable AST diff support; implies generate |
| planner | Enable logical planning helpers |
| time | Enable time-format conversion helpers |
| bindings | Enable ts-rs TypeScript type generation |
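Cargo resolves the "implies" relationships automatically, but it can help to see which flags end up enabled for a given request. The sketch below models only the implications stated in the table; `implied` and `resolve` are hypothetical helpers for illustration, and any implications the table does not mention are not modeled.

```rust
use std::collections::BTreeSet;

/// Flags directly implied by `flag`, per the table above.
fn implied(flag: &str) -> &'static [&'static str] {
    match flag {
        "transpile" | "builder" | "diff" => &["generate"],
        "openlineage" => &["semantic"],
        _ => &[],
    }
}

/// Returns the requested flags plus everything they imply, in sorted order.
fn resolve(requested: &[&'static str]) -> BTreeSet<&'static str> {
    let mut enabled = BTreeSet::new();
    let mut pending: Vec<&'static str> = requested.to_vec();
    while let Some(flag) = pending.pop() {
        // Expand a flag only the first time it is seen.
        if enabled.insert(flag) {
            pending.extend_from_slice(implied(flag));
        }
    }
    enabled
}
```

For example, requesting transpile and openlineage yields generate, openlineage, semantic, and transpile.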