# sqlglot-rust
A SQL parser, optimizer, and transpiler library written in Rust, inspired by Python's [sqlglot](https://github.com/tobymao/sqlglot).
## Features
- **Parse** SQL strings into a structured AST
- **Generate** SQL from AST nodes
- **Transpile** between 30 SQL dialects
- **LIMIT / TOP / FETCH FIRST** transpilation across dialects
- **Quoted identifier** preservation and cross-dialect conversion (`"id"` ↔ `` `id` `` ↔ `[id]`)
- **Optimize** SQL queries (constant folding, boolean simplification)
- **AST traversal** — walk, find, transform expressions
- CTEs, subqueries, set operations (UNION / INTERSECT / EXCEPT)
- Window functions with frames and filters
- CAST, TRY_CAST, EXTRACT, INTERVAL, EXISTS
- Full DDL support (CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE VIEW, etc.)
- Serde serialization for AST nodes
## Supported Dialects
All 30 dialects from Python's sqlglot are supported, with function name mapping,
data type mapping, and ILIKE rewriting between dialects.
### Official Dialects
| ANSI SQL | Base SQL standard |
| Athena | AWS Athena (Presto-based) |
| BigQuery | Google BigQuery |
| ClickHouse | ClickHouse |
| Databricks | Databricks (Spark-based) |
| DuckDB | DuckDB |
| Hive | Apache Hive |
| MySQL | MySQL |
| Oracle | Oracle Database |
| PostgreSQL | PostgreSQL |
| Presto | Presto |
| Redshift | Amazon Redshift (Postgres-based) |
| Snowflake | Snowflake |
| Spark | Apache Spark SQL |
| SQLite | SQLite |
| StarRocks | StarRocks (MySQL-compatible) |
| Trino | Trino (Presto successor) |
| T-SQL | Microsoft SQL Server |
### Community Dialects
| Doris | Apache Doris (MySQL-compatible) |
| Dremio | Dremio |
| Drill | Apache Drill |
| Druid | Apache Druid |
| Exasol | Exasol |
| Fabric | Microsoft Fabric (T-SQL variant) |
| Materialize | Materialize (Postgres-compatible) |
| PRQL | Pipelined Relational Query Language |
| RisingWave | RisingWave (Postgres-compatible) |
| SingleStore | SingleStore (MySQL-compatible) |
| Tableau | Tableau |
| Teradata | Teradata |
### Dialect Transform Rules
The transpiler applies dialect-specific rewrite rules when converting between dialects:
| Function name mapping | `NOW()` → `CURRENT_TIMESTAMP()`, `GETDATE()` |
| `SUBSTR` ↔ `SUBSTRING` | Postgres uses `SUBSTRING`, MySQL uses `SUBSTR` |
| `IFNULL` → `COALESCE` | MySQL `IFNULL` → ANSI `COALESCE` |
| `IFNULL` → `ISNULL` | MySQL `IFNULL` → T-SQL `ISNULL` |
| `NVL` → `COALESCE` | Oracle `NVL` → standard `COALESCE` |
| `LEN` ↔ `LENGTH` | T-SQL `LEN` ↔ standard `LENGTH` |
| `RAND` ↔ `RANDOM` | MySQL `RAND` ↔ Postgres `RANDOM` |
| `ILIKE` → `LOWER`/`LIKE` | Rewritten for dialects without native ILIKE |
| `LIMIT` ↔ `TOP` | `LIMIT 10` → `TOP 10` for T-SQL |
| `LIMIT` → `FETCH FIRST` | `LIMIT 10` → `FETCH FIRST 10 ROWS ONLY` (Oracle) |
| Quoted identifiers | `"id"` ↔ `` `id` `` ↔ `[id]` per dialect |
| Data type mapping | `TEXT` ↔ `STRING`, `INT` → `BIGINT` (BigQuery) |
| `BYTEA` ↔ `BLOB` | Postgres `BYTEA` ↔ MySQL `BLOB` |
## Quick Start
Add to your `Cargo.toml`:
```toml
[dependencies]
sqlglot-rust = "0.9.0"
```
### Parse and generate SQL
```rust
use sqlglot_rust::{parse, generate, Dialect};
fn main() {
// Parse a SQL query
let ast = parse("SELECT a, b FROM t WHERE a > 1", Dialect::Ansi).unwrap();
// Generate SQL (roundtrip)
let sql = generate(&ast, Dialect::Ansi);
assert_eq!(sql, "SELECT a, b FROM t WHERE a > 1");
}
```
### Transpile between dialects
```rust
use sqlglot_rust::{transpile, Dialect};
fn main() {
// Postgres → BigQuery: NOW() becomes CURRENT_TIMESTAMP()
let sql = transpile(
"SELECT NOW(), SUBSTRING(name, 1, 3) FROM users",
Dialect::Postgres,
Dialect::BigQuery,
).unwrap();
// → "SELECT CURRENT_TIMESTAMP(), SUBSTRING(name, 1, 3) FROM users"
// Oracle → T-SQL: NVL becomes ISNULL
let sql = transpile(
"SELECT NVL(a, b) FROM t",
Dialect::Oracle,
Dialect::Tsql,
).unwrap();
// → "SELECT ISNULL(a, b) FROM t"
}
```
### Parse a dialect name from a string
```rust
use sqlglot_rust::Dialect;
let d = Dialect::from_str("postgres").unwrap();
assert_eq!(d, Dialect::Postgres);
// Multiple aliases are supported
assert_eq!(Dialect::from_str("tsql"), Some(Dialect::Tsql));
assert_eq!(Dialect::from_str("mssql"), Some(Dialect::Tsql));
assert_eq!(Dialect::from_str("sqlserver"), Some(Dialect::Tsql));
```
### Supported Statements
- `SELECT` (with JOINs, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT, OFFSET, TOP, FETCH FIRST, QUALIFY)
- `INSERT INTO ... VALUES` / `INSERT INTO ... SELECT`
- `UPDATE ... SET ... WHERE` (with RETURNING)
- `DELETE FROM ... WHERE` (with USING)
- `CREATE TABLE` (with constraints, IF NOT EXISTS, TEMPORARY, AS SELECT)
- `DROP TABLE` (with IF EXISTS)
- `ALTER TABLE` (ADD COLUMN, DROP COLUMN, RENAME COLUMN, RENAME TO, ALTER TYPE)
- `CREATE VIEW` / `DROP VIEW`
- `TRUNCATE TABLE`
- `BEGIN` / `COMMIT` / `ROLLBACK`
- `EXPLAIN`
- `USE`
### Expression Support
- Column references (qualified and unqualified) with quote-style metadata
- Numeric, string, boolean, and NULL literals
- Binary operators (`+`, `-`, `*`, `/`, `%`, `=`, `<>`, `<`, `>`, `<=`, `>=`, `AND`, `OR`, `||`)
- Unary operators (`NOT`, `-`, `+`)
- Bitwise operators (`&`, `|`, `^`, `<<`, `>>`)
- Function calls (with DISTINCT, FILTER, OVER support)
- `BETWEEN`, `IN`, `IS NULL`, `LIKE`, `ILIKE`
- `CASE ... WHEN ... THEN ... ELSE ... END`
- `CAST`, `TRY_CAST`, PostgreSQL `::` cast
- `EXTRACT(field FROM expr)` for all date/time fields
- `INTERVAL` expressions
- `EXISTS`, `IN (subquery)`
- Array literals (`ARRAY[1, 2, 3]`)
- Window functions with frames (`ROWS`, `RANGE`, `GROUPS`)
- Common Table Expressions (WITH / WITH RECURSIVE)
- Set operations (UNION, INTERSECT, EXCEPT with ALL)
- Parenthesized sub-expressions and subqueries
## Documentation
- **[Installation](docs/installation.md)** — Dependency setup and verification
- **[Developer Guide](docs/developer-guide.md)** — Parsing, generating, transpiling, AST traversal, optimization, and serialization with full code examples
- **[API Reference](docs/reference.md)** — Complete type catalog, function signatures, dialect tables, and error variants
## Architecture
```text
src/
├── ast/ # AST node definitions (~40 expression types, 15 statement types)
├── tokens/ # Token types (~200+ variants) and tokenizer
├── parser/ # Recursive-descent SQL parser
├── generator/ # SQL code generator
├── dialects/ # 30 dialect definitions with transform rules
├── optimizer/ # Query optimization (constant folding, boolean simplification)
├── errors/ # Error types
└── lib.rs # Public API (parse, generate, transpile)
```
## Development
```bash
# Build
cargo build
# Run tests (244+ tests)
cargo test
# Run benchmarks
cargo bench
# Lint
cargo clippy
# Format
cargo fmt
```
A `Makefile` is provided for convenience:
```bash
make build # cargo build
make test # cargo test
make lint # cargo clippy
make fmt # cargo fmt
make sbom # Generate SPDX SBOM (see below)
make bump-version # Update version across all files (see below)
make all # build + sbom
```
## SBOM (Software Bill of Materials)
The project supports generating an SBOM in [SPDX](https://spdx.dev/) 2.3 JSON format using [`cargo-sbom`](https://crates.io/crates/cargo-sbom).
### Prerequisites
Install `cargo-sbom`:
```bash
cargo install cargo-sbom
```
### Generate the SBOM
```bash
make sbom
```
This writes the SBOM to `target/sbom/sqlglot-rust.spdx.json`. You can also run the command directly:
```bash
cargo sbom --output-format spdx_json_2_3 > target/sbom/sqlglot-rust.spdx.json
```
The generated SBOM includes all dependency packages with license information, download locations, and [Package URLs (PURLs)](https://github.com/package-url/purl-spec).
## Updating the Version
Use the `bump-version` Makefile target to update the version consistently across
all configuration and documentation files:
```bash
make bump-version VERSION=1.0.0
```
This updates:
- `Cargo.toml` — package version
- `README.md` — dependency snippet
- `docs/installation.md` — dependency snippet
- `Cargo.lock` — regenerated automatically
The `VERSION` parameter is required and must be a full semantic version (e.g. `1.0.0`, `0.10.1`).
## Acknowledgements
Inspired by [sqlglot](https://github.com/tobymao/sqlglot) by Toby Mao.
## License
MIT