# Papera
A SQL compatibility layer that transpiles Trino, Redshift, and Hive SQL to DuckDB SQL.
papera parses source SQL using [sqlparser-rs](https://github.com/sqlparser-rs/sqlparser-rs), applies dialect-specific AST transformations, and emits DuckDB-compatible SQL. It handles function name differences, type mappings, DDL syntax variations, and more.
## Installation
### As a Library
Add to your `Cargo.toml`:
```toml
[dependencies]
papera = "0.1"
```
### CLI
The CLI binary is feature-gated and not built by default:
```sh
cargo build --features cli
# or install globally
cargo install --path . --features cli
```
## Usage
### CLI
```sh
# Pipe SQL through papera
echo "SELECT approx_distinct(col) FROM t" | papera trino
# Output: SELECT approx_count_distinct(col) FROM t
```
```
```
### Library
```rust
use papera::{
transpile, transpile_with_options, SourceDialect, TranspileOptions,
ExternalTableBehavior, IcebergTableBehavior, CopyBehavior,
};
// Simple usage
let sql = "SELECT NVL(a, b) FROM t";
let result = transpile(sql, SourceDialect::Redshift).unwrap();
assert_eq!(result, "SELECT coalesce(a, b) FROM t");
// With options (e.g., convert external tables to views)
let sql = "CREATE EXTERNAL TABLE t (a INT) STORED AS PARQUET LOCATION 's3://bucket/path'";
let opts = TranspileOptions {
external_table: ExternalTableBehavior::MapToView,
..Default::default()
};
let result = transpile_with_options(sql, SourceDialect::Redshift, &opts).unwrap();
// Output: CREATE VIEW t (a) AS SELECT * FROM read_parquet('s3://bucket/path')
// Migration mode: opt into all conversions
let opts = TranspileOptions {
external_table: ExternalTableBehavior::MapToView,
iceberg_table: IcebergTableBehavior::MapToView,
copy: CopyBehavior::MapToInsert,
..Default::default()
};
```
Custom SerDe class mappings for classes not covered by the built-in resolver:
```rust
use papera::{SerdeClassResolver, TranspileOptions, ExternalTableBehavior};
let opts = TranspileOptions {
external_table: ExternalTableBehavior::MapToView,
serde_class_resolver: Some(SerdeClassResolver::new(|class| {
match class {
c if c.eq_ignore_ascii_case("com.example.MyParquetSerde") => Some("read_parquet".to_string()),
c if c.eq_ignore_ascii_case("com.example.MyJsonSerde") => Some("read_json".to_string()),
_ => None, // fall through to built-in logic
}
})),
..Default::default()
};
```
### Multi-Statement SQL
Both `transpile` and `transpile_with_options` accept multi-statement SQL. Statements are parsed together and emitted joined with `;\n`:
```rust
let script = "SELECT NVL(a, b) FROM t; SELECT GETDATE()";
let result = transpile(script, SourceDialect::Redshift).unwrap();
// result: "SELECT coalesce(a, b) FROM t;\nSELECT current_timestamp()"
```
See `cargo run --example multi_statement` for a full ETL script example.
### Error Handling
`transpile` and `transpile_with_options` return `papera::Result<String>`, where `papera::Error` has two variants:
```rust
pub enum Error {
/// The source SQL could not be parsed by sqlparser-rs.
Parse(sqlparser::parser::ParserError),
/// The SQL uses a feature that cannot be transpiled to DuckDB
/// (e.g., an unsupported type or a conversion that was not opted into).
Unsupported(String),
}
```
`Error::Unsupported` is returned for:
- `CREATE EXTERNAL TABLE` when `external_table` is `Error` (the default)
- Iceberg tables when `iceberg_table` is `Error` (the default)
- `COPY FROM` when `copy` is `Error` (the default)
- Types with no DuckDB equivalent ( `HLLSKETCH`, `GEOMETRY` )
See `examples/` for more complete usage patterns (`cargo run --example basic`, `cargo run --example migration`, `cargo run --example serde_resolver`).
## Feature Coverage
### Supported Dialects
| Trino | `GenericDialect` | Also handles Hive-style DDL (STORED AS, TBLPROPERTIES, etc.) |
| Redshift | `RedshiftSqlDialect` | Includes Redshift Spectrum external tables |
| Hive | `HiveDialect` | ROW FORMAT, SERDE, PARTITIONED BY support |
### Function Mappings: Trino to DuckDB
#### Aggregate
| `approx_distinct(x)` | `approx_count_distinct(x)` | |
| `arbitrary(x)` | `any_value(x)` | |
| `approx_percentile(x, p)` | `approx_quantile(x, p)` | |
| `map_agg(k, v)` | `map(list(k), list(v))` | |
#### Date / Time
| `date_parse(s, fmt)` | `strptime(s, fmt)` | Java format strings converted |
| `format_datetime(ts, fmt)` | `strftime(ts, fmt)` | Java format strings converted |
| `date_format(ts, fmt)` | `strftime(ts, fmt)` | Java format strings converted |
| `at_timezone(ts, tz)` | `ts AT TIME ZONE tz` | |
| `with_timezone(ts, tz)` | `ts AT TIME ZONE tz` | |
| `parse_datetime(s, fmt)` | `strptime(s, fmt)` | Java format strings converted |
| `to_unixtime(ts)` | `epoch(ts)` | |
| `current_timezone()` | `current_setting('TimeZone')` | |
| `from_unixtime(t)` | `to_timestamp(t)` | |
| `date_diff(unit, t1, t2)` | `date_diff(unit, t1, t2)` | |
| `date_add(unit, n, ts)` | `date_add(unit, n, ts)` | |
| `day_of_week(d)` | `dayofweek(d)` | |
| `day_of_year(d)` | `dayofyear(d)` | |
| `week_of_year(d)` | `weekofyear(d)` | |
| `year_of_week(d)` | `yearofweek(d)` | |
#### String
| `split(s, del)` | `str_split(s, del)` | |
| `levenshtein_distance(a, b)` | `levenshtein(a, b)` | |
| `regexp_like(s, p)` | `regexp_matches(s, p)` | |
| `regexp_extract(s, p[, g])` | `regexp_extract(s, p[, g])` | |
| `regexp_replace(s, p, r)` | `regexp_replace(s, p, r)` | |
| `strpos(s, sub)` | `strpos(s, sub)` | |
| `length(s)` | `length(s)` | |
| `reverse(s)` | `reverse(s)` | |
| `lpad(s, n, c)` | `lpad(s, n, c)` | |
| `rpad(s, n, c)` | `rpad(s, n, c)` | |
| `chr(n)` | `chr(n)` | |
| `codepoint(c)` | `unicode(c)` | |
| `from_hex(s)` | `unhex(s)` | |
| `to_utf8(s)` | `encode(s)` | Returns BLOB |
| `from_utf8(b)` | `decode(b)` | |
| `url_extract_host(url)` | `regexp_extract(url, ...)` | Approximation via regex |
| `url_extract_path(url)` | `regexp_extract(url, ...)` | Approximation via regex |
| `url_extract_protocol(url)` | `regexp_extract(url, ...)` | Approximation via regex |
| `url_extract_query(url)` | `regexp_extract(url, ...)` | Approximation via regex |
| `url_extract_fragment(url)` | `regexp_extract(url, ...)` | Approximation via regex |
| `url_extract_port(url)` | `regexp_extract(url, ...)` | Approximation via regex |
#### Array / Map
| `transform(arr, fn)` | `list_transform(arr, fn)` | |
| `sequence(start, stop)` | `generate_series(start, stop)` | |
| `element_at(arr, i)` | `list_extract(arr, i)` | |
| `cardinality(x)` | `len(x)` | |
| `array_join(arr, sep)` | `array_to_string(arr, sep)` | |
| `reduce(arr, ...)` | `list_reduce(arr, ...)` | |
| `filter(arr, fn)` | `list_filter(arr, fn)` | |
| `contains(arr, x)` | `list_contains(arr, x)` | |
| `zip(a, b)` | `list_zip(a, b)` | |
| `flatten(arr)` | `flatten(arr)` | |
| `slice(arr, ...)` | `list_slice(arr, ...)` | |
| `array_distinct(arr)` | `list_distinct(arr)` | |
| `array_sort(arr)` | `list_sort(arr)` | |
| `array_max(arr)` | `list_max(arr)` | |
| `array_min(arr)` | `list_min(arr)` | |
| `array_position(arr, x)` | `list_position(arr, x)` | |
| `array_remove(arr, x)` | `list_filter(arr, x)` | Approximate |
| `array_intersect(a, b)` | `list_intersect(a, b)` | |
| `array_concat(a, b)` | `list_concat(a, b)` | |
| `array_except(a, b)` | `list_except(a, b)` | |
| `array_union(a, b)` | `list_distinct(list_concat(a, b))` | |
| `arrays_overlap(a, b)` | `len(list_intersect(a, b)) > 0` | |
| `array_has(arr, x)` | `list_contains(arr, x)` | |
| `array_has_all(arr, req)` | `len(list_intersect(arr, req)) = len(req)` | |
| `array_has_any(arr, cands)` | `len(list_intersect(arr, cands)) > 0` | |
| `array_sum(arr)` | `list_sum(arr)` | |
| `array_average(arr)` | `list_avg(arr)` | |
| `map_keys(m)` | `map_keys(m)` | |
| `map_values(m)` | `map_values(m)` | |
| `map_concat(m1, m2)` | `map_concat(m1, m2)` | |
#### JSON
| `json_extract_scalar(j, p)` | `json_extract_string(j, p)` | |
| `json_extract(j, p)` | `json_extract(j, p)` | |
| `json_parse(s)` | `CAST(s AS JSON)` | |
| `json_format(j)` | `CAST(j AS VARCHAR)` | |
| `json_array_get(j, idx)` | `json_extract_string(j, '$[idx]')` | Literal index only |
| `json_array_length(j)` | `json_array_length(j)` | |
| `json_object_keys(j)` | `json_keys(j)` | |
#### Math / Numeric
| `is_nan(x)` | `isnan(x)` | |
| `is_finite(x)` | `isfinite(x)` | |
| `is_infinite(x)` | `isinf(x)` | |
| `nan()` | `CAST('NaN' AS DOUBLE)` | |
| `infinity()` | `CAST('Infinity' AS DOUBLE)` | |
| `rand()` | `random()` | |
| `typeof(x)` | `typeof(x)` | |
#### Bitwise
| `bitwise_and(a, b)` | `a & b` | |
| `bitwise_or(a, b)` | `a \| b` | |
| `bitwise_xor(a, b)` | `a ^ b` | |
| `bitwise_not(a)` | `~a` | |
| `bitwise_left_shift(a, b)` | `a << b` | |
| `bitwise_right_shift(a, b)` | `a >> b` | |
### Function Mappings: Redshift to DuckDB
#### Date / Time
| `GETDATE()` | `current_timestamp()` | |
| `SYSDATE` | `current_timestamp` | |
| `DATEADD(part, n, d)` | `d + INTERVAL 'n' part` | Interval arithmetic |
| `DATEDIFF(part, d1, d2)` | `date_diff('part', d1, d2)` | Datepart quoted |
| `DATE_TRUNC(part, d)` | `date_trunc(part, d)` | |
| `CONVERT_TIMEZONE(tz, ts)` | `ts AT TIME ZONE 'tz'` | 2-arg form |
| `CONVERT_TIMEZONE(src, dst, ts)` | `ts AT TIME ZONE 'src' AT TIME ZONE 'dst'` | 3-arg form |
| `TO_DATE(s, fmt)` | `CAST(strptime(s, fmt) AS DATE)` | PG format strings converted |
| `TO_TIMESTAMP(s, fmt)` | `strptime(s, fmt)` | PG format strings converted |
| `TO_CHAR(ts, fmt)` | `strftime(ts, fmt)` | PG format strings converted |
| `MONTHS_BETWEEN(d1, d2)` | `datediff('month', d2, d1)` | |
| `ADD_MONTHS(d, n)` | `d + INTERVAL 'n' MONTH` | |
#### String
| `NVL(a, b)` | `coalesce(a, b)` | |
| `NVL2(e, a, b)` | `CASE WHEN e IS NOT NULL THEN a ELSE b END` | |
| `ISNULL(v, r)` | `coalesce(v, r)` | 2-arg form only |
| `LEN(s)` | `length(s)` | |
| `LCASE(s)` | `lower(s)` | |
| `UCASE(s)` | `upper(s)` | |
| `UPPER(s)` | `upper(s)` | |
| `LOWER(s)` | `lower(s)` | |
| `LEFT(s, n)` | `left(s, n)` | |
| `RIGHT(s, n)` | `right(s, n)` | |
| `SUBSTRING(s, ...)` | `substring(s, ...)` | |
| `REPLACE(s, from, to)` | `replace(s, from, to)` | |
| `BTRIM(s)` | `trim(s)` | |
| `TRIM(s)` | `trim(s)` | |
| `CHARINDEX(sub, str)` | `strpos(str, sub)` | Args swapped |
| `SPACE(n)` | `repeat(' ', n)` | |
| `REGEXP_SUBSTR(s, p)` | `regexp_extract(s, p)` | |
| `REGEXP_COUNT(s, p)` | `len(regexp_extract_all(s, p))` | |
#### Aggregate
| `DECODE(e, s1, r1, ..., def)` | `CASE e WHEN s1 THEN r1 ... ELSE def END` | |
| `LISTAGG(col, sep)` | `string_agg(col, sep)` | |
#### JSON
| `JSON_EXTRACT_PATH_TEXT(j, k1, k2, ...)` | `json_extract_string(j, '$.k1.k2...')` | Literal keys only |
| `JSON_EXTRACT_ARRAY_ELEMENT_TEXT(j, i)` | `json_extract_string(j, '$[i]')` | Literal index only |
| `JSON_ARRAY_LENGTH(j)` | `json_array_length(j)` | |
| `JSON_TYPEOF(j)` | `json_type(j)` | |
| `JSON_SERIALIZE(j)` | `CAST(j AS VARCHAR)` | |
| `JSON_DESERIALIZE(s)` | `CAST(s AS JSON)` | |
| `IS_VALID_JSON(s)` | `json_valid(s)` | |
#### Array
| `ARRAY_CONCAT(a, b)` | `list_concat(a, b)` | |
#### Crypto / Encoding
| `MD5(s)` | `md5(s)` | |
| `SHA1(s)` | `sha1(s)` | |
| `SHA2(s, 256)` | `sha256(s)` | 256-bit only |
#### Emulated
| `STRTOL(s, base)` | `CASE base WHEN 16 THEN CAST(('0x' \|\| s) AS BIGINT) WHEN 10 THEN CAST(s AS BIGINT) END` | Base 10 and 16 only |
| `RATIO_TO_REPORT(col) OVER (...)` | `col / SUM(col) OVER (...)` | Window clause preserved |
#### Unsupported
| `BPCHARCMP(a, b)` | No DuckDB equivalent |
### Type Mappings: Trino to DuckDB
| `ROW(a INT, b VARCHAR)` | `STRUCT(a INT, b VARCHAR)` | CAST, DDL |
| `ARRAY(T)` | `T[]` | CAST, DDL |
| `ARRAY<T>` | `T[]` | CAST, DDL |
| `MAP(K, V)` | `MAP(K, V)` | Passthrough |
| `VARBINARY` | `BLOB` | CAST, DDL |
| `IPADDRESS` | `VARCHAR` | CAST, DDL |
### Type Mappings: Redshift to DuckDB
| `VARCHAR(MAX)` | `VARCHAR` | CAST, DDL |
| `CHARACTER VARYING(MAX)` | `VARCHAR` | CAST, DDL |
| `NVARCHAR(MAX)` | `VARCHAR` | CAST, DDL |
| `SUPER` | `JSON` | CAST, DDL |
| `VARBINARY` | `BLOB` | CAST, DDL |
| `HLLSKETCH` | Unsupported | |
| `GEOMETRY` | Unsupported | |
| `TIMETZ` | `TIMETZ` | Passthrough |
| `TIMESTAMPTZ` | `TIMESTAMPTZ` | Passthrough |
### Type Mappings: Hive to DuckDB
Hive uses the same type rewrite rules as Trino.
| `ROW(a INT, b VARCHAR)` | `STRUCT(a INT, b VARCHAR)` | CAST, DDL |
| `ARRAY(T)` | `T[]` | CAST, DDL |
| `ARRAY<T>` | `T[]` | CAST, DDL |
| `MAP(K, V)` | `MAP(K, V)` | Passthrough |
| `VARBINARY` | `BLOB` | CAST, DDL |
| `IPADDRESS` | `VARCHAR` | CAST, DDL |
### DDL Support
#### CREATE TABLE
| Column type rewriting | Automatic (all type mappings applied) |
| `CREATE EXTERNAL TABLE ... STORED AS ... LOCATION` | Configurable: `MapToView` or `Error` |
| Iceberg via `TBLPROPERTIES ('table_type'='ICEBERG')` | Configurable: `MapToView` (uses `iceberg_scan()`) or `Error` |
| Iceberg via `WITH (table_type = 'ICEBERG')` | Same as above (Trino syntax) |
| `PARTITIONED BY` | `hive_partitioning = true` added to reader options |
| `ROW FORMAT DELIMITED FIELDS TERMINATED BY` | `delim = '...'` added to `read_csv` options |
| `ROW FORMAT DELIMITED ESCAPED BY` | `escape = '...'` added to `read_csv` options |
| `ROW FORMAT DELIMITED LINES TERMINATED BY` | `new_line = '...'` added to `read_csv` options |
| `ROW FORMAT DELIMITED NULL DEFINED AS` | `nullstr = '...'` added to `read_csv` options |
| `ROW FORMAT SERDE 'class'` | Reader function inferred from SerDe class name |
#### External Table Format Mapping
| `PARQUET` | `read_parquet()` |
| `ORC` | `read_parquet()` |
| `TEXTFILE` | `read_csv()` |
| `JSONFILE` | `read_json()` |
| `AVRO` | Unsupported |
| `SEQUENCEFILE` | Unsupported |
| `RCFILE` | Unsupported |
| (none specified) | `read_parquet()` (default) |
#### SerDe Class Mapping
When no `STORED AS` is present, the reader function is inferred from `ROW FORMAT SERDE`:
| `ParquetHiveSerDe` | `read_parquet()` |
| `OrcSerde` | `read_parquet()` |
| `JsonSerDe` (Hive or OpenX) | `read_json()` |
| `OpenCSVSerde` | `read_csv()` |
| `LazySimpleSerDe` | `read_csv()` |
| `RegexSerDe` | Unsupported |
| Unknown classes | `Error` by default; override via `serde_class_resolver` |
The built-in mapping is substring-based and case-insensitive. For classes not listed above, supply a `SerdeClassResolver` in `TranspileOptions` (see the Library section for an example). The resolver is called first; returning `None` falls through to the built-in logic.
#### ALTER TABLE
| `ADD COLUMN` | Column type rewritten |
| `ALTER COLUMN ... SET DATA TYPE` | Data type rewritten |
| Other operations | Passthrough |
### DML Support
| `INSERT INTO ... SELECT` | Passthrough (expressions rewritten) |
| `INSERT INTO ... VALUES` | Passthrough |
| `UPDATE ... SET ... FROM` (Redshift) | Passthrough (DuckDB supports this) |
| `DELETE ... USING` (Redshift) | Passthrough (DuckDB supports this) |
| `MERGE` | Passthrough |
| `COPY FROM` (Redshift) | Configurable: `MapToInsert` (uses `read_parquet`/`read_csv`/`read_json`) or `Error` |
### SHOW Commands
| `SHOW TABLES` | Passthrough |
| `SHOW DATABASES` | Passthrough |
| `SHOW SCHEMAS` | Passthrough |
| `SHOW COLUMNS FROM t` | Passthrough |
| `SHOW VIEWS` | Passthrough |
| `SHOW CREATE TABLE t` | Emulated via `information_schema.columns` (reconstructs DDL) |
| `SHOW CREATE VIEW v` | Emulated via `duckdb_views()` (retrieves view SQL) |
| `SHOW variable` | `SELECT current_setting('variable')` |
| `SHOW FUNCTIONS` | `SELECT ... FROM information_schema.routines` |
### UNNEST and Lateral Joins
| `CROSS JOIN UNNEST(arr) AS t(x)` | Passthrough |
| `CROSS JOIN UNNEST(arr) WITH ORDINALITY AS t(x, n)` | Passthrough |
| `LATERAL VIEW explode(arr) t AS x` | `CROSS JOIN UNNEST(arr) AS t(x)` |
| `LATERAL VIEW posexplode(arr) t AS x` | `CROSS JOIN UNNEST(arr) AS t(x)` |
| `CROSS JOIN LATERAL (subquery)` | Passthrough |
### Parameterized Queries
Parameterized queries (prepared statement placeholders) are passed through to DuckDB unchanged.
| Trino | `?`, `$1` | `SELECT * FROM t WHERE x = ?` |
| Redshift | `$1` | `SELECT * FROM t WHERE x = $1` |
| Hive | `?`, `$1` | `SELECT * FROM t WHERE x = ?` |
DuckDB uses `$1`-style positional parameters natively. The `?` style is also accepted by DuckDB in its client APIs.
## Configuration
### TranspileOptions
| `external_table` | `MapToView`, `Error` | `Error` | How to handle `CREATE EXTERNAL TABLE` |
| `iceberg_table` | `MapToView`, `Error` | `Error` | How to handle Iceberg tables (detected via TBLPROPERTIES) |
| `copy` | `MapToInsert`, `Error` | `Error` | How to handle Redshift `COPY FROM` |
| `serde_class_resolver` | `Some(SerdeClassResolver)`, `None` | `None` | Custom resolver for `ROW FORMAT SERDE` class names not covered by the built-in mapping. Return `Some(reader_fn)` to override, `None` to fall through. |
## Project Structure
```
src/
lib.rs Public API, TranspileOptions
error.rs Error types
main.rs CLI entry point
transpiler/
mod.rs Transpiler trait
rewrite.rs ExprRewriter (VisitorMut-based AST walker)
dialect/
mod.rs SourceDialect enum
trino.rs Trino transpiler
redshift.rs Redshift transpiler
hive.rs Hive transpiler
transforms/
mod.rs Module re-exports
types.rs Data type rewriting
functions.rs Function name/signature mapping
format_strings.rs Format string conversion (PG/Java → strftime)
ddl.rs CREATE TABLE, ALTER TABLE, external/iceberg tables
dml.rs INSERT, UPDATE, DELETE, MERGE
show.rs SHOW command translation
unnest.rs UNNEST syntax normalization
lateral.rs LATERAL VIEW to CROSS JOIN UNNEST
tests/
common/mod.rs Test helpers
integration.rs End-to-end tests
duckdb_integration.rs DuckDB execution tests
examples/
basic.rs Function, type, and syntax rewrites
migration.rs External tables, Iceberg, COPY with TranspileOptions
multi_statement.rs Multi-statement ETL script transpilation
serde_resolver.rs Custom SerDe class resolver with built-in fallthrough
```
## Architecture and Design
### Transpilation Pipeline
papera uses a two-stage AST transformation pipeline:
1. **Parse** source SQL with the dialect-specific parser from sqlparser-rs.
2. **Statement-level transforms** restructure top-level `Statement` variants (e.g., converting `CREATE EXTERNAL TABLE` into `CREATE VIEW`, or rewriting `SHOW CREATE TABLE` into catalog queries).
3. **Expression-level rewrites** via `ExprRewriter` (a `VisitorMut`-based AST walker) handle cross-cutting concerns such as function renaming, type casting, and table-factor normalization.
4. **Emit** DuckDB-compatible SQL from the rewritten AST.
This split is intentional: statement handlers own structural changes that may replace one statement kind with another, while `ExprRewriter` handles expression-level rewrites that apply uniformly across statement types.
### Library-First Design
The crate is designed as a library first. The CLI binary is feature-gated behind `cli` and is not built by default. Internal rewrite machinery under `src/transforms` is `pub(crate)`, keeping the stable public API surface small: `transpile`, `transpile_with_options`, dialect selection, options, and error types.
### Opt-In Semantics for Risky Conversions
Features that can silently change semantics or storage assumptions are controlled by `TranspileOptions` and default to `Error`. External-table-to-view and Iceberg-table-to-view conversions are examples: they are useful but alter the storage model, so callers must explicitly opt in.
### Parser as Feature Boundary
papera's feature coverage is bounded not only by the rewrite logic but also by what sqlparser-rs can parse and represent. For example:
- DDL column `DataType` nodes are not visited by `VisitorMut`, so `CREATE TABLE` column types must be rewritten in statement handlers rather than in the expression walker.
- Trino `ROW(a INT, b VARCHAR)` is exposed as flattened custom type data under `GenericDialect`, making nested `ROW` handling fragile.
- Some source syntax cannot be supported cleanly until the upstream parser exposes a usable AST for it.
### Rewrite Strategy
Function rewrites are classified by complexity:
- **Rename**: simple name substitution (e.g., `approx_distinct` to `approx_count_distinct`).
- **RenameReorder**: same function with reordered arguments (e.g., `CHARINDEX(sub, str)` to `strpos(str, sub)`).
- **Custom**: the rewrite must produce a different AST shape entirely (e.g., `NVL2(e, a, b)` becomes a `CASE WHEN` expression, bitwise functions become infix operators).
Declarative mappings are preferred where possible, with custom rewrites reserved for cases where DuckDB requires a structurally different expression.
### Compatibility Model
papera targets DuckDB execution correctness, not just syntactically valid output. Some mappings are approximations rather than exact semantic matches (e.g., `url_extract_*` functions use regex approximations). String-level rewrite success alone is not considered sufficient evidence of compatibility, which is why the test suite includes DuckDB execution tests alongside string-comparison tests.
## Known Limitations
### Nested ROW types (sqlparser-rs 0.61)
Nested `ROW` types such as `ROW(x BIGINT, y ROW(i DOUBLE, j DOUBLE))` fail to parse in sqlparser-rs 0.61. The root cause is `parse_optional_type_modifiers()` in the parser, which cannot handle nested parentheses. This affects all dialects (including `HiveDialect`). Flat `ROW(a INT, b VARCHAR)` works correctly.
### ARRAY(T) vs ARRAY\<T\>
`ARRAY(T)` (Trino parenthesis syntax) is rewritten to `T[]`, but type inference inside `ARRAY(T)` depends on the parser recognizing the inner type. `ARRAY<T>` and `T[]` are fully supported.
### Approximate mappings
Some functions are approximations rather than exact semantic matches. For example, `url_extract_*` functions use regex-based approximations. Always validate output against DuckDB execution for compatibility-sensitive queries.
### Redshift COPY options
When `copy` is set to `MapToInsert`, Redshift-specific options such as `IAM_ROLE`, `IGNOREHEADER`, and `GZIP` are silently dropped. The generated `INSERT INTO ... SELECT * FROM read_parquet/read_csv/read_json` reflects format and location only.
## Building
```sh
cargo build
cargo test
```
## License
MIT License. Copyright (c) 2026 Moriyoshi Koizumi. See [LICENSE](./LICENSE) for details.