papera 0.1.0

A SQL compatibility layer that transpiles Trino and Redshift SQL to DuckDB SQL
Documentation

Papera

A SQL compatibility layer that transpiles Trino, Redshift, and Hive SQL to DuckDB SQL.

papera parses source SQL using sqlparser-rs, applies dialect-specific AST transformations, and emits DuckDB-compatible SQL. It handles function name differences, type mappings, DDL syntax variations, and more.

Installation

As a Library

Add to your Cargo.toml:

[dependencies]
papera = "0.1"

CLI

The CLI binary is feature-gated and not built by default:

cargo build --features cli
# or install globally
cargo install --path . --features cli

Usage

CLI

# Pipe SQL through papera
echo "SELECT NVL(a, b) FROM t" | papera redshift
# Output: SELECT coalesce(a, b) FROM t

echo "SELECT approx_distinct(col) FROM t" | papera trino
# Output: SELECT approx_count_distinct(col) FROM t
Usage: papera <trino|redshift|hive>
  Reads SQL from stdin and writes DuckDB-compatible SQL to stdout.

Library

use papera::{
    transpile, transpile_with_options, SourceDialect, TranspileOptions,
    ExternalTableBehavior, IcebergTableBehavior, CopyBehavior,
};

// Simple usage
let sql = "SELECT NVL(a, b) FROM t";
let result = transpile(sql, SourceDialect::Redshift).unwrap();
assert_eq!(result, "SELECT coalesce(a, b) FROM t");

// With options (e.g., convert external tables to views)
let sql = "CREATE EXTERNAL TABLE t (a INT) STORED AS PARQUET LOCATION 's3://bucket/path'";
let opts = TranspileOptions {
    external_table: ExternalTableBehavior::MapToView,
    ..Default::default()
};
let result = transpile_with_options(sql, SourceDialect::Redshift, &opts).unwrap();
// Output: CREATE VIEW t (a) AS SELECT * FROM read_parquet('s3://bucket/path')

// Migration mode: opt into all conversions
let opts = TranspileOptions {
    external_table: ExternalTableBehavior::MapToView,
    iceberg_table: IcebergTableBehavior::MapToView,
    copy: CopyBehavior::MapToInsert,
    ..Default::default()
};

Custom SerDe class mappings for classes not covered by the built-in resolver:

use papera::{SerdeClassResolver, TranspileOptions, ExternalTableBehavior};

let opts = TranspileOptions {
    external_table: ExternalTableBehavior::MapToView,
    serde_class_resolver: Some(SerdeClassResolver::new(|class| {
        match class {
            c if c.eq_ignore_ascii_case("com.example.MyParquetSerde") => Some("read_parquet".to_string()),
            c if c.eq_ignore_ascii_case("com.example.MyJsonSerde")    => Some("read_json".to_string()),
            _ => None, // fall through to built-in logic
        }
    })),
    ..Default::default()
};

Multi-Statement SQL

Both transpile and transpile_with_options accept multi-statement SQL. Statements are parsed together and emitted joined with ;\n:

let script = "SELECT NVL(a, b) FROM t; SELECT GETDATE()";
let result = transpile(script, SourceDialect::Redshift).unwrap();
// result: "SELECT coalesce(a, b) FROM t;\nSELECT current_timestamp()"

See cargo run --example multi_statement for a full ETL script example.

Error Handling

transpile and transpile_with_options return papera::Result<String>, where papera::Error has two variants:

pub enum Error {
    /// The source SQL could not be parsed by sqlparser-rs.
    Parse(sqlparser::parser::ParserError),
    /// The SQL uses a feature that cannot be transpiled to DuckDB
    /// (e.g., an unsupported type or a conversion that was not opted into).
    Unsupported(String),
}

Error::Unsupported is returned for:

  • CREATE EXTERNAL TABLE when external_table is Error (the default)
  • Iceberg tables when iceberg_table is Error (the default)
  • COPY FROM when copy is Error (the default)
  • Types with no DuckDB equivalent ( HLLSKETCH, GEOMETRY )

See examples/ for more complete usage patterns (cargo run --example basic, cargo run --example migration, cargo run --example serde_resolver).

Feature Coverage

Supported Dialects

Dialect Parser Notes
Trino GenericDialect Also handles Hive-style DDL (STORED AS, TBLPROPERTIES, etc.)
Redshift RedshiftSqlDialect Includes Redshift Spectrum external tables
Hive HiveDialect ROW FORMAT, SERDE, PARTITIONED BY support

Function Mappings: Trino to DuckDB

Aggregate

Trino DuckDB Notes
approx_distinct(x) approx_count_distinct(x)
arbitrary(x) any_value(x)
approx_percentile(x, p) approx_quantile(x, p)
map_agg(k, v) map(list(k), list(v))

Date / Time

Trino DuckDB Notes
date_parse(s, fmt) strptime(s, fmt) Java format strings converted
format_datetime(ts, fmt) strftime(ts, fmt) Java format strings converted
date_format(ts, fmt) strftime(ts, fmt) Java format strings converted
at_timezone(ts, tz) ts AT TIME ZONE tz
with_timezone(ts, tz) ts AT TIME ZONE tz
parse_datetime(s, fmt) strptime(s, fmt) Java format strings converted
to_unixtime(ts) epoch(ts)
current_timezone() current_setting('TimeZone')
from_unixtime(t) to_timestamp(t)
date_diff(unit, t1, t2) date_diff(unit, t1, t2)
date_add(unit, n, ts) date_add(unit, n, ts)
day_of_week(d) dayofweek(d)
day_of_year(d) dayofyear(d)
week_of_year(d) weekofyear(d)
year_of_week(d) yearofweek(d)

String

Trino DuckDB Notes
split(s, del) str_split(s, del)
levenshtein_distance(a, b) levenshtein(a, b)
regexp_like(s, p) regexp_matches(s, p)
regexp_extract(s, p[, g]) regexp_extract(s, p[, g])
regexp_replace(s, p, r) regexp_replace(s, p, r)
strpos(s, sub) strpos(s, sub)
length(s) length(s)
reverse(s) reverse(s)
lpad(s, n, c) lpad(s, n, c)
rpad(s, n, c) rpad(s, n, c)
chr(n) chr(n)
codepoint(c) unicode(c)
from_hex(s) unhex(s)
to_utf8(s) encode(s) Returns BLOB
from_utf8(b) decode(b)
url_extract_host(url) regexp_extract(url, ...) Approximation via regex
url_extract_path(url) regexp_extract(url, ...) Approximation via regex
url_extract_protocol(url) regexp_extract(url, ...) Approximation via regex
url_extract_query(url) regexp_extract(url, ...) Approximation via regex
url_extract_fragment(url) regexp_extract(url, ...) Approximation via regex
url_extract_port(url) regexp_extract(url, ...) Approximation via regex

Array / Map

Trino DuckDB Notes
transform(arr, fn) list_transform(arr, fn)
sequence(start, stop) generate_series(start, stop)
element_at(arr, i) list_extract(arr, i)
cardinality(x) len(x)
array_join(arr, sep) array_to_string(arr, sep)
reduce(arr, ...) list_reduce(arr, ...)
filter(arr, fn) list_filter(arr, fn)
contains(arr, x) list_contains(arr, x)
zip(a, b) list_zip(a, b)
flatten(arr) flatten(arr)
slice(arr, ...) list_slice(arr, ...)
array_distinct(arr) list_distinct(arr)
array_sort(arr) list_sort(arr)
array_max(arr) list_max(arr)
array_min(arr) list_min(arr)
array_position(arr, x) list_position(arr, x)
array_remove(arr, x) list_filter(arr, x) Approximate
array_intersect(a, b) list_intersect(a, b)
array_concat(a, b) list_concat(a, b)
array_except(a, b) list_except(a, b)
array_union(a, b) list_distinct(list_concat(a, b))
arrays_overlap(a, b) len(list_intersect(a, b)) > 0
array_has(arr, x) list_contains(arr, x)
array_has_all(arr, req) len(list_intersect(arr, req)) = len(req)
array_has_any(arr, cands) len(list_intersect(arr, cands)) > 0
array_sum(arr) list_sum(arr)
array_average(arr) list_avg(arr)
map_keys(m) map_keys(m)
map_values(m) map_values(m)
map_concat(m1, m2) map_concat(m1, m2)

JSON

Trino DuckDB Notes
json_extract_scalar(j, p) json_extract_string(j, p)
json_extract(j, p) json_extract(j, p)
json_parse(s) CAST(s AS JSON)
json_format(j) CAST(j AS VARCHAR)
json_array_get(j, idx) json_extract_string(j, '$[idx]') Literal index only
json_array_length(j) json_array_length(j)
json_object_keys(j) json_keys(j)

Math / Numeric

Trino DuckDB Notes
is_nan(x) isnan(x)
is_finite(x) isfinite(x)
is_infinite(x) isinf(x)
nan() CAST('NaN' AS DOUBLE)
infinity() CAST('Infinity' AS DOUBLE)
rand() random()
typeof(x) typeof(x)

Bitwise

Trino DuckDB Notes
bitwise_and(a, b) a & b
bitwise_or(a, b) a | b
bitwise_xor(a, b) a ^ b
bitwise_not(a) ~a
bitwise_left_shift(a, b) a << b
bitwise_right_shift(a, b) a >> b

Function Mappings: Redshift to DuckDB

Date / Time

Redshift DuckDB Notes
GETDATE() current_timestamp()
SYSDATE current_timestamp
DATEADD(part, n, d) d + INTERVAL 'n' part Interval arithmetic
DATEDIFF(part, d1, d2) date_diff('part', d1, d2) Datepart quoted
DATE_TRUNC(part, d) date_trunc(part, d)
CONVERT_TIMEZONE(tz, ts) ts AT TIME ZONE 'tz' 2-arg form
CONVERT_TIMEZONE(src, dst, ts) ts AT TIME ZONE 'src' AT TIME ZONE 'dst' 3-arg form
TO_DATE(s, fmt) CAST(strptime(s, fmt) AS DATE) PG format strings converted
TO_TIMESTAMP(s, fmt) strptime(s, fmt) PG format strings converted
TO_CHAR(ts, fmt) strftime(ts, fmt) PG format strings converted
MONTHS_BETWEEN(d1, d2) datediff('month', d2, d1)
ADD_MONTHS(d, n) d + INTERVAL 'n' MONTH

String

Redshift DuckDB Notes
NVL(a, b) coalesce(a, b)
NVL2(e, a, b) CASE WHEN e IS NOT NULL THEN a ELSE b END
ISNULL(v, r) coalesce(v, r) 2-arg form only
LEN(s) length(s)
LCASE(s) lower(s)
UCASE(s) upper(s)
UPPER(s) upper(s)
LOWER(s) lower(s)
LEFT(s, n) left(s, n)
RIGHT(s, n) right(s, n)
SUBSTRING(s, ...) substring(s, ...)
REPLACE(s, from, to) replace(s, from, to)
BTRIM(s) trim(s)
TRIM(s) trim(s)
CHARINDEX(sub, str) strpos(str, sub) Args swapped
SPACE(n) repeat(' ', n)
REGEXP_SUBSTR(s, p) regexp_extract(s, p)
REGEXP_COUNT(s, p) len(regexp_extract_all(s, p))

Aggregate

Redshift DuckDB Notes
DECODE(e, s1, r1, ..., def) CASE e WHEN s1 THEN r1 ... ELSE def END
LISTAGG(col, sep) string_agg(col, sep)

JSON

Redshift DuckDB Notes
JSON_EXTRACT_PATH_TEXT(j, k1, k2, ...) json_extract_string(j, '$.k1.k2...') Literal keys only
JSON_EXTRACT_ARRAY_ELEMENT_TEXT(j, i) json_extract_string(j, '$[i]') Literal index only
JSON_ARRAY_LENGTH(j) json_array_length(j)
JSON_TYPEOF(j) json_type(j)
JSON_SERIALIZE(j) CAST(j AS VARCHAR)
JSON_DESERIALIZE(s) CAST(s AS JSON)
IS_VALID_JSON(s) json_valid(s)

Array

Redshift DuckDB Notes
ARRAY_CONCAT(a, b) list_concat(a, b)

Crypto / Encoding

Redshift DuckDB Notes
MD5(s) md5(s)
SHA1(s) sha1(s)
SHA2(s, 256) sha256(s) 256-bit only

Emulated

Redshift DuckDB Notes
STRTOL(s, base) CASE base WHEN 16 THEN CAST(('0x' || s) AS BIGINT) WHEN 10 THEN CAST(s AS BIGINT) END Base 10 and 16 only
RATIO_TO_REPORT(col) OVER (...) col / SUM(col) OVER (...) Window clause preserved

Unsupported

Redshift Notes
BPCHARCMP(a, b) No DuckDB equivalent

Type Mappings: Trino to DuckDB

Trino DuckDB Context
ROW(a INT, b VARCHAR) STRUCT(a INT, b VARCHAR) CAST, DDL
ARRAY(T) T[] CAST, DDL
ARRAY<T> T[] CAST, DDL
MAP(K, V) MAP(K, V) Passthrough
VARBINARY BLOB CAST, DDL
IPADDRESS VARCHAR CAST, DDL

Type Mappings: Redshift to DuckDB

Redshift DuckDB Context
VARCHAR(MAX) VARCHAR CAST, DDL
CHARACTER VARYING(MAX) VARCHAR CAST, DDL
NVARCHAR(MAX) VARCHAR CAST, DDL
SUPER JSON CAST, DDL
VARBINARY BLOB CAST, DDL
HLLSKETCH Unsupported
GEOMETRY Unsupported
TIMETZ TIMETZ Passthrough
TIMESTAMPTZ TIMESTAMPTZ Passthrough

Type Mappings: Hive to DuckDB

Hive uses the same type rewrite rules as Trino.

Hive DuckDB Context
ROW(a INT, b VARCHAR) STRUCT(a INT, b VARCHAR) CAST, DDL
ARRAY(T) T[] CAST, DDL
ARRAY<T> T[] CAST, DDL
MAP(K, V) MAP(K, V) Passthrough
VARBINARY BLOB CAST, DDL
IPADDRESS VARCHAR CAST, DDL

DDL Support

CREATE TABLE

Feature Behavior
Column type rewriting Automatic (all type mappings applied)
CREATE EXTERNAL TABLE ... STORED AS ... LOCATION Configurable: MapToView or Error
Iceberg via TBLPROPERTIES ('table_type'='ICEBERG') Configurable: MapToView (uses iceberg_scan()) or Error
Iceberg via WITH (table_type = 'ICEBERG') Same as above (Trino syntax)
PARTITIONED BY hive_partitioning = true added to reader options
ROW FORMAT DELIMITED FIELDS TERMINATED BY delim = '...' added to read_csv options
ROW FORMAT DELIMITED ESCAPED BY escape = '...' added to read_csv options
ROW FORMAT DELIMITED LINES TERMINATED BY new_line = '...' added to read_csv options
ROW FORMAT DELIMITED NULL DEFINED AS nullstr = '...' added to read_csv options
ROW FORMAT SERDE 'class' Reader function inferred from SerDe class name

External Table Format Mapping

STORED AS DuckDB Reader
PARQUET read_parquet()
ORC read_parquet()
TEXTFILE read_csv()
JSONFILE read_json()
AVRO Unsupported
SEQUENCEFILE Unsupported
RCFILE Unsupported
(none specified) read_parquet() (default)

SerDe Class Mapping

When no STORED AS is present, the reader function is inferred from ROW FORMAT SERDE:

SerDe Class DuckDB Reader
ParquetHiveSerDe read_parquet()
OrcSerde read_parquet()
JsonSerDe (Hive or OpenX) read_json()
OpenCSVSerde read_csv()
LazySimpleSerDe read_csv()
RegexSerDe Unsupported
Unknown classes Error by default; override via serde_class_resolver

The built-in mapping is substring-based and case-insensitive. For classes not listed above, supply a SerdeClassResolver in TranspileOptions (see the Library section for an example). The resolver is called first; returning None falls through to the built-in logic.

ALTER TABLE

Operation Behavior
ADD COLUMN Column type rewritten
ALTER COLUMN ... SET DATA TYPE Data type rewritten
Other operations Passthrough

DML Support

Statement Behavior
INSERT INTO ... SELECT Passthrough (expressions rewritten)
INSERT INTO ... VALUES Passthrough
UPDATE ... SET ... FROM (Redshift) Passthrough (DuckDB supports this)
DELETE ... USING (Redshift) Passthrough (DuckDB supports this)
MERGE Passthrough
COPY FROM (Redshift) Configurable: MapToInsert (uses read_parquet/read_csv/read_json) or Error

SHOW Commands

Command DuckDB Output
SHOW TABLES Passthrough
SHOW DATABASES Passthrough
SHOW SCHEMAS Passthrough
SHOW COLUMNS FROM t Passthrough
SHOW VIEWS Passthrough
SHOW CREATE TABLE t Emulated via information_schema.columns (reconstructs DDL)
SHOW CREATE VIEW v Emulated via duckdb_views() (retrieves view SQL)
SHOW variable SELECT current_setting('variable')
SHOW FUNCTIONS SELECT ... FROM information_schema.routines

UNNEST and Lateral Joins

Source Syntax DuckDB Output
CROSS JOIN UNNEST(arr) AS t(x) Passthrough
CROSS JOIN UNNEST(arr) WITH ORDINALITY AS t(x, n) Passthrough
LATERAL VIEW explode(arr) t AS x CROSS JOIN UNNEST(arr) AS t(x)
LATERAL VIEW posexplode(arr) t AS x CROSS JOIN UNNEST(arr) AS t(x)
CROSS JOIN LATERAL (subquery) Passthrough

Parameterized Queries

Parameterized queries (prepared statement placeholders) are passed through to DuckDB unchanged.

Dialect Supported Styles Example
Trino ?, $1 SELECT * FROM t WHERE x = ?
Redshift $1 SELECT * FROM t WHERE x = $1
Hive ?, $1 SELECT * FROM t WHERE x = ?

DuckDB uses $1-style positional parameters natively. The ? style is also accepted by DuckDB in its client APIs.

Configuration

TranspileOptions

Option Values Default Description
external_table MapToView, Error Error How to handle CREATE EXTERNAL TABLE
iceberg_table MapToView, Error Error How to handle Iceberg tables (detected via TBLPROPERTIES)
copy MapToInsert, Error Error How to handle Redshift COPY FROM
serde_class_resolver Some(SerdeClassResolver), None None Custom resolver for ROW FORMAT SERDE class names not covered by the built-in mapping. Return Some(reader_fn) to override, None to fall through.

Project Structure

src/
  lib.rs                  Public API, TranspileOptions
  error.rs                Error types
  main.rs                 CLI entry point
  transpiler/
    mod.rs                Transpiler trait
    rewrite.rs            ExprRewriter (VisitorMut-based AST walker)
  dialect/
    mod.rs                SourceDialect enum
    trino.rs              Trino transpiler
    redshift.rs           Redshift transpiler
    hive.rs               Hive transpiler
  transforms/
    mod.rs                Module re-exports
    types.rs              Data type rewriting
    functions.rs          Function name/signature mapping
    format_strings.rs     Format string conversion (PG/Java → strftime)
    ddl.rs                CREATE TABLE, ALTER TABLE, external/iceberg tables
    dml.rs                INSERT, UPDATE, DELETE, MERGE
    show.rs               SHOW command translation
    unnest.rs             UNNEST syntax normalization
    lateral.rs            LATERAL VIEW to CROSS JOIN UNNEST
tests/
  common/mod.rs           Test helpers
  integration.rs          End-to-end tests
  duckdb_integration.rs   DuckDB execution tests
examples/
  basic.rs                Function, type, and syntax rewrites
  migration.rs            External tables, Iceberg, COPY with TranspileOptions
  multi_statement.rs      Multi-statement ETL script transpilation
  serde_resolver.rs       Custom SerDe class resolver with built-in fallthrough

Architecture and Design

Transpilation Pipeline

papera uses a two-stage AST transformation pipeline:

  1. Parse source SQL with the dialect-specific parser from sqlparser-rs.
  2. Statement-level transforms restructure top-level Statement variants (e.g., converting CREATE EXTERNAL TABLE into CREATE VIEW, or rewriting SHOW CREATE TABLE into catalog queries).
  3. Expression-level rewrites via ExprRewriter (a VisitorMut-based AST walker) handle cross-cutting concerns such as function renaming, type casting, and table-factor normalization.
  4. Emit DuckDB-compatible SQL from the rewritten AST.

This split is intentional: statement handlers own structural changes that may replace one statement kind with another, while ExprRewriter handles expression-level rewrites that apply uniformly across statement types.

Library-First Design

The crate is designed as a library first. The CLI binary is feature-gated behind cli and is not built by default. Internal rewrite machinery under src/transforms is pub(crate), keeping the stable public API surface small: transpile, transpile_with_options, dialect selection, options, and error types.

Opt-In Semantics for Risky Conversions

Features that can silently change semantics or storage assumptions are controlled by TranspileOptions and default to Error. External-table-to-view and Iceberg-table-to-view conversions are examples: they are useful but alter the storage model, so callers must explicitly opt in.

Parser as Feature Boundary

papera's feature coverage is bounded not only by the rewrite logic but also by what sqlparser-rs can parse and represent. For example:

  • DDL column DataType nodes are not visited by VisitorMut, so CREATE TABLE column types must be rewritten in statement handlers rather than in the expression walker.
  • Trino ROW(a INT, b VARCHAR) is exposed as flattened custom type data under GenericDialect, making nested ROW handling fragile.
  • Some source syntax cannot be supported cleanly until the upstream parser exposes a usable AST for it.

Rewrite Strategy

Function rewrites are classified by complexity:

  • Rename: simple name substitution (e.g., approx_distinct to approx_count_distinct).
  • RenameReorder: same function with reordered arguments (e.g., CHARINDEX(sub, str) to strpos(str, sub)).
  • Custom: the rewrite must produce a different AST shape entirely (e.g., NVL2(e, a, b) becomes a CASE WHEN expression, bitwise functions become infix operators).

Declarative mappings are preferred where possible, with custom rewrites reserved for cases where DuckDB requires a structurally different expression.

Compatibility Model

papera targets DuckDB execution correctness, not just syntactically valid output. Some mappings are approximations rather than exact semantic matches (e.g., url_extract_* functions use regex approximations). String-level rewrite success alone is not considered sufficient evidence of compatibility, which is why the test suite includes DuckDB execution tests alongside string-comparison tests.

Known Limitations

Nested ROW types (sqlparser-rs 0.61)

Nested ROW types such as ROW(x BIGINT, y ROW(i DOUBLE, j DOUBLE)) fail to parse in sqlparser-rs 0.61. The root cause is parse_optional_type_modifiers() in the parser, which cannot handle nested parentheses. This affects all dialects (including HiveDialect). Flat ROW(a INT, b VARCHAR) works correctly.

ARRAY(T) vs ARRAY<T>

ARRAY(T) (Trino parenthesis syntax) is rewritten to T[], but type inference inside ARRAY(T) depends on the parser recognizing the inner type. ARRAY<T> and T[] are fully supported.

Approximate mappings

Some functions are approximations rather than exact semantic matches. For example, url_extract_* functions use regex-based approximations. Always validate output against DuckDB execution for compatibility-sensitive queries.

Redshift COPY options

When copy is set to MapToInsert, Redshift-specific options such as IAM_ROLE, IGNOREHEADER, and GZIP are silently dropped. The generated INSERT INTO ... SELECT * FROM read_parquet/read_csv/read_json reflects format and location only.

Building

cargo build
cargo test

License

MIT License. Copyright (c) 2026 Moriyoshi Koizumi. See LICENSE for details.