

Function canonicalize_identifiers 

pub fn canonicalize_identifiers(
    sql: &str,
    tables: &HashMap<String, String>,
    columns: &HashMap<String, String>,
) -> String

Rewrite references to registered tables and their columns so they match case-insensitively, the way DuckDB does.

DataFusion lowercases unquoted identifiers by default, so a query like SELECT State FROM accidents is looked up as state and fails against a case-sensitive Parquet column literally named State. Rather than disable normalization globally (which would also make aliases and CTE names case-sensitive), we rewrite only the identifiers that name a known dataset or column into their canonical casing, quoted. Quoted identifiers bypass the engine’s lowercasing and match the stored name exactly, while every other identifier (aliases, CTE names, …) is left untouched so the engine’s normal case-insensitive handling still applies.
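The rewrite described above can be illustrated with a deliberately simplified sketch. It is not the crate's implementation: the name canonicalize_sketch is hypothetical, it scans tokens rather than parsing SQL, and it uses one combined name map instead of separate tables and columns maps. Because it has no syntactic context, it would also quote an alias or CTE name that happened to collide with a registered name, which the real function is described as avoiding.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the real function: it scans tokens
/// instead of parsing SQL, and uses a single map for tables and columns.
/// Keys are lowercased names, values are the canonical spellings.
fn canonicalize_sketch(sql: &str, names: &HashMap<String, String>) -> String {
    let chars: Vec<char> = sql.chars().collect();
    let mut out = String::with_capacity(sql.len());
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c == '\'' || c == '"' {
            // Copy string literals and already-quoted identifiers verbatim,
            // up to and including the closing quote.
            out.push(c);
            i += 1;
            while i < chars.len() {
                out.push(chars[i]);
                i += 1;
                if chars[i - 1] == c {
                    break;
                }
            }
        } else if c.is_ascii_alphabetic() || c == '_' {
            // Collect one unquoted word (keyword or identifier).
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            match names.get(&word.to_lowercase()) {
                // Known name: emit the canonical spelling, quoted so the
                // engine does not lowercase it again.
                Some(canonical) => {
                    out.push('"');
                    out.push_str(&canonical.replace('"', "\"\""));
                    out.push('"');
                }
                // Anything else (keywords, aliases, CTE names) is untouched.
                None => out.push_str(&word),
            }
        } else {
            out.push(c);
            i += 1;
        }
    }
    out
}
```

With a map containing state → State and accidents → accidents, the input SELECT State FROM accidents becomes SELECT "State" FROM "accidents", while an alias such as AS s and a string literal such as 'state' pass through unchanged.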

The tables and columns arguments each map a lowercased name to its canonical spelling. If the SQL cannot be parsed, the input is returned unchanged so that the backend can surface a meaningful error.
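As a rough illustration of the two points above, the sketch below shows one way a caller might build a lowercased-name map and what the return-unchanged fallback looks like. Both lowercase_index and rewrite_or_passthrough are hypothetical helpers invented for this example; they are not part of the documented API, and the closure stands in for whatever parsing step the real function performs.

```rust
use std::collections::HashMap;

/// Hypothetical helper: index canonical names by their lowercased form,
/// matching the "lowercased name -> canonical spelling" shape of the maps.
fn lowercase_index(canonical_names: &[&str]) -> HashMap<String, String> {
    canonical_names
        .iter()
        .map(|name| (name.to_lowercase(), name.to_string()))
        .collect()
}

/// Hypothetical wrapper showing the fallback: if the rewrite step fails,
/// hand back the original SQL untouched so a later stage reports the error.
fn rewrite_or_passthrough<F>(sql: &str, rewrite: F) -> String
where
    F: Fn(&str) -> Result<String, String>,
{
    rewrite(sql).unwrap_or_else(|_| sql.to_string())
}
```

Calling lowercase_index(&["State", "Severity"]) yields a map in which the key state resolves to State. Passing a rewrite step that returns Err to rewrite_or_passthrough returns the original query text as-is.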