polyglot-sql-function-catalogs 0.1.10

Optional dialect function catalogs for polyglot-sql semantic validation
Documentation
# polyglot-sql-function-catalogs

Optional dialect function-catalog data for `polyglot-sql` semantic validation.

This crate intentionally does not depend on `polyglot-sql`. It exposes feature-gated
function lists through a small sink interface so the core crate can pull them in
at compile time without dependency cycles.

## Features

- `dialect-clickhouse`: include ClickHouse function signatures.
- `dialect-duckdb`: include DuckDB function signatures.
- `all-dialects`: currently aliases all available dialect features.

## Performance Model

- This crate emits static function lists into a caller-provided sink.
- No runtime globals are created in this crate.
- Compile-time feature flags decide which dialect lists are built.
- In `polyglot-sql`, enabled lists are loaded once into a `LazyLock` catalog.

## Usage

With `polyglot-sql` compile-time wiring (recommended):

```toml
[dependencies]
polyglot-sql = { version = "...", features = ["function-catalog-clickhouse"] }
```

Then run schema validation with type checks:

```rust
use polyglot_sql::{SchemaValidationOptions, ValidationSchema, validate_with_schema, DialectType};

let schema = ValidationSchema { tables: vec![], strict: Some(true) };
let options = SchemaValidationOptions {
    check_types: true,
    ..Default::default()
};

let result = validate_with_schema(
    "SELECT IF(1, 2, 3)",
    DialectType::ClickHouse,
    &schema,
    &options,
);
assert!(result.valid);
```

The core crate auto-injects embedded catalogs when:

- `check_types` is enabled, and
- no custom `function_catalog` is provided in options.

## What Catalog Validation Checks

Function catalogs are used during schema/type validation (not parser syntax validation).

When `check_types` is enabled in `SchemaValidationOptions`, core validation uses the catalog to check:

- **Function name presence per dialect**:
  - Unknown function name -> `E202` (`E_UNKNOWN_FUNCTION`)
- **Function arity / overloads**:
  - Name exists but argument count matches no signature -> `E203` (`E_INVALID_FUNCTION_ARITY`)

Catalog entries currently define:

- `min_arity`
- `max_arity` (`None` means variadic)
- multiple overloads per function name
- dialect-level casing behavior + optional per-function casing override

Catalog entries do **not** currently define:

- per-argument data types
- return types
- coercion rules
- named/optional parameter semantics

Advanced manual integration (custom sink) is also available via:

- `CatalogSink`
- `register_enabled_catalogs`
- `FunctionSignature`
- `FunctionNameCase`

## Feature Mapping Across Crates

This crate's features are consumed by `polyglot-sql` compile-time features:

- `polyglot-sql/function-catalog-clickhouse` -> `polyglot-sql-function-catalogs/dialect-clickhouse`
- `polyglot-sql/function-catalog-duckdb` -> `polyglot-sql-function-catalogs/dialect-duckdb`
- `polyglot-sql/function-catalog-all-dialects` -> `polyglot-sql-function-catalogs/all-dialects`

Bindings forward those same core features:

- `polyglot-sql-wasm/function-catalog-clickhouse`
- `polyglot-sql-wasm/function-catalog-duckdb`
- `polyglot-sql-wasm/function-catalog-all-dialects`
- `polyglot-sql-ffi/function-catalog-clickhouse`
- `polyglot-sql-ffi/function-catalog-duckdb`
- `polyglot-sql-ffi/function-catalog-all-dialects`
- `polyglot-sql-python/function-catalog-clickhouse`
- `polyglot-sql-python/function-catalog-duckdb`
- `polyglot-sql-python/function-catalog-all-dialects`

Build examples:

```bash
cargo check -p polyglot-sql --features function-catalog-clickhouse
cargo check -p polyglot-sql --features function-catalog-duckdb
cargo check -p polyglot-sql-wasm --features function-catalog-clickhouse
cargo check -p polyglot-sql-wasm --features function-catalog-duckdb
cargo check -p polyglot-sql-ffi --features function-catalog-clickhouse
cargo check -p polyglot-sql-ffi --features function-catalog-duckdb
cargo check -p polyglot-sql-python --features function-catalog-clickhouse
cargo check -p polyglot-sql-python --features function-catalog-duckdb
```

## Adding A New Dialect Function List

This crate is intentionally feature-gated so large function datasets do not inflate every binary.

### 1. Add a new dialect source file

Create `src/<dialect>.rs` and expose `register<S: CatalogSink>(sink: &mut S)`.

```rust
use crate::{CatalogSink, FunctionNameCase, FunctionSignature};

pub(crate) fn register<S: CatalogSink>(sink: &mut S) {
    let d = "postgresql";

    // Dialect-level default casing policy.
    sink.set_dialect_name_case(d, FunctionNameCase::Insensitive);

    // Optional function-level casing override.
    // sink.set_function_name_case(d, "SpecialFn", FunctionNameCase::Sensitive);

    sink.register(d, "abs", vec![FunctionSignature::exact(1)]);
    sink.register(d, "coalesce", vec![FunctionSignature::variadic(1)]);
    sink.register(d, "date_trunc", vec![FunctionSignature::exact(2)]);
}
```

### 2. Wire the module in `src/lib.rs`

```rust
#[cfg(feature = "dialect-postgresql")]
mod postgresql;

pub fn register_enabled_catalogs<S: CatalogSink>(sink: &mut S) {
    #[cfg(feature = "dialect-postgresql")]
    postgresql::register(sink);
}
```

### 3. Add feature flags in `Cargo.toml`

```toml
[features]
default = []
dialect-clickhouse = []
dialect-duckdb = []
all-dialects = ["dialect-clickhouse", "dialect-duckdb"]
```

### 4. (Optional) Add extraction tooling

Put source-specific extraction scripts in:

- `tools/<dialect>/extract_functions.py`

Current example path:

- `tools/clickhouse/extract_functions.py`
- `tools/duckdb/extract_functions.py`

ClickHouse extraction command (uses `chdb` via `uv run`, no local install required):

```bash
uv run --with chdb python crates/polyglot-sql-function-catalogs/tools/clickhouse/extract_functions.py \
  --output crates/polyglot-sql-function-catalogs/src/clickhouse.rs
```

DuckDB extraction command (requires Python package `duckdb`, installed on demand):

```bash
uv run --with duckdb python crates/polyglot-sql-function-catalogs/tools/duckdb/extract_functions.py \
  --output crates/polyglot-sql-function-catalogs/src/duckdb.rs
```

Useful optional flags:

- `--exclude-internal`: drop rows where `internal = true`
- `--function-type <type>` (repeatable): filter to specific `function_type` values

## How This Crate Is Wired To Other Crates

### `polyglot-sql` (core)

- Core has an optional dependency on this crate.
- Core features:
  - `function-catalog-clickhouse`
  - `function-catalog-all-dialects`
- When one of these features is enabled, core builds an embedded catalog once
  and auto-uses it for schema validation type checks (unless caller provides a custom catalog).
- Runtime override hook still exists via `SchemaValidationOptions.function_catalog`.

### `polyglot-sql-wasm`

- WASM forwards function-catalog features to `polyglot-sql`:
  - `function-catalog-clickhouse`
  - `function-catalog-all-dialects`
- Uses core schema validation, so catalog checks apply when `check_types` is enabled.
- If disabled, behavior is unchanged.

### Other bindings (`ffi`, `python`, `sdk`)

- `ffi` and `python` can enable core catalog features at compile time via pass-through features.
- Today, their exposed `validate` APIs are syntax-only, so catalog checks are effectively dormant
  until schema-aware validation APIs are exposed there.
- `sdk` consumes WASM and therefore follows WASM feature behavior.
- No direct dependency on this crate is required in those bindings.