samkhya-postgres 1.0.0

PostgreSQL adapter for samkhya — portable cardinality correction hooks
# samkhya-postgres

PostgreSQL extension adapter for [samkhya](../) — portable
cardinality correction for embedded analytical engines.

This crate ships a
[pgrx](https://github.com/pgcentralfoundation/pgrx)-based PostgreSQL
extension that exposes samkhya's portable sketch and Puffin sidecar
primitives to SQL.

## Build modes

The crate has two build modes, controlled by the `pg_extension` Cargo
feature:

- **Default (`pg_extension` off)**: empty `rlib`. Compiles in seconds
  without PostgreSQL development headers. This is what
  `cargo check --workspace` builds in CI. Suitable for downstream
  crates that want `samkhya-postgres` in their dependency graph
  without forcing every consumer to install the PostgreSQL server
  development headers.
- **`pg_extension` on**: pulls in pgrx and compiles the real
  PostgreSQL loadable module. Requires PostgreSQL development headers
  (`postgresql-server-dev-NN` on Debian/Ubuntu, `postgresql-devel` on
  RHEL/Fedora) and the matching `cargo-pgrx` toolchain.
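
Feature gating of this kind usually comes down to a `cfg` attribute in
`lib.rs`. The sketch below is an assumption about how such a gate might
look, not the crate's actual source: the `extension` module and the
`build_mode` helper are hypothetical names added for illustration, and
the real default build is an empty `rlib` with no helper at all.

```rust
// Hypothetical lib.rs layout: everything PostgreSQL-specific sits behind
// the `pg_extension` feature, so the default build never touches pgrx.

#[cfg(feature = "pg_extension")]
mod extension {
    // In a real extension build, the pgrx imports and the SQL-callable
    // functions would live here. Left empty in this sketch.
}

/// Reports which build mode was compiled (illustrative helper only).
pub fn build_mode() -> &'static str {
    if cfg!(feature = "pg_extension") {
        "extension"
    } else {
        "stub rlib"
    }
}

fn main() {
    println!("samkhya-postgres build mode: {}", build_mode());
}
```

Compiled without the feature, the gated module is removed before type
checking, which is why no PostgreSQL headers are needed in that mode.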

## Quickstart (extension build)

```bash
# 1. Install the pgrx CLI.
cargo install --locked cargo-pgrx

# 2. One-time pgrx init: downloads and builds the supported PG
#    majors into ~/.pgrx. To set up only the version you plan to
#    develop against, name it explicitly, e.g.
#    `cargo pgrx init --pg16 download`.
cargo pgrx init

# 3. From the workspace root, run the extension inside a pgrx-managed
#    PostgreSQL 16 instance with psql attached. For PostgreSQL 17,
#    replace both occurrences of pg16 with pg17.
cargo pgrx run pg16 --features pg_extension,pg16 \
    --package samkhya-postgres

# Inside psql:
#   CREATE EXTENSION samkhya_postgres;
```

## SQL surface

### `samkhya_hll_count(input anyarray) -> bigint`

Builds a samkhya `HllSketch` (precision 14) from the input array and
returns its estimated distinct-element count.

```sql
SELECT samkhya_hll_count(array_agg(id)) FROM foo;
SELECT samkhya_hll_count(ARRAY[1, 2, 2, 3, 3, 3]::int[]);
```
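
For readers unfamiliar with HyperLogLog, the following is a minimal
standalone sketch of the estimator at precision 14. It is not samkhya's
`HllSketch`: the `ToyHll` type, the use of `DefaultHasher`, and the
small-range (linear counting) correction are all assumptions chosen to
keep the example dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Number of index bits; 2^14 = 16384 one-byte registers.
const PRECISION: u32 = 14;
const REGISTERS: usize = 1 << PRECISION;

/// Toy HyperLogLog for illustration; NOT samkhya's `HllSketch`.
struct ToyHll {
    registers: Vec<u8>,
}

impl ToyHll {
    fn new() -> Self {
        ToyHll { registers: vec![0; REGISTERS] }
    }

    fn insert<T: Hash>(&mut self, item: &T) {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        let hash = hasher.finish();
        // The top PRECISION bits select a register.
        let index = (hash >> (64 - PRECISION)) as usize;
        // Rank: 1-based position of the first set bit in the remaining bits.
        let rest = hash << PRECISION;
        let rank = (rest.leading_zeros() + 1).min(64 - PRECISION + 1) as u8;
        if rank > self.registers[index] {
            self.registers[index] = rank;
        }
    }

    fn estimate(&self) -> u64 {
        let m = REGISTERS as f64;
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        // Small-range correction: use linear counting while empty registers remain.
        if raw <= 2.5 * m && zeros > 0 {
            (m * (m / zeros as f64).ln()).round() as u64
        } else {
            raw.round() as u64
        }
    }
}

fn main() {
    let mut sketch = ToyHll::new();
    for value in [1, 2, 2, 3, 3, 3] {
        sketch.insert(&value);
    }
    // Three distinct inputs, so the estimate should land at or very near 3.
    println!("estimated distinct count: {}", sketch.estimate());
}
```

The result is an estimate, not an exact count: with 16384 registers the
textbook standard error is roughly 1.04 / sqrt(16384), or about 0.8%.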

### `samkhya_puffin_inspect(path text) -> jsonb`

Opens an Iceberg [Puffin](https://iceberg.apache.org/puffin-spec/)
sidecar file on the server filesystem and returns per-blob metadata
(`kind`, `fields`, `offset`, `length`, `compression_codec`).

```sql
SELECT samkhya_puffin_inspect('/srv/iceberg/sketches/orders.puffin');
```

Output shape:

```json
{
  "blobs": [
    {
      "kind": "samkhya.hll-v1",
      "fields": [7],
      "offset": 4,
      "length": 16384,
      "compression_codec": null
    }
  ]
}
```
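
The blob metadata comes from the JSON footer at the end of a Puffin
file. As a rough illustration of where that footer sits, the sketch
below locates the footer payload in an in-memory buffer. It follows my
reading of the Puffin spec (footer laid out as magic, payload, 4-byte
little-endian payload size, 4-byte flags, magic) and is not samkhya's
reader; `footer_payload` and `synthetic_file` are hypothetical helpers,
and compressed footers are rejected rather than decoded.

```rust
/// Magic bytes "PFA1" that open the file and bracket the footer.
const MAGIC: [u8; 4] = [0x50, 0x46, 0x41, 0x31];

/// Returns the uncompressed footer payload (JSON text) of a Puffin file.
/// Assumed layout of the file tail:
/// Magic | Payload | PayloadSize (4 bytes, LE) | Flags (4 bytes) | Magic
fn footer_payload(file: &[u8]) -> Result<&str, String> {
    // Smallest valid file: leading magic plus a footer with an empty payload.
    if file.len() < 4 + 16 {
        return Err("file too short".into());
    }
    let end = file.len();
    if file[..4] != MAGIC || file[end - 4..] != MAGIC {
        return Err("missing magic bytes".into());
    }
    // Lowest bit of the first flags byte marks a compressed footer payload.
    if file[end - 8] & 1 != 0 {
        return Err("compressed footer not handled in this sketch".into());
    }
    let size_bytes = [file[end - 12], file[end - 11], file[end - 10], file[end - 9]];
    let size = u32::from_le_bytes(size_bytes) as usize;
    let payload_end = end - 12;
    let payload_start = payload_end
        .checked_sub(size)
        .ok_or("payload size exceeds file length")?;
    // The footer's own opening magic must sit right before the payload.
    if payload_start < 8 || file[payload_start - 4..payload_start] != MAGIC {
        return Err("footer magic not found before payload".into());
    }
    std::str::from_utf8(&file[payload_start..payload_end]).map_err(|e| e.to_string())
}

/// Builds a blob-free Puffin-shaped buffer around `json` for demonstration.
fn synthetic_file(json: &str) -> Vec<u8> {
    let mut file = Vec::new();
    file.extend_from_slice(&MAGIC); // file header
    file.extend_from_slice(&MAGIC); // footer start
    file.extend_from_slice(json.as_bytes());
    file.extend_from_slice(&(json.len() as u32).to_le_bytes());
    file.extend_from_slice(&[0u8; 4]); // flags: uncompressed
    file.extend_from_slice(&MAGIC); // footer end
    file
}

fn main() {
    let file = synthetic_file(r#"{"blobs":[]}"#);
    match footer_payload(&file) {
        Ok(json) => println!("footer payload: {}", json),
        Err(err) => eprintln!("not a readable Puffin footer: {}", err),
    }
}
```

Reading from the end of the file means the blob bytes themselves never
need to be loaded just to list what a sidecar contains.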

## Scope

This is the v1.0 scaffold. It establishes the extension surface,
crate layout, and pgrx feature gating. The operator-side cardinality
hook (installing a `get_relation_info_hook` so the planner picks up
samkhya's corrected row estimates without per-query SQL changes) is a
**v1.1** target.

## Testing

```bash
# Default-feature check (no PG headers required).
cargo check -p samkhya-postgres

# Extension-side unit tests (requires cargo-pgrx).
cargo pgrx test pg16 --features pg_extension,pg16,pg_test \
    --package samkhya-postgres
```

## License

Apache-2.0, inherited from the workspace.