pg_replica 0.3.0

Consensus-driven failover for PostgreSQL (Raft control plane)
# Node.js cluster client (napi-rs) — build plan

A native Node.js addon, written in Rust with **napi-rs**, that gives JS/TS an efficient
connection **pool over the N-node pg_replica cluster** and follows the primary
automatically. The host-racing, primary detection, pooling, and failover recovery live in
Rust — JS never loops over connections.

## Why this shape

- **Not WASM** — WASM has no raw TCP; the PG wire protocol needs a socket. A native addon
  loads into Node and uses real sockets, so `tokio-postgres` works unmodified.
- **Not JS probe-loop** — pure-JS `pg` is single-host ([#1470]https://github.com/brianc/node-postgres/issues/1470),
  and the libpq path can't be reached cleanly through `pg`'s parser. We push that logic
  down into Rust instead of hand-rolling it in JS (see [CLIENT.md]CLIENT.md).
- **Reuse what's verified**`tokio-postgres` already does multi-host +
  `target_session_attrs=read-write` (proven in `packages/failover-probe`,
  `scripts/test-m6-routing.sh`). The addon reuses that exact `Config`.

## Core idea (why this is small)

A `tokio-postgres` `Config` with the host list + `target_session_attrs=read-write` lands
each connection on the current primary by itself. Wrap that `Config` in an async pool
(`bb8-postgres`) and you get an N-node, primary-following pool with
almost no custom routing code: every pooled connection resolves the primary at connect
time, and broken connections (after a failover) are replaced by fresh ones that re-resolve
to the new primary. The pool crate also spawns each `Connection` future for us.

The one non-trivial case to handle explicitly: a primary that is **fenced read-only
without dropping its sessions** (pg_replica sets `default_transaction_read_only=on`).
An already-open pooled connection to that node stays connected but can no longer write.
So the pool needs a **checkout validation** that confirms the connection is still on a
writable primary (`SHOW transaction_read_only` → `off`) and evicts it otherwise.

## Public API

```ts
import { createPool } from 'hyperiondb-client'

const pool = createPool({
  hosts: ['10.0.0.1', '10.0.0.2', '10.0.0.3'],
  port: 5432,
  user: 'app',
  password: '…',
  database: 'weido',
  mode: 'read-write',
  poolSize: 10,
  connectTimeoutMs: 2000,
})

const rows = await pool.query<{ id: number }>('select id from t where x = $1', [42])

await pool.transaction(async (tx) => {
  await tx.query('insert into t (x) values ($1)', [1])
  await tx.query('update t set x = x + 1 where x = $1', [1])
})

await pool.end()
```

A separate `mode: 'read-only'` (or `'prefer-standby'` on PG 14+) pool routes read-heavy
work (ParadeDB search) to standbys.

## Milestones

### C1 — Core pool
- [x] Reuse the `failover-probe` `Config` (multi-host, `target_session_attrs=read-write`,
      `connect_timeout`).
- [x] `createPool(opts)``#[napi]` class holding a `deadpool`/`bb8` pool.
- [x] `pool.query(sql, params)` async (`#[napi]` → JS Promise) returning rows as JS objects
      keyed by column name.
- [x] `pool.end()` graceful drain.

### C2 — Type marshalling
- [x] Results: PG `Row` → JS. Mappings: `bool`→boolean, `int2/4`→number, `int8`**BigInt**,
      `oid`→number, `float4/8`→number, `numeric`**string**, `text`/`varchar`/`bpchar`/`name`      string, `uuid`→string, `bytea`**Buffer**, `json`/`jsonb`→parsed value,
      `timestamptz`/`timestamp`/`date`/`time`**ISO 8601 string**, and arrays of all the above
      → JS arrays. Custom binary `numeric`→string decoder (arbitrary precision, no float).
- [x] Params: JS values → `tokio_postgres::types::ToSql` (null, boolean, number, **BigInt**,
      string, **Buffer**→bytea, **Date**→timestamptz, array→pg array when the column is an
      array type else jsonb, object→jsonb). Integer/float targets coerced by column type.
- [x] **Decision:** `int8`/`bigint`**BigInt** (lossless 64-bit); `numeric`**string**
      (arbitrary precision). `timestamptz` → ISO 8601 string; `bytea` → Buffer.

### C3 — Primary affinity & failover
- [x] Checkout validation: a `deadpool` `pre_recycle` hook runs `SHOW transaction_read_only`
      and evicts the connection when it is `on` (read-only-fence-without-disconnect window).
      Write pool only; fresh connections are already validated by `target_session_attrs`.
- [x] Recycle on connection error; new connections re-resolve the primary (the hook doubles
      as a liveness check — a failed `SHOW` evicts the dead connection).
- [x] Retry/backoff (50ms→500ms, capped) bounded by `acquireTimeoutMs` (default 5000),
      surfacing `no writable primary available after <ms>ms` as a typed error.
- [x] `mode: 'read-write' | 'read-only' | 'prefer-standby' | 'any'`. `read-only`      `target_session_attrs=read-only` (lands on standbys); `prefer-standby`/`any`      `target_session_attrs=any` + random host load-balancing. (tokio-postgres 0.7.x has no
      server-side standby *preference*, so `prefer-standby` spreads across all reachable nodes.)

### C4 — Ergonomics
- [x] Transactions: native `pool.begin() -> Transaction {query, commit, rollback}` (one
      dedicated connection held in an `Arc<Mutex<Option<Client>>>`), plus a `pool.transaction(cb)`
      helper (auto `BEGIN`/`COMMIT`, `ROLLBACK` on throw) in the JS wrapper.
- [x] Prepared statements: query paths use deadpool `prepare_cached`, so repeated SQL reuses
      a server-side named statement per connection. (Pipelining is inherent to tokio-postgres
      for concurrent queries on a connection.)
- [x] Query cancellation: `query(sql, params, { timeoutMs, signal })`. Both a timeout and an
      `AbortSignal` trip the connection's `tokio_postgres` `cancel_token` (server-side cancel).
      The `!Send` `Rc`-based `AbortSignal` is bridged on the JS thread (`on_abort` → a `Send`
      `Notify`) so the async query can `select!` on it.
- [x] Error mapping: PG errors carry the 5-char `SQLSTATE` on JS `err.code` (native formats
      `[SQLSTATE xxxxx] msg`; the JS wrapper parses it onto `.code` and cleans the message).
- [x] Hand-checked `client.d.ts` (precise `PoolOptions`/`Param`/`Row`/`QueryOptions`/`Pool`/
      `Transaction` types) is the published `types`; the napi-generated `index.d.ts` stays as
      the internal native binding. Architecture is now native core (`index.js`/`.node`) + a thin
      JS ergonomic layer (`client.js`) that adds `.code`, `transaction(cb)`, and option passing.

### C5 — Observability & resilience
- [x] Pool metrics: `pool.status()``{ maxSize, size, available, inUse, waiting }` (from
      deadpool's `Status`; `inUse = size − available`).
- [x] `statement_timeout`: `statementTimeoutMs` pool option sets it server-side on every
      connection (`options=-c statement_timeout=…`). Per-query cancellation is the C4
      `query(…, { timeoutMs, signal })` path (client-side `cancel_token`).
- [x] Optional logging hook: `logger(event)` pool option, called once per query with
      `{ sql, durationMs, rowCount? , error? }`; thrown logger errors are swallowed.

### C6 — Packaging & release
- [x] `@napi-rs/cli` prebuilds: `win32-x64-msvc`, `darwin` x64/arm64, `linux` x64/arm64
      (`gnu` + `musl`) — 7 targets (`napi.targets`). Linux builds cross-compile with
      `cargo-zigbuild` (`--cross-compile`); win/mac build natively.
- [x] GitHub Actions matrix (`.github/workflows/release.yml`) → `napi prepublish` publishes
      per-platform `hyperiondb-client-<triple>` packages and wires them as the main package's
      `optionalDependencies`. Triggered by `[cd]` in the commit message on `main`.
- [x] npm publish; the workflow bumps `npm version patch` and syncs `Cargo.toml` to match,
      commits the bump back (no `[cd]`, so it doesn't re-trigger), then publishes. (Needs an
      `NPM_TOKEN` secret.)
- [x] README: install, connect, failover behavior, read-scaling, type mapping, testing.

## Testing

`node-addon/test/` (`node:test`). `npm test` runs the type + fence tests against a running
cluster (primary on the first host); `npm run test:chaos` runs the failover test, which needs
to stop a node (`test/cluster.js`, defaults to the local pgrx cluster via WSL `pg_ctl`; set
`HYPERION_CTL`/env for docker). Connection + topology come from `HYPERION_*` env vars.

- [x] Unit: type round-trips (params ↔ rows) for every supported PG type — `test/types.test.js`
      (scalars, `int8`→BigInt, `numeric` precision, NULLs, date/time/void, arrays incl `int8[]`,
      Date/Buffer/array/jsonb params).
- [x] Integration: query load through the primary-following pool, stop the primary, assert
      reconnection to the new primary and **zero acked-write loss**`test/chaos.test.js` (JS
      port of `packages/chaos-writer`; requires synchronous replication for true zero-loss).
- [x] Read-only-fence eviction (C3) explicitly — `test/fence.test.js` (fences the primary via
      `ALTER SYSTEM`, asserts the typed error + recovery; always resets the fence in teardown).

## Open decisions
- [x] Pool crate: `deadpool-postgres` (simpler, recycling built in)
- [x] Package + crate name (brand: HyperionDb).
- [x] `bigint`/`numeric` JS representation (C2).

## Not planned

- [ ] TLS backend (rustls vs native-tls) and default per environment.