# Node.js cluster client (napi-rs) — build plan
A native Node.js addon, written in Rust with **napi-rs**, that gives JS/TS an efficient
connection **pool over the N-node pg_replica cluster** and follows the primary
automatically. The host-racing, primary detection, pooling, and failover recovery live in
Rust — JS never loops over connections.
## Why this shape
- **Not WASM** — WASM has no raw TCP; the PG wire protocol needs a socket. A native addon
loads into Node and uses real sockets, so `tokio-postgres` works unmodified.
- **Not JS probe-loop** — pure-JS `pg` is single-host ([#1470](https://github.com/brianc/node-postgres/issues/1470)),
and the libpq path can't be reached cleanly through `pg`'s parser. We push that logic
down into Rust instead of hand-rolling it in JS (see [CLIENT.md](CLIENT.md)).
- **Reuse what's verified** — `tokio-postgres` already does multi-host +
`target_session_attrs=read-write` (proven in `packages/failover-probe`,
`scripts/test-m6-routing.sh`). The addon reuses that exact `Config`.
## Core idea (why this is small)
A `tokio-postgres` `Config` with the host list + `target_session_attrs=read-write` lands
each connection on the current primary by itself. Wrap that `Config` in an async pool
(`bb8-postgres`) and you get an N-node, primary-following pool with
almost no custom routing code: every pooled connection resolves the primary at connect
time, and broken connections (after a failover) are replaced by fresh ones that re-resolve
to the new primary. The pool crate also spawns each `Connection` future for us.
The one non-trivial case to handle explicitly: a primary that is **fenced read-only
without dropping its sessions** (pg_replica sets `default_transaction_read_only=on`).
An already-open pooled connection to that node stays connected but can no longer write.
So the pool needs a **checkout validation** that confirms the connection is still on a
writable primary (`SHOW transaction_read_only` → `off`) and evicts it otherwise.
## Public API
```ts
import { createPool } from 'hyperiondb-client'
const pool = createPool({
hosts: ['10.0.0.1', '10.0.0.2', '10.0.0.3'],
port: 5432,
user: 'app',
password: '…',
database: 'weido',
mode: 'read-write',
poolSize: 10,
connectTimeoutMs: 2000,
})
const rows = await pool.query<{ id: number }>('select id from t where x = $1', [42])
await pool.transaction(async (tx) => {
await tx.query('insert into t (x) values ($1)', [1])
await tx.query('update t set x = x + 1 where x = $1', [1])
})
await pool.end()
```
A separate `mode: 'read-only'` (or `'prefer-standby'` on PG 14+) pool routes read-heavy
work (ParadeDB search) to standbys.
## Milestones
### C1 — Core pool
- [x] Reuse the `failover-probe` `Config` (multi-host, `target_session_attrs=read-write`,
`connect_timeout`).
- [x] `createPool(opts)` → `#[napi]` class holding a `deadpool`/`bb8` pool.
- [x] `pool.query(sql, params)` async (`#[napi]` → JS Promise) returning rows as JS objects
keyed by column name.
- [x] `pool.end()` graceful drain.
### C2 — Type marshalling
- [x] Results: PG `Row` → JS. Mappings: `bool`→boolean, `int2/4`→number, `int8`→**BigInt**,
`oid`→number, `float4/8`→number, `numeric`→**string**, `text`/`varchar`/`bpchar`/`name`→
string, `uuid`→string, `bytea`→**Buffer**, `json`/`jsonb`→parsed value,
`timestamptz`/`timestamp`/`date`/`time`→**ISO 8601 string**, and arrays of all the above
→ JS arrays. Custom binary `numeric`→string decoder (arbitrary precision, no float).
- [x] Params: JS values → `tokio_postgres::types::ToSql` (null, boolean, number, **BigInt**,
string, **Buffer**→bytea, **Date**→timestamptz, array→pg array when the column is an
array type else jsonb, object→jsonb). Integer/float targets coerced by column type.
- [x] **Decision:** `int8`/`bigint` → **BigInt** (lossless 64-bit); `numeric` → **string**
(arbitrary precision). `timestamptz` → ISO 8601 string; `bytea` → Buffer.
### C3 — Primary affinity & failover
- [x] Checkout validation: a `deadpool` `pre_recycle` hook runs `SHOW transaction_read_only`
and evicts the connection when it is `on` (read-only-fence-without-disconnect window).
Write pool only; fresh connections are already validated by `target_session_attrs`.
- [x] Recycle on connection error; new connections re-resolve the primary (the hook doubles
as a liveness check — a failed `SHOW` evicts the dead connection).
- [x] Retry/backoff (50ms→500ms, capped) bounded by `acquireTimeoutMs` (default 5000),
surfacing `no writable primary available after <ms>ms` as a typed error.
- [x] `mode: 'read-write' | 'read-only' | 'prefer-standby' | 'any'`. `read-only` →
`target_session_attrs=read-only` (lands on standbys); `prefer-standby`/`any` →
`target_session_attrs=any` + random host load-balancing. (tokio-postgres 0.7.x has no
server-side standby *preference*, so `prefer-standby` spreads across all reachable nodes.)
### C4 — Ergonomics
- [x] Transactions: native `pool.begin() -> Transaction {query, commit, rollback}` (one
dedicated connection held in an `Arc<Mutex<Option<Client>>>`), plus a `pool.transaction(cb)`
helper (auto `BEGIN`/`COMMIT`, `ROLLBACK` on throw) in the JS wrapper.
- [x] Prepared statements: query paths use deadpool `prepare_cached`, so repeated SQL reuses
a server-side named statement per connection. (Pipelining is inherent to tokio-postgres
for concurrent queries on a connection.)
- [x] Query cancellation: `query(sql, params, { timeoutMs, signal })`. Both a timeout and an
`AbortSignal` trip the connection's `tokio_postgres` `cancel_token` (server-side cancel).
The `!Send` `Rc`-based `AbortSignal` is bridged on the JS thread (`on_abort` → a `Send`
`Notify`) so the async query can `select!` on it.
- [x] Error mapping: PG errors carry the 5-char `SQLSTATE` on JS `err.code` (native formats
`[SQLSTATE xxxxx] msg`; the JS wrapper parses it onto `.code` and cleans the message).
- [x] Hand-checked `client.d.ts` (precise `PoolOptions`/`Param`/`Row`/`QueryOptions`/`Pool`/
`Transaction` types) is the published `types`; the napi-generated `index.d.ts` stays as
the internal native binding. Architecture is now native core (`index.js`/`.node`) + a thin
JS ergonomic layer (`client.js`) that adds `.code`, `transaction(cb)`, and option passing.
### C5 — Observability & resilience
- [x] Pool metrics: `pool.status()` → `{ maxSize, size, available, inUse, waiting }` (from
deadpool's `Status`; `inUse = size − available`).
- [x] `statement_timeout`: `statementTimeoutMs` pool option sets it server-side on every
connection (`options=-c statement_timeout=…`). Per-query cancellation is the C4
`query(…, { timeoutMs, signal })` path (client-side `cancel_token`).
- [x] Optional logging hook: `logger(event)` pool option, called once per query with
`{ sql, durationMs, rowCount? , error? }`; thrown logger errors are swallowed.
### C6 — Packaging & release
- [x] `@napi-rs/cli` prebuilds: `win32-x64-msvc`, `darwin` x64/arm64, `linux` x64/arm64
(`gnu` + `musl`) — 7 targets (`napi.targets`). Linux builds cross-compile with
`cargo-zigbuild` (`--cross-compile`); win/mac build natively.
- [x] GitHub Actions matrix (`.github/workflows/release.yml`) → `napi prepublish` publishes
per-platform `hyperiondb-client-<triple>` packages and wires them as the main package's
`optionalDependencies`. Triggered by `[cd]` in the commit message on `main`.
- [x] npm publish; the workflow bumps `npm version patch` and syncs `Cargo.toml` to match,
commits the bump back (no `[cd]`, so it doesn't re-trigger), then publishes. (Needs an
`NPM_TOKEN` secret.)
- [x] README: install, connect, failover behavior, read-scaling, type mapping, testing.
## Testing
`node-addon/test/` (`node:test`). `npm test` runs the type + fence tests against a running
cluster (primary on the first host); `npm run test:chaos` runs the failover test, which needs
to stop a node (`test/cluster.js`, defaults to the local pgrx cluster via WSL `pg_ctl`; set
`HYPERION_CTL`/env for docker). Connection + topology come from `HYPERION_*` env vars.
- [x] Unit: type round-trips (params ↔ rows) for every supported PG type — `test/types.test.js`
(scalars, `int8`→BigInt, `numeric` precision, NULLs, date/time/void, arrays incl `int8[]`,
Date/Buffer/array/jsonb params).
- [x] Integration: query load through the primary-following pool, stop the primary, assert
reconnection to the new primary and **zero acked-write loss** — `test/chaos.test.js` (JS
port of `packages/chaos-writer`; requires synchronous replication for true zero-loss).
- [x] Read-only-fence eviction (C3) explicitly — `test/fence.test.js` (fences the primary via
`ALTER SYSTEM`, asserts the typed error + recovery; always resets the fence in teardown).
## Open decisions
- [x] Pool crate: `deadpool-postgres` (simpler, recycling built in)
- [x] Package + crate name (brand: HyperionDb).
- [x] `bigint`/`numeric` JS representation (C2).
## Not planned
- [ ] TLS backend (rustls vs native-tls) and default per environment.