pgvpd 1.0.0

Virtual Private Database for PostgreSQL — tenant isolation at the connection level
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
# Pgvpd

[![CI](https://github.com/solidcitizen/pgvpd/actions/workflows/ci.yml/badge.svg)](https://github.com/solidcitizen/pgvpd/actions/workflows/ci.yml)
[![Crates.io](https://img.shields.io/crates/v/pgvpd.svg)](https://crates.io/crates/pgvpd)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

**Virtual Private Database for PostgreSQL**

Pgvpd is a TCP proxy that makes tenant identity intrinsic to the database
connection — the way Oracle VPD does — so your ORM stays completely unaware of
multi-tenancy. The database enforces isolation. The application just connects.

```
Your App ──→ Pgvpd Proxy ──→ PostgreSQL (RLS enforced)
              ↕                     ↕
        Extracts tenant        Policies filter
        from username          by tenant context
```

## The Problem

Postgres RLS can enforce tenant isolation, but someone has to call
`SET app.current_tenant_id` on every connection. If they forget, queries
run as a superuser and return **all tenants' data**. Every team hand-rolls
the same middleware, connection-pinning, and AsyncLocalStorage plumbing.
The application becomes the security boundary — and applications make mistakes.

ORMs make this worse. They treat database connections as fungible resources
and assume a single privileged role — exactly the opposite of what RLS
requires. The moment you use Drizzle, Prisma, or TypeORM with a standard
connection pool, every query runs in the same security context. Setting
per-request session variables requires manual connection pinning that the
ORM wasn't designed for, and a single missed code path fails open.

Platforms like Supabase (via PostgREST) and Hasura work around this by
routing queries through a REST or GraphQL gateway that sets RLS context
per request. But this means giving up the ORM entirely — you query
through the gateway's API, not SQL. You lose joins, transactions,
migrations, and the full expressiveness of Postgres. You also inherit the
gateway's limitations: response size caps, restricted query patterns, and
another service in the critical path. The database has a policy engine;
you shouldn't need a middleware layer to use it.

## The Solution

Encode the tenant in the username. Pgvpd handles the rest.

```bash
# Instead of connecting as:
psql -U app_user -d mydb

# Connect as:
psql -h localhost -p 6432 -U app_user.acme -d mydb
```

Pgvpd extracts `acme` as the tenant ID, rewrites the username to
`app_user`, and after authentication injects:

```sql
SET app.current_tenant_id = 'acme';
SET ROLE app_user;
```

Every subsequent query is scoped by RLS. The ORM never knows.

## Quick Start

### 1. Install

**From release binaries** (recommended):

```bash
# Linux x86_64
curl -sSL https://github.com/solidcitizen/pgvpd/releases/latest/download/pgvpd-linux-x86_64.tar.gz | tar xz
sudo mv pgvpd /usr/local/bin/

# macOS Apple Silicon
curl -sSL https://github.com/solidcitizen/pgvpd/releases/latest/download/pgvpd-macos-aarch64.tar.gz | tar xz
sudo mv pgvpd /usr/local/bin/
```

**From crates.io**:

```bash
cargo install pgvpd
```

**From source**:

```bash
git clone https://github.com/solidcitizen/pgvpd.git
cd pgvpd
cargo build --release
```

The binary is at `target/release/pgvpd`.

**Docker**:

```bash
docker run -p 6432:6432 ghcr.io/solidcitizen/pgvpd --upstream-host host.docker.internal
```

### 2. Set up Postgres

```bash
psql -U postgres -d your_database -f sql/setup.sql
```

This creates:
- `app_user` role with `NOSUPERUSER NOBYPASSRLS`
- `current_tenant_id()` function (fail-closed: returns NULL if unset)
- `pgvpd_protect()` helper to enable RLS on your tables
- `pgvpd_status()` to verify protection

### 3. Protect your tables

```sql
SELECT pgvpd_protect('contacts', 'tenant_id');
SELECT pgvpd_protect('invoices', 'org_id');
SELECT pgvpd_protect('orders', 'tenant_id');

-- Verify
SELECT * FROM pgvpd_status();
```

### 4. Start Pgvpd

```bash
# Point at your Postgres instance
./target/release/pgvpd --upstream-port 5432

# Or with a config file
cp pgvpd.conf.example pgvpd.conf
./target/release/pgvpd --config pgvpd.conf
```

### 5. Connect through Pgvpd

```bash
# From psql
psql -h localhost -p 6432 -U app_user.acme mydb

# From your application — just change the connection string
DATABASE_URL=postgresql://app_user.acme:password@localhost:6432/mydb
```

Your ORM (Drizzle, Prisma, TypeORM, etc.) connects to Pgvpd instead of
Postgres directly. No code changes. No middleware. No connection pinning.

## How It Works

**Passthrough mode** (default):

1. **Client connects** with username `app_user.acme`
2. **Pgvpd parses** the tenant ID (`acme`) from the username
3. **Username is rewritten** to `app_user` for upstream Postgres
4. **Auth is proxied** transparently (supports cleartext, MD5, SCRAM-SHA-256)
5. **Context resolvers** run (if configured) — SQL queries that derive additional session variables from database state
6. **After auth**, Pgvpd injects `SET app.current_tenant_id = 'acme'` (plus any resolver-derived variables)
7. **All traffic** is then piped transparently — zero overhead (`tokio::io::copy_bidirectional`)
8. **RLS policies** on every tenant-scoped table enforce isolation

**Pool mode** (`pool_mode = session`):

Steps 1–2 are the same, but Pgvpd authenticates the client itself (cleartext), checks out a pooled upstream connection, resets it (`DISCARD ALL`), resolves + injects context, and uses a message-aware pipe (`pipe_pooled`) that intercepts Terminate messages so the upstream connection can be returned to the pool. On disconnect, the connection is cleaned up with `ROLLBACK` → `DISCARD ALL` and returned to the idle pool.

### Fail-Closed Guarantee

- `app_user` has `NOSUPERUSER NOBYPASSRLS` — cannot bypass RLS
- `FORCE ROW LEVEL SECURITY` is on — even table owners are filtered
- `current_tenant_id()` returns `NULL` if the variable isn't set
- RLS policies match `tenant_id = current_tenant_id()` — NULL matches nothing
- **No context = no data. Never fail-open.**

### Superuser Bypass

Admin connections (e.g., `postgres`) are passed through without tenant
extraction. Configure bypass usernames in `pgvpd.conf`:

```
superuser_bypass = postgres
```

## Configuration

| Option | Default | Env Var | Description |
|--------|---------|---------|-------------|
| `port` | 6432 | `PGVPD_PORT` | Listen port |
| `listen_host` | 127.0.0.1 | `PGVPD_HOST` | Bind address |
| `upstream_host` | 127.0.0.1 | `PGVPD_UPSTREAM_HOST` | Postgres host |
| `upstream_port` | 5432 | `PGVPD_UPSTREAM_PORT` | Postgres port |
| `tenant_separator` | `.` | `PGVPD_TENANT_SEPARATOR` | Separator in username |
| `context_variables` | `app.current_tenant_id` | `PGVPD_CONTEXT_VARIABLES` | Comma-separated session variables |
| `value_separator` | `:` | `PGVPD_VALUE_SEPARATOR` | Separator for multiple values |
| `superuser_bypass` | `postgres` | `PGVPD_SUPERUSER_BYPASS` | Bypass usernames (comma-separated) |
| `log_level` | `info` | `PGVPD_LOG_LEVEL` | debug/info/warn/error |
| `handshake_timeout` | 30 | `PGVPD_HANDSHAKE_TIMEOUT` | Max seconds for startup + auth + injection |
| `tls_port` | *(disabled)* | `PGVPD_TLS_PORT` | TLS listen port (requires tls_cert + tls_key) |
| `tls_cert` || `PGVPD_TLS_CERT` | Path to PEM certificate for TLS termination |
| `tls_key` || `PGVPD_TLS_KEY` | Path to PEM private key for TLS termination |
| `upstream_tls` | false | `PGVPD_UPSTREAM_TLS` | Connect to upstream Postgres over TLS |
| `upstream_tls_verify` | true | `PGVPD_UPSTREAM_TLS_VERIFY` | Verify upstream TLS certificate |
| `upstream_tls_ca` || `PGVPD_UPSTREAM_TLS_CA` | Custom CA cert for upstream TLS |
| `pool_mode` | none | `PGVPD_POOL_MODE` | `none` (passthrough) or `session` (pooling) |
| `pool_size` | 20 | `PGVPD_POOL_SIZE` | Max upstream connections per (database, role) |
| `pool_password` || `PGVPD_POOL_PASSWORD` | Password clients must provide in pool mode |
| `upstream_password` || `PGVPD_UPSTREAM_PASSWORD` | Password pgvpd uses to authenticate upstream |
| `pool_idle_timeout` | 300 | `PGVPD_POOL_IDLE_TIMEOUT` | Seconds idle before pooled connection is closed |
| `pool_checkout_timeout` | 5 | `PGVPD_POOL_CHECKOUT_TIMEOUT` | Seconds to wait when pool is full |
| `resolvers` || `PGVPD_RESOLVERS` | Path to context resolver TOML file |
| `set_role` | *(login user)* | `PGVPD_SET_ROLE` | Override SET ROLE target (e.g., `authenticated`) |
| `tenant_allow` | *(none)* | `PGVPD_TENANT_ALLOW` | Comma-separated allow list (only these tenants) |
| `tenant_deny` | *(none)* | `PGVPD_TENANT_DENY` | Comma-separated deny list (block these tenants) |
| `tenant_max_connections` | *(none)* | `PGVPD_TENANT_MAX_CONNECTIONS` | Max concurrent connections per tenant |
| `tenant_rate_limit` | *(none)* | `PGVPD_TENANT_RATE_LIMIT` | Max new connections per tenant per second |
| `tenant_query_timeout` | *(none)* | `PGVPD_TENANT_QUERY_TIMEOUT` | Seconds of inactivity before connection terminated |
| `admin_port` | *(disabled)* | `PGVPD_ADMIN_PORT` | HTTP port for admin API (health, metrics, status) |

Configuration is loaded in priority order: defaults → config file → environment variables → CLI flags.

### Multiple Context Variables

For apps that need more than one dimension of identity (e.g., tenant + user):

```
# pgvpd.conf
context_variables = app.current_list_id,app.current_user_id
value_separator = :
```

Username: `app_user.list123:user456` — Pgvpd injects both:

```sql
SET app.current_list_id = 'list123';
SET app.current_user_id = 'user456';
SET ROLE app_user;
```

## Connection Pooling

With `pool_mode = session`, Pgvpd maintains a pool of upstream Postgres
connections keyed by `(database, role)`. Clients authenticate to Pgvpd
directly (cleartext), and Pgvpd checks out a pooled upstream connection,
resets it, injects context, and pipes traffic using a message-aware
forwarder that intercepts Terminate messages to preserve the upstream
connection.

```ini
# pgvpd.conf
pool_mode = session
pool_size = 20
pool_password = secret
upstream_password = upstream_secret
pool_idle_timeout = 300
pool_checkout_timeout = 5
```

On disconnect, connections are cleaned up (`ROLLBACK` → `DISCARD ALL`) and
returned to the idle pool. An idle reaper closes connections that have been
unused longer than `pool_idle_timeout`. Superuser bypass connections are
never pooled.

## Context Resolvers

Resolvers are SQL queries that run after authentication to derive additional
session variables from database state — the Postgres equivalent of Oracle's
Real Application Security. The application provides identity (a user UUID);
the database resolves the full security context (org membership, team grants,
ACLs).

```toml
# resolvers.toml
[[resolver]]
name = "org_membership"
query = "SELECT org_id, role FROM org_members WHERE user_id = $1"
params = ["app.current_user_id"]
inject = { "app.org_id" = "org_id", "app.org_role" = "role" }
cache_ttl = 300
```

Configure with `resolvers = resolvers.toml` in your config file. Resolvers
execute in dependency order, chain results via bind parameters, and cache
results with configurable TTL. Failed required resolvers terminate the
connection (fail-closed).

## TLS

**TLS termination** (client → Pgvpd): clients connect over TLS to
`tls_port`. Requires `tls_cert` and `tls_key` pointing to PEM files. The
plain listener continues on `port`.

**TLS origination** (Pgvpd → upstream): set `upstream_tls = true` to connect
to Postgres over TLS. Use `upstream_tls_verify = false` for self-signed
certs, or `upstream_tls_ca` for a custom CA.

## SQL Helpers

Pgvpd ships convenience SQL functions (`sql/helpers.sql`) that make RLS
policies readable and composable. Install them alongside `sql/setup.sql`:

```bash
psql -U postgres -d your_database -f sql/helpers.sql
```

### Context access

Replace verbose `current_setting()` + `NULLIF()` boilerplate:

```sql
-- Before
NULLIF(current_setting('app.user_id', true), '')
-- After
pgvpd_context('app.user_id')
```

| Function | Returns | Description |
|----------|---------|-------------|
| `pgvpd_context(var)` | `TEXT` | Read session variable (NULL if unset — fail-closed) |
| `pgvpd_context_array(var)` | `TEXT[]` | Parse comma-separated variable into text array |
| `pgvpd_context_uuid_array(var)` | `UUID[]` | Parse comma-separated variable into UUID array |
| `pgvpd_context_contains(var, uuid)` | `BOOLEAN` | Check if UUID is in a comma-separated variable |
| `pgvpd_context_text_contains(var, text)` | `BOOLEAN` | Check if text value is in a comma-separated variable |

### Multi-path RLS policies

`pgvpd_protect_acl()` builds complex RBAC policies from a JSON config:

```sql
SELECT pgvpd_protect_acl('cases', '[
  {"column": "creator_id", "var": "app.user_id", "type": "uuid"},
  {"column": "id",         "var": "app.granted_case_ids", "type": "uuid_array"},
  {"column": "org_id",     "var": "app.org_id", "type": "uuid",
   "when": "pgvpd_context(''app.org_role'') = ''admin''"}
]');
```

This generates a single policy where a user sees a row if they own it, have
a direct grant, or are an org admin — all enforced by the database.

Supported path types: `text`, `uuid`, `uuid_array`, `text_array`.

## Tests

**Unit tests**:

```bash
cargo test
```

**Integration tests** (end-to-end against a real Postgres via Docker):

```bash
./tests/run.sh
```

This starts a Postgres container, loads fixtures, builds pgvpd, and runs
6 test suites: passthrough, pool, resolvers, admin API, tenant isolation,
and SQL helpers (33 integration tests). Requires Docker and `psql`.

**Benchmarks** (requires running Postgres + pgvpd):

```bash
cargo bench --bench throughput
```

## With Drizzle ORM

```typescript
import { drizzle } from "drizzle-orm/node-postgres";
import { Pool } from "pg";

// Point at Pgvpd, encode tenant in username
const pool = new Pool({
  host: "localhost",
  port: 6432,
  user: `app_user.${tenantId}`,   // <-- tenant goes here
  password: "app_user_password",   // password for app_user role
  database: "mydb",
});

const db = drizzle(pool);

// Every query is now tenant-scoped. No middleware. No context. No pinning.
const contacts = await db.select().from(contactMaster);
```

See **[docs/drizzle-integration.md](docs/drizzle-integration.md)** for the
full guide: Express integration, pool-per-tenant patterns, transactions,
migration from connection pinning, and troubleshooting.

## Documentation

- **[docs/architecture.md]docs/architecture.md** — How Pgvpd works:
  connection lifecycle, state machine, security model, wire protocol, and
  comparison with alternatives.
- **[docs/drizzle-integration.md]docs/drizzle-integration.md** — Step-by-step
  guide for Drizzle ORM: Express middleware, pool strategies, multi-context
  variables, migration from connection pinning.
- **[PLAN.md]PLAN.md** — Design rationale, roadmap, and the problem
  Pgvpd solves.

## Architecture

Pgvpd is written in **Rust** using [tokio](https://tokio.rs/) for async
I/O. It implements the minimum subset of the Postgres v3 wire protocol needed
for:

- Parsing `StartupMessage` to extract the tenant-encoded username
- Rewriting the username before forwarding to upstream Postgres
- Relaying SCRAM-SHA-256 (and other) authentication exchanges
- Executing context resolvers (SQL queries to derive session state)
- Injecting `SET` commands after authentication
- Bidirectional piping for all subsequent traffic

In passthrough mode, after the initial handshake (~3 messages), Pgvpd
adds **zero overhead** — it's a direct TCP pipe via
`tokio::io::copy_bidirectional`. In pool mode, the pipe (`pipe_pooled`) does
message-aware forwarding to intercept Terminate messages and preserve
upstream connections for reuse.

Components: `protocol.rs` (wire protocol), `connection.rs` (state machine),
`auth.rs` (authentication), `pool.rs` (connection pool), `resolver.rs`
(context resolvers), `stream.rs` (plain/TLS abstraction), `tls.rs` (TLS
config), `admin.rs` (HTTP admin API), `metrics.rs` (observability counters).

Single static binary. No runtime dependencies.

See **[docs/architecture.md](docs/architecture.md)** for the full
architecture deep-dive.

## License

MIT