pg_replica 0.2.0

Consensus-driven failover for PostgreSQL (Raft control plane)
docs.rs failed to build pg_replica-0.2.0
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

pg_replica

A PostgreSQL extension that gives a small cluster of vanilla Postgres nodes automatic, consensus-driven failover — full-cluster replication (tables, roles, DDL, everything) with a built-in Raft group and no external dependencies: no etcd, no Consul, no Kubernetes.

Replaces Patroni, CloudNativePG, pgActive and many more.

One job: keep a single leader elected and the data byte-identical across N nodes, and fail over automatically when the leader dies. Do that one job well.


The one idea

Don't reinvent replication. Postgres already ships physical (WAL) streaming replication, which copies everything byte-for-byte — heap, indexes, and the shared catalog pg_authid (i.e. roles/users and their SCRAM verifiers). That is exactly why physical replicas have the same roles as the primary, while logical replication / active-active (pgactive) do not.

pg_replica adds the only piece Postgres itself lacks: automatic leader election and failover, using an embedded Raft quorum instead of an external DCS.

Plane Who does it
Data (tables, indexes, roles, DDL) Postgres streaming replication — untouched, battle-tested
Leadership & failover Embedded Raft, running inside a Postgres background worker
Process lifecycle (start/restart Postgres) Your existing supervisor — systemd or Docker restart policy

Result: roles, DDL, and data stay consistent on every node, and a dead primary is replaced in seconds — with no human, no etcd, no Kubernetes.


Performance (highly subjective numbers for 0.1.0)

pgbench, 8 clients / 4 jobs, single-row INSERT on the primary, 10 s, async mode

Memory

  • 8 MB RSS
  • 2 MB PSS
  • 0.8 MB private

CPU

  • 0.38% idle

Throughput

  • 2494 tps
  • 3.2 ms avg latency
  • 17.6k rows

Failover latency

  • 5 s to writable new primary

Test

# Build dependencies

sudo apt-get update
sudo apt-get install -y \
  build-essential pkg-config libssl-dev \
  libreadline-dev zlib1g-dev flex bison \
  libxml2-dev libxslt-dev libxml2-utils xsltproc \
  ccache libclang-dev clang \
  iptables libfaketime
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
rustup default stable
rustc --version

# test dependencies
cargo install --locked cargo-pgrx --version 0.18.1
cargo pgrx init
cd packages/pg_replica
cargo pgrx run pg18

CREATE EXTENSION pg_replica;

Run the whole test suite

One command builds the extension + test clients and runs every test (3-node clusters in /tmp, torn down between each):

bash scripts/run-all-tests.sh            # build, then run all tests + summary
bash scripts/run-all-tests.sh --no-build # skip the rebuild

Coverage (scripts/test-*.sh, each spins a real 3-node cluster):

Test Proves
test-model formal model check (stateright, exhaustive, N=3 and N=5): (1) durability — the sync-quorum math (ack = majority) loses no acked transaction across every reachable crash/failover interleaving; (2) split-brain — a resurrected stale-term primary can never commit a conflicting write (a commit needs a majority still at the leader's term). Both have negative controls that produce counterexamples
test-m3-lsn failover promotes the highest-LSN survivor (no data loss)
test-m4-fence minority primary self-demotes read-only (no split-brain)
test-m4-watchdog a hung control plane is fenced by the deadman watchdog
test-m4-partition a network-partitioned (but running) primary self-demotes
test-m5-rejoin deposed primary pg_rewind-rejoins as a standby
test-m5-walgone WAL gone → automatic pg_basebackup re-clone
test-compaction Raft log stays bounded via snapshotting
test-m6-routing a multi-host client follows the failover with only a reconnect
test-m7-sync quorum-sync = zero committed-transaction loss on failover
test-quorum-consistency the Postgres sync quorum (synchronous_standby_names) and the Raft consensus quorum name the same nodes, and a write confirmed by one standby is promoted onto that standby — no two-quorum drift on failover
test-perf supervisor memory (single-digit MB private overhead), idle CPU, write throughput (pgbench), and failover latency
test-chaos Jepsen-style: continuous writer under partitions / SIGSTOP / kill / clock-skew / slow-disk / rolling-restart — 0 split-brain, converges, zero-loss for clean failovers

Goals / non-goals

Goals

  • Automatic failover on a 3- or 5-node cluster (Raft majority quorum).
  • Full-cluster fidelity: roles, GRANTs, DDL, extensions, data — all replicated (free, via physical replication).
  • Zero external coordination services (Raft is embedded).
  • Lightweight: one .so extension + Postgres; no JVM, no Go control plane, no k8s.
  • Safe by default: quorum-gated, fences the old primary, picks the most-advanced replica, rejoins the loser with pg_rewind.
  • Operable from SQL: SELECT pg_replica.status();, pg_replica.failover(), etc.

Non-goals

  • Sharding / horizontal write scale-out → that is Citus, a different axis.
  • Connection pooling / proxy → recommend HAProxy or libpq multi-host (target_session_attrs=read-write). pg_replica only publishes who the primary is.
  • Backups / PITR → recommend pgBackRest or wal-g.
  • Logical / multi-master replication → out of scope by design.

Positioning (honest prior art)

Tool Consensus External deps Form factor
CloudNativePG k8s control plane Kubernetes required operator
Patroni etcd/Consul/k8s — or pysyncobj Raft mode DCS (or its raft lib) Python agent
pg_auto_failover single monitor node (not quorum) a monitor Postgres C ext + agent
Stolon external store (etcd/consul) DCS Go agents
repmgr none (manual/assisted) C ext + CLI
pg_replica (this) embedded Raft quorum none Postgres extension

The niche: embedded quorum (unlike pg_auto_failover's single monitor), no external DCS (unlike Patroni-etcd / Stolon), no Kubernetes (unlike CNPG), shipped as a plain Postgres extension. Closest existing thing is Patroni's raft DCS mode — pg_replica aims to be that idea, but native, in-process, and Rust-light.