Cascading replication for async read-replicas (issue #838, PRD #819).
§Why cascade
A single primary streaming WAL to every replica pays an O(replicas) fan-out cost: each connected replica is one more stream the primary must frame, retain WAL for, and track. Read scale-out — adding many async read-replicas — therefore loads the primary, the one node whose spare capacity matters most because it also serves the write path.
Cascading replication bounds that fan-out: an async read-replica may stream from an intermediate replica instead of from the primary. The intermediate holds the sub-replica’s slot and forwards the WAL stream it is already receiving. The primary sees one stream (to the intermediate) regardless of how many sub-replicas hang off it.
§Why voting members never cascade
ADR 0030 keeps the durability/election path simple and fast: a quorum is a
majority of voting members, and a synchronous write is acknowledged only
once a quorum has it durably. If a voting member streamed through an
intermediate, every commit-ack and every election-relevant frontier would
pay an extra hop of lag, and an intermediate failure would stall a member
the consensus path depends on. So the rule is categorical: a voting
member always streams directly from the primary. Cascade is a
read-scale-out optimisation for members that are not in the durability
path. A voting member that is handed a cascade source refuses it and falls
back to the primary (see plan_upstream).
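The rule above can be sketched as a pure function. This is a minimal illustration, not the module's actual code: the type names come from the item list on this page, but every variant, field, and the exact signature of plan_upstream shown here are assumptions.

```rust
/// Hypothetical streaming classes; the real `ReplicaClass` variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReplicaClass {
    /// A member of the durability/election path.
    Voting,
    /// An async read-replica outside the quorum.
    AsyncRead,
}

/// Hypothetical shape: identifies the intermediate by a node id.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CascadeUpstream {
    pub node_id: u64,
}

/// Hypothetical refusal reason, returned so the caller can log or export it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CascadeRefusal {
    VotingMemberMustStreamDirect,
}

/// Hypothetical result: where to connect, plus any refusal that forced a fallback.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum UpstreamChoice {
    Primary { refusal: Option<CascadeRefusal> },
    Cascade(CascadeUpstream),
}

/// Assumed signature: a voting member never cascades, even when asked to.
pub fn plan_upstream(
    class: ReplicaClass,
    requested: Option<CascadeUpstream>,
) -> UpstreamChoice {
    match (class, requested) {
        // Refuse the cascade source and report why, instead of silently ignoring it.
        (ReplicaClass::Voting, Some(_)) => UpstreamChoice::Primary {
            refusal: Some(CascadeRefusal::VotingMemberMustStreamDirect),
        },
        // Async read-replicas may stream through the requested intermediate.
        (ReplicaClass::AsyncRead, Some(upstream)) => UpstreamChoice::Cascade(upstream),
        // Nothing requested: stream from the primary, nothing to refuse.
        (_, None) => UpstreamChoice::Primary { refusal: None },
    }
}
```

Carrying the refusal inside the returned value, rather than logging it in place, keeps the function free of I/O and lets a caller decide how to surface the misconfiguration.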
§Frontier propagation
Correctness of the chain rests on one invariant: the primary must not
prune WAL that any node further down the chain still needs. The
intermediate enforces this by reporting to its own upstream a retention
frontier that is the minimum of (a) what it has itself applied and (b)
what every sub-replica streaming through it has confirmed
(CascadeRelay::upstream_confirmed_lsn). A slow leaf therefore holds the
whole chain’s slot open at the primary, exactly as if it were connected
directly — this is the cascaded analogue of PostgreSQL’s
hot_standby_feedback.
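The minimum rule can be illustrated with a small bookkeeping sketch. The struct layout, the `Lsn` alias, and the `new` and `confirm` methods are assumptions made for this example; only the names CascadeRelay and upstream_confirmed_lsn come from the text above.

```rust
use std::collections::BTreeMap;

/// Assumed representation: an LSN as a plain 64-bit offset.
pub type Lsn = u64;

/// Hypothetical layout of the relay's bookkeeping.
#[derive(Debug, Default)]
pub struct CascadeRelay {
    /// What the intermediate itself has applied.
    applied_lsn: Lsn,
    /// Last LSN each sub-replica (keyed by a hypothetical id) has confirmed.
    confirmed: BTreeMap<u64, Lsn>,
}

impl CascadeRelay {
    pub fn new(applied_lsn: Lsn) -> Self {
        Self { applied_lsn, confirmed: BTreeMap::new() }
    }

    /// Record a confirmation from a sub-replica; a confirmation never moves backwards.
    pub fn confirm(&mut self, sub_replica: u64, lsn: Lsn) {
        let entry = self.confirmed.entry(sub_replica).or_insert(lsn);
        *entry = (*entry).max(lsn);
    }

    /// Retention frontier to report upstream: the minimum of the intermediate's
    /// own applied LSN and every sub-replica's confirmed LSN.
    pub fn upstream_confirmed_lsn(&self) -> Lsn {
        self.confirmed
            .values()
            .copied()
            .fold(self.applied_lsn, |frontier, lsn| frontier.min(lsn))
    }
}
```

With no sub-replicas attached the frontier is simply the intermediate's applied LSN. Once a leaf that has confirmed only LSN 40 attaches to an intermediate at LSN 100, the reported frontier drops to 40, which is how a slow leaf holds the slot open at the primary.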
The read-visibility frontier flows in the same direction: a causal
(CausalBookmark) read can only be satisfied at a node that has applied
up to the bookmark’s commit_lsn. Down the chain the applied frontier is
monotonically non-increasing (a sub-replica can never be ahead of the
intermediate that feeds it), so
CascadeRelay::downstream_visible_frontier reports the highest LSN a
given sub-replica can serve.
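A rough sketch of the visibility check, written as free functions so it is not mistaken for the real method. The `Lsn` alias, the field layout of `CausalBookmark`, and both function signatures are assumptions; the text above fixes only the name commit_lsn and the rule that a sub-replica is never ahead of its intermediate.

```rust
/// Assumed representation: an LSN as a plain 64-bit offset.
pub type Lsn = u64;

/// Hypothetical layout: a bookmark carrying the commit LSN a read must observe.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CausalBookmark {
    pub commit_lsn: Lsn,
}

/// Highest LSN a sub-replica can serve: its own applied LSN, clamped to the
/// applied LSN of the intermediate that feeds it.
pub fn downstream_visible_frontier(intermediate_applied: Lsn, sub_applied: Lsn) -> Lsn {
    sub_applied.min(intermediate_applied)
}

/// A causal read is satisfiable only if the node has applied up to the bookmark.
pub fn can_serve(frontier: Lsn, bookmark: CausalBookmark) -> bool {
    bookmark.commit_lsn <= frontier
}
```

For instance, with the intermediate at LSN 100 and a sub-replica at LSN 80, the frontier is 80: a bookmark with commit_lsn 80 can be served there, while one with commit_lsn 81 has to wait or be routed to a node further up the chain.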
§Module shape
This module is pure policy + bookkeeping with no I/O: plan_upstream
decides where a node connects, and CascadeRelay tracks the slots and
frontiers an intermediate holds for its sub-replicas. The transport that
actually forwards bytes composes these primitives, so the rules are
unit-testable without a network — the same discipline the election core
(issue #834) follows.
Structs§
- CascadeRelay - Tracks the sub-replica slots an intermediate holds and the frontiers that must propagate through the chain. Pure bookkeeping: the forwarding transport calls into it to decide what to send and what to advertise upstream.
- CascadeUpstream - An intermediate replica a sub-replica may cascade from.
- DownstreamSlot - A sub-replica slot held by an intermediate.
Enums§
- CascadeRefusal - Why a requested cascade source was refused and the node fell back to the primary. Surfaced (not swallowed) so a misconfiguration is observable rather than a silent performance cliff.
- ReplicaClass - How a node chooses its WAL upstream.
- UpstreamChoice - Where a node should open its WAL stream.
Functions§
- plan_upstream - Decide where a node streams from, given its streaming class and an optionally-requested intermediate source.