Module cascade


Cascading replication for async read-replicas (issue #838, PRD #819).

§Why cascade

A single primary streaming WAL to every replica pays an O(replicas) fan-out cost: each connected replica is one more stream the primary must frame, retain WAL for, and track. Read scale-out — adding many async read-replicas — therefore loads the primary, the one node whose spare capacity matters most because it also serves the write path.

Cascading replication bounds that fan-out: an async read-replica may stream from an intermediate replica instead of from the primary. The intermediate holds the sub-replica’s slot and forwards the WAL stream it is already receiving. The primary sees one stream (to the intermediate) regardless of how many sub-replicas hang off it.
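The effect on the primary can be illustrated by counting the streams it terminates under each topology. This is a minimal sketch: the `Upstream` enum and `primary_stream_count` are hypothetical names introduced for illustration and are not part of this module.

```rust
/// Hypothetical description of where one replica streams from.
#[derive(Clone, Copy)]
enum Upstream {
    /// Streams directly from the primary.
    Primary,
    /// Streams from the intermediate replica with this (illustrative) id.
    Intermediate(u32),
}

/// Number of WAL streams the primary itself has to serve.
fn primary_stream_count(upstreams: &[Upstream]) -> usize {
    upstreams
        .iter()
        .filter(|u| matches!(u, Upstream::Primary))
        .count()
}
```

With nine replicas connected directly, the primary serves nine streams. With one intermediate feeding the other eight, it serves one.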

§Why voting members never cascade

ADR 0030 keeps the durability/election path simple and fast: a quorum is a majority of voting members, and a synchronous write is acknowledged only once a quorum has it durably. If a voting member streamed through an intermediate, every commit-ack and every election-relevant frontier would pay an extra hop of lag, and an intermediate failure would stall a member the consensus path depends on. So the rule is categorical: a voting member always streams directly from the primary. Cascade is a read-scale-out optimisation for members that are not in the durability path. A voting member that is handed a cascade source refuses it and falls back to the primary (see plan_upstream).
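The rule above could be sketched roughly as follows. The type names match the items listed on this page, but every variant, field, and the function signature are assumptions made for illustration; the module's real definitions may differ.

```rust
/// Hypothetical node identifier; the real module may use a richer type.
type NodeId = u32;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReplicaClass {
    /// In the durability/election path: must stream from the primary.
    Voting,
    /// Async read-replica: may cascade from an intermediate.
    AsyncRead,
}

/// Assumed single refusal reason; the real enum may carry more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CascadeRefusal {
    VotingMemberMustStreamFromPrimary,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UpstreamChoice {
    /// Stream from the primary; `refusal` is set when a requested
    /// cascade source was rejected, so the fallback stays observable.
    Primary { refusal: Option<CascadeRefusal> },
    /// Stream from the given intermediate replica.
    Intermediate(NodeId),
}

/// Pure policy: no I/O, just a decision from the inputs.
fn plan_upstream(class: ReplicaClass, requested: Option<NodeId>) -> UpstreamChoice {
    match (class, requested) {
        (ReplicaClass::AsyncRead, Some(id)) => UpstreamChoice::Intermediate(id),
        (ReplicaClass::Voting, Some(_)) => UpstreamChoice::Primary {
            refusal: Some(CascadeRefusal::VotingMemberMustStreamFromPrimary),
        },
        (_, None) => UpstreamChoice::Primary { refusal: None },
    }
}
```

In this sketch the refusal is returned alongside the fallback rather than logged, so the caller decides how to surface it.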

§Frontier propagation

Correctness of the chain rests on one invariant: the primary must not prune WAL that any node further down the chain still needs. The intermediate enforces this by reporting to its own upstream a retention frontier that is the minimum of (a) the LSN it has itself applied and (b) the lowest LSN confirmed by any sub-replica streaming through it (CascadeRelay::upstream_confirmed_lsn). A slow leaf therefore holds the whole chain’s slot open at the primary, exactly as if it were connected directly. This is the cascaded analogue of PostgreSQL’s hot_standby_feedback.
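The minimum described above can be sketched as a small piece of bookkeeping. The field names, the `confirm` helper, and the use of plain integers for LSNs and node ids are assumptions for illustration, not the module's actual layout.

```rust
use std::collections::BTreeMap;

/// Assumed representations; the real module may use dedicated types.
type Lsn = u64;
type NodeId = u32;

/// Hypothetical slot record: the last LSN a sub-replica confirmed.
struct DownstreamSlot {
    confirmed_lsn: Lsn,
}

struct CascadeRelay {
    /// What the intermediate itself has applied.
    applied_lsn: Lsn,
    /// One slot per sub-replica streaming through this intermediate.
    slots: BTreeMap<NodeId, DownstreamSlot>,
}

impl CascadeRelay {
    fn new(applied_lsn: Lsn) -> Self {
        CascadeRelay { applied_lsn, slots: BTreeMap::new() }
    }

    /// Record (or overwrite) the LSN a sub-replica has confirmed.
    fn confirm(&mut self, sub: NodeId, lsn: Lsn) {
        self.slots.insert(sub, DownstreamSlot { confirmed_lsn: lsn });
    }

    /// Retention frontier to report upstream: the minimum of the local
    /// applied LSN and every sub-replica's confirmed LSN.
    fn upstream_confirmed_lsn(&self) -> Lsn {
        self.slots
            .values()
            .map(|slot| slot.confirmed_lsn)
            .fold(self.applied_lsn, |acc, lsn| acc.min(lsn))
    }
}
```

With no sub-replicas the frontier is simply the intermediate's own applied LSN. Once a leaf has confirmed only LSN 40, the whole chain reports 40 upstream until that leaf catches up.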

The read-visibility frontier follows the same chain: a causal (CausalBookmark) read can only be satisfied at a node that has applied up to the bookmark’s commit_lsn. Down the chain the applied frontier is monotonically non-increasing (a sub-replica can never be ahead of the intermediate that feeds it), so CascadeRelay::downstream_visible_frontier reports the highest LSN a given sub-replica can serve.
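A causal-read check against that frontier might look like the sketch below. The clamping rule, the free-function form, and the `can_serve` helper are assumptions for illustration; only `CausalBookmark`, `commit_lsn`, and `downstream_visible_frontier` are names taken from the text.

```rust
/// Assumed LSN representation.
type Lsn = u64;

/// Hypothetical bookmark shape; only `commit_lsn` is named in the text.
struct CausalBookmark {
    commit_lsn: Lsn,
}

/// Assumed rule: a sub-replica can serve up to what it has applied,
/// clamped to what the intermediate feeding it has applied.
fn downstream_visible_frontier(intermediate_applied: Lsn, sub_applied: Lsn) -> Lsn {
    sub_applied.min(intermediate_applied)
}

/// A causal read is servable only once the frontier reaches the bookmark.
fn can_serve(visible_frontier: Lsn, bookmark: &CausalBookmark) -> bool {
    visible_frontier >= bookmark.commit_lsn
}
```

Under these assumptions, a sub-replica that has applied LSN 60 cannot serve a bookmark at LSN 70. It can once it has applied LSN 80, provided the intermediate is at least that far along.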

§Module shape

This module is pure policy + bookkeeping with no I/O: plan_upstream decides where a node connects, and CascadeRelay tracks the slots and frontiers an intermediate holds for its sub-replicas. The transport that actually forwards bytes composes these primitives, so the rules are unit-testable without a network — the same discipline the election core (issue #834) follows.

Structs§

CascadeRelay
Tracks the sub-replica slots an intermediate holds and the frontiers that must propagate through the chain. Pure bookkeeping — the forwarding transport calls into it to decide what to send and what to advertise upstream.
CascadeUpstream
An intermediate replica a sub-replica may cascade from.
DownstreamSlot
A sub-replica slot held by an intermediate.

Enums§

CascadeRefusal
Why a requested cascade source was refused and the node fell back to the primary. Surfaced (not swallowed) so a misconfiguration is observable rather than a silent performance cliff.
ReplicaClass
How a node chooses its WAL upstream.
UpstreamChoice
Where a node should open its WAL stream.

Functions§

plan_upstream
Decide where a node streams from, given its streaming class and an optionally-requested intermediate source.