Term-based, quorum-gated automatic election (issue #834, PRD #819, ADR 0030).
This is the consensus core that turns a primary loss into an automatic, safe promotion. It lives in the first-party-but-decoupled control-plane supervisor (ADR 0030) — distinct from the data path — and reuses the two pieces the rest of replication already built:
- the commit watermark (super::commit_waiter / super::quorum): the highest LSN durably replicated to a quorum that intersects every possible election majority. Nothing at or below it may ever be rolled back; and
- the FAILOVER handover machinery (super::failover): once a candidate wins, promotion is driven through the same coordinated role-swap, not a parallel state machine.
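To make "the highest LSN durably replicated to a quorum" concrete, here is a minimal standalone sketch. It is not the API of super::commit_waiter or super::quorum. The helper name `commit_watermark` is hypothetical, and it assumes each data-holding node reports its durable LSN as a plain u64 and that the caller already knows how many copies count as a quorum.

```rust
/// Hypothetical helper: the highest LSN that at least `quorum` nodes have
/// durably persisted, given each node's durable LSN.
///
/// Returns `None` when `quorum` is 0 or fewer than `quorum` nodes reported,
/// because then no LSN is known to sit on a quorum.
pub fn commit_watermark(durable_lsns: &[u64], quorum: usize) -> Option<u64> {
    if quorum == 0 || durable_lsns.len() < quorum {
        return None;
    }
    // Sort descending: the quorum-th largest LSN is reached by every node at
    // or before that position, i.e. by at least `quorum` nodes.
    let mut lsns = durable_lsns.to_vec();
    lsns.sort_unstable_by(|a, b| b.cmp(a));
    Some(lsns[quorum - 1])
}
```

For example, with durable LSNs 10, 7 and 3 and a quorum of 2, the watermark is 7: two nodes hold everything up to 7, so nothing at or below 7 may be rolled back.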
§The five hard requirements (ADR 0030, issue #834)
- Dry-run probe. A candidate first asks “would you vote for me?” without bumping any term. Only a real election bumps the term. This keeps a flapping candidate from burning through terms and lets the supervisor probe liveness cheaply.
- Durable last-vote. A voter persists (term, voted_for) before acknowledging a grant, so a voter that crashes and restarts mid-term never double-votes: the second request in the same term for a different candidate is refused from disk.
- Watermark vote rule (the safety core). A voter MUST refuse any candidate whose log does not cover the commit watermark. An acknowledged synchronous write sits at or below the watermark, so a winner necessarily carries it, and the write provably survives the failover. This is the one rule that may not be relaxed.
- Randomized election timeouts. Candidates wait a randomized interval before standing, so split votes are rare and self-correcting.
- Membership rules. A quorum is a majority of voting members. Witness members (#836) hold no data but vote, so 2 data + 1 witness is a valid HA shape. A catching-up replica is non-voting until it reaches a healthy state: it neither votes nor stands.
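Requirements 1 to 3 all land on the voter side. The following is a minimal sketch of how a voter might apply them, assuming a request carries the candidate's log end LSN and a dry_run flag. Those fields, the variant names, and holding the record in a plain field (standing in for a LastVoteStore) are all assumptions; the names echo this module's types but are not their actual definitions.

```rust
/// Simplified durable voting record (requirement 2); default = term 0, no vote.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct LastVote {
    pub term: u64,
    pub voted_for: Option<u64>,
}

/// Hypothetical request shape: `log_end_lsn` and `dry_run` are assumptions.
#[derive(Clone, Copy, Debug)]
pub struct VoteRequest {
    pub candidate: u64,
    pub term: u64,
    pub log_end_lsn: u64,
    pub dry_run: bool,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RefusalReason {
    BehindWatermark,
    StaleTerm,
    AlreadyVoted,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum VoteDecision {
    Granted,
    Refused(RefusalReason),
}

pub struct Voter {
    /// Stands in for a LastVoteStore; a real voter persists before granting.
    last_vote: LastVote,
    /// Commit watermark as last learned from replication (assumed input).
    commit_watermark: u64,
}

impl Voter {
    pub fn new(last_vote: LastVote, commit_watermark: u64) -> Self {
        Self { last_vote, commit_watermark }
    }

    pub fn last_vote(&self) -> LastVote {
        self.last_vote
    }

    pub fn consider(&mut self, req: &VoteRequest) -> VoteDecision {
        // Requirement 3, the safety core: the candidate must cover the watermark.
        if req.log_end_lsn < self.commit_watermark {
            return VoteDecision::Refused(RefusalReason::BehindWatermark);
        }
        if req.term < self.last_vote.term {
            return VoteDecision::Refused(RefusalReason::StaleTerm);
        }
        // Requirement 2: at most one candidate per term.
        if req.term == self.last_vote.term
            && matches!(self.last_vote.voted_for, Some(prev) if prev != req.candidate)
        {
            return VoteDecision::Refused(RefusalReason::AlreadyVoted);
        }
        // Requirement 1: a dry run reports the answer but records nothing.
        if !req.dry_run {
            self.last_vote = LastVote { term: req.term, voted_for: Some(req.candidate) };
        }
        VoteDecision::Granted
    }
}
```

Rebuilding a Voter from the persisted LastVote after a restart keeps the double-vote guard, which is the point of requirement 2. A repeated request from the same candidate in the same term is granted again, so retries are harmless.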
§No two primaries in a term
This invariant is structural, not probabilistic:
- a win requires a strict majority of voting members, and two strict majorities of the same set always intersect; and
- the shared voter in any two majorities votes at most once per term (durable last-vote), so it cannot grant two different candidates the same term.
Therefore at most one candidate can collect a majority in a given term, even under an arbitrary network partition. The partition tests exercise exactly this.
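The intersection step is a counting argument. Two strict majorities A and B of n members satisfy |A| + |B| ≥ 2(⌊n/2⌋ + 1) > n, so they must share a member. For small clusters this can also be checked exhaustively. The sketch below is a standalone illustration, not part of the module. It encodes member subsets as bitmasks, and its function names are hypothetical. It also shows why the majority must be strict: with 4 members, two "halves" can be disjoint.

```rust
/// Strict majority of `n` voting members.
pub fn strict_majority(n: u32) -> u32 {
    n / 2 + 1
}

/// Exhaustively checks that any two subsets of an `n`-member set, each with
/// at least `need` members, share at least one member. Subsets are encoded
/// as bitmasks, so this is only meant for small `n`.
pub fn quorums_always_intersect(n: u32, need: u32) -> bool {
    assert!(n <= 12, "exhaustive check is for small clusters only");
    let quorums: Vec<u32> = (0u32..(1u32 << n))
        .filter(|set| set.count_ones() >= need)
        .collect();
    // Two quorums intersect iff their bitmasks have a bit in common.
    quorums.iter().all(|&a| quorums.iter().all(|&b| a & b != 0))
}
```

The shared member guaranteed by this check is the voter whose durable last-vote refuses a second candidate in the same term.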
§Module shape
Like super::failover, the candidate-side ElectionCoordinator::run
is a pure state machine: the clock, the per-peer vote RPC, the
durable term bump, and the promotion are injected behind
ElectionTransport, so the whole election is exercised
deterministically with a scripted fake — no clock, no network, no engine.
The voter-side Voter wraps a LastVoteStore (durable on disk in
production, in-memory in tests) and applies the vote rule.
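A minimal sketch of this shape follows, assuming a much simpler transport than the real one. The trait methods (`voting_members`, `request_vote`, `bump_term`, `promote`), `run_election`, and `ScriptedFake` are hypothetical, because this page does not document ElectionTransport's actual signatures. The driver runs the dry-run probe first and only bumps the term when the probe reaches a strict majority.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for ElectionTransport: every side
/// effect the election needs goes through this trait.
pub trait ElectionTransport {
    /// Ids of the current voting members (witnesses included).
    fn voting_members(&self) -> Vec<u64>;
    /// Ask `peer` for its vote in `term`; `dry_run` asks "would you?" only.
    /// `None` means the peer did not answer.
    fn request_vote(&mut self, peer: u64, term: u64, dry_run: bool) -> Option<bool>;
    /// Durably bump and return the candidate's term (real elections only).
    fn bump_term(&mut self) -> u64;
    /// Drive promotion through the failover role-swap.
    fn promote(&mut self, term: u64);
}

#[derive(Debug, PartialEq, Eq)]
pub enum ElectionOutcome {
    Won { term: u64 },
    Lost { term: u64 },
    ProbeRejected,
}

/// One voting round: do the grants reach a strict majority?
fn collect_majority<T: ElectionTransport>(t: &mut T, me: u64, term: u64, dry_run: bool) -> bool {
    let members = t.voting_members();
    let need = members.len() / 2 + 1;
    // The candidate counts itself only if it is a voting member.
    let own = usize::from(members.contains(&me));
    let granted = own
        + members
            .iter()
            .filter(|&&peer| peer != me)
            .filter(|&&peer| t.request_vote(peer, term, dry_run) == Some(true))
            .count();
    granted >= need
}

/// Pure election driver: dry-run probe first, term bump only if it passes.
pub fn run_election<T: ElectionTransport>(t: &mut T, me: u64, current_term: u64) -> ElectionOutcome {
    if !collect_majority(t, me, current_term + 1, true) {
        return ElectionOutcome::ProbeRejected;
    }
    let term = t.bump_term();
    if collect_majority(t, me, term, false) {
        t.promote(term);
        ElectionOutcome::Won { term }
    } else {
        ElectionOutcome::Lost { term }
    }
}

/// Scripted fake: fixed membership and fixed per-peer answers.
pub struct ScriptedFake {
    pub members: Vec<u64>,
    /// Peers missing from the map behave as unreachable.
    pub answers: HashMap<u64, bool>,
    pub term: u64,
    pub promoted_in: Option<u64>,
}

impl ElectionTransport for ScriptedFake {
    fn voting_members(&self) -> Vec<u64> {
        self.members.clone()
    }
    fn request_vote(&mut self, peer: u64, _term: u64, _dry_run: bool) -> Option<bool> {
        self.answers.get(&peer).copied()
    }
    fn bump_term(&mut self) -> u64 {
        self.term += 1;
        self.term
    }
    fn promote(&mut self, term: u64) {
        self.promoted_in = Some(term);
    }
}
```

Because every effect goes through the trait, an isolated candidate can be shown to fail its probe without burning a term, with no real clock or network involved.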
Structs§
- ElectionCoordinator - The quorum-gated election state machine.
- ElectionRequest - A request to run an election on behalf of candidate.
- FileLastVoteStore
- LastVote - A node’s durable voting record: the highest term it has participated in and who, if anyone, it granted that term. Persisted so a restart cannot erase the fact that a vote was already cast (requirement 2).
- Member - A cluster member as seen by the supervisor’s membership view.
- MemoryLastVoteStore - In-memory last-vote store for tests and witnesses that do not need cross-restart durability. (A witness should still persist in production; the file store is used there.)
- VoteRequest - A request for a vote, sent by a candidate to a voter.
- Voter - A voting member. Wraps the durable LastVoteStore and applies the vote rule. The voter is the seat of correctness: the watermark rule and the durable double-vote guard both live here.
Enums§
- ElectionOutcome - The result of an election attempt.
- LastVoteError
- MemberKind - Whether a member holds data (and can therefore be promoted to primary) or is a vote-only witness (ADR 0030: “a node that runs only the supervisor module”).
- RefusalReason - Why a voter refused a candidate.
- VoteDecision - The outcome of a voter considering a VoteRequest.
- VotingState - Whether a member currently participates in voting.
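MemberKind and VotingState together encode the membership rules from requirement 5. Here is a minimal sketch of how they might combine, assuming variant names `Data`/`Witness` and `Voting`/`CatchingUp` and helper methods `can_vote`/`can_stand`. All of these are hypothetical and not the module's definitions.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MemberKind {
    /// Holds data; can be promoted to primary.
    Data,
    /// Runs only the supervisor: votes, holds no data, never promoted.
    Witness,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum VotingState {
    /// Healthy member: counts toward the quorum.
    Voting,
    /// Replica still catching up: neither votes nor stands.
    CatchingUp,
}

#[derive(Clone, Copy, Debug)]
pub struct Member {
    pub id: u64,
    pub kind: MemberKind,
    pub state: VotingState,
}

impl Member {
    /// May this member grant votes? Any member in the voting state.
    pub fn can_vote(&self) -> bool {
        self.state == VotingState::Voting
    }

    /// May this member stand as a candidate? Only voting data members.
    pub fn can_stand(&self) -> bool {
        self.can_vote() && self.kind == MemberKind::Data
    }
}
```

Under these rules, 2 data + 1 witness has three voters but only two possible candidates, which matches the HA shape described above.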
Traits§
- ElectionTransport - Cluster operations the candidate drives, injected so the state machine stays pure and deterministically testable. Production backs these onto the membership view, the per-peer vote RPC, the durable term store, and the FAILOVER handover; tests back them onto a scripted fake.
- LastVoteStore - Durable store for a node’s last vote. The contract is narrow on purpose: load returns the persisted record (or the default term 0, voted_for None when nothing was ever written), and persist makes a record durable before the caller acknowledges a grant.
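A minimal file-backed sketch of that load/persist contract uses only the standard library. It is not the module's FileLastVoteStore. The method signatures, the one-line "term voted_for" text format, and the write-temp, fsync, rename pattern are assumptions chosen to illustrate "durable before the caller acknowledges a grant".

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::PathBuf;

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct LastVote {
    pub term: u64,
    pub voted_for: Option<u64>,
}

/// Simplified contract: `load` returns the persisted record (default when
/// absent); `persist` returns only once the record is on stable storage.
pub trait LastVoteStore {
    fn load(&self) -> io::Result<LastVote>;
    fn persist(&mut self, vote: LastVote) -> io::Result<()>;
}

pub struct FileLastVoteStore {
    path: PathBuf,
}

impl FileLastVoteStore {
    pub fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into() }
    }
}

impl LastVoteStore for FileLastVoteStore {
    fn load(&self) -> io::Result<LastVote> {
        let text = match fs::read_to_string(&self.path) {
            Ok(t) => t,
            // Nothing ever written: term 0, no vote.
            Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(LastVote::default()),
            Err(e) => return Err(e),
        };
        let bad = || io::Error::new(io::ErrorKind::InvalidData, "corrupt last-vote record");
        let mut fields = text.split_whitespace();
        let term: u64 = fields.next().ok_or_else(bad)?.parse().map_err(|_| bad())?;
        let voted_for = match fields.next().ok_or_else(bad)? {
            "-" => None,
            id => Some(id.parse::<u64>().map_err(|_| bad())?),
        };
        Ok(LastVote { term, voted_for })
    }

    fn persist(&mut self, vote: LastVote) -> io::Result<()> {
        let tmp = self.path.with_extension("tmp");
        let who = vote.voted_for.map_or_else(|| "-".to_string(), |id| id.to_string());
        {
            let mut f = File::create(&tmp)?;
            writeln!(f, "{} {}", vote.term, who)?;
            // Force the contents to stable storage before they become visible.
            f.sync_all()?;
        }
        // Atomic replace: after a crash the file holds either the old or the new record.
        fs::rename(&tmp, &self.path)?;
        // A production store would also fsync the parent directory (POSIX) so
        // the rename itself survives a crash; omitted in this sketch.
        Ok(())
    }
}
```

A second store opened on the same path (a simulated restart) sees the last persisted vote. That is the property the voter relies on to refuse a double vote from disk.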
Functions§
- quorum_threshold - Quorum threshold for a set of members: a strict majority of the voting members. Witnesses count; catching-up replicas do not.
- randomized_election_timeout - A randomized election timeout in [base, base + jitter).
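Both functions are small enough to sketch directly. The signatures below are assumptions, and so is the simplified two-flag `Member`. In particular, the random source is injected as a plain u64 `sample` so the function stays deterministic under test, in the same spirit as the injected transport. A real caller would draw that sample from whatever random source it already uses.

```rust
use std::time::Duration;

/// Hypothetical, minimal member view for this sketch.
#[derive(Clone, Copy, Debug)]
pub struct Member {
    /// Vote-only witness (holds no data). Witnesses still count toward quorum.
    pub witness: bool,
    /// Replica still catching up: excluded from the voting set.
    pub catching_up: bool,
}

/// Strict majority of the voting members: witnesses count, catching-up
/// replicas do not. (With zero voting members this is 1, which no one can reach.)
pub fn quorum_threshold(members: &[Member]) -> usize {
    let voting = members.iter().filter(|m| !m.catching_up).count();
    voting / 2 + 1
}

/// Election timeout in [base, base + jitter). `sample` is a uniformly random
/// u64 supplied by the caller, injected so the function stays deterministic.
pub fn randomized_election_timeout(base: Duration, jitter: Duration, sample: u64) -> Duration {
    let jitter_ns = jitter.as_nanos();
    if jitter_ns == 0 {
        return base;
    }
    // Reduce modulo jitter so the offset is strictly below it; the modulo
    // bias is negligible for timeout purposes.
    let offset_ns = u128::from(sample) % jitter_ns;
    // offset_ns <= sample, so it always fits back into a u64.
    base + Duration::from_nanos(offset_ns as u64)
}
```

With base 150 ms and jitter 100 ms, a sample of 250,000,000 gives an offset of 50 ms and a timeout of 200 ms. Spreading candidates across the window is what keeps split votes rare.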