Module election

Term-based, quorum-gated automatic election (issue #834, PRD #819, ADR 0030).

This is the consensus core that turns a primary loss into an automatic, safe promotion. It lives in the first-party-but-decoupled control-plane supervisor (ADR 0030) — distinct from the data path — and reuses the two pieces the rest of replication already built:

  • the commit watermark (super::commit_waiter / super::quorum): the highest LSN durably replicated to a quorum that intersects every possible election majority, so nothing at or below it may ever be rolled back; and
  • the FAILOVER handover machinery (super::failover) — once a candidate wins, promotion is driven through the same coordinated role-swap, not a parallel state machine.

§The five hard requirements (ADR 0030, issue #834)

  1. Dry-run probe. A candidate first asks “would you vote for me?” without bumping any term. Only a real election bumps the term. This keeps a flapping candidate from burning through terms and lets the supervisor probe liveness cheaply.
  2. Durable last-vote. A voter persists (term, voted_for) before acknowledging a grant, so a voter that crashes and restarts mid-term never double-votes: a second request in the same term for a different candidate is refused on the strength of the record reloaded from disk.
  3. Watermark vote rule (the safety core). A voter MUST refuse any candidate whose log does not cover the commit watermark. An acknowledged synchronous write sits at or below the watermark, so a winner necessarily carries it — the write provably survives the failover. This is the one rule that may not be relaxed.
  4. Randomized election timeouts. Candidates wait a randomized interval before standing, so split votes are rare and self-correcting.
  5. Membership rules. A quorum is a majority of voting members. Witness members (#836) hold no data but vote, so 2 data + 1 witness is a valid HA shape. A catching-up replica is non-voting until it reaches a healthy state — it neither votes nor stands.
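
The voter-side requirements (dry-run probe, durable last-vote, watermark rule) can be illustrated with a minimal sketch. Everything named here (Lsn, LastVoteSketch, DecisionSketch, VoterSketch, the consider signature, and the order of the checks) is an assumption made for illustration, not this module's actual API; an in-memory field stands in for the durable store.

```rust
/// Hypothetical log sequence number; the real LSN type is not shown on this page.
type Lsn = u64;

/// Simplified stand-in for the durable voting record.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
struct LastVoteSketch {
    term: u64,
    voted_for: Option<String>,
}

/// Illustrative decision type; the variant names are assumptions.
#[derive(Debug, PartialEq, Eq)]
enum DecisionSketch {
    Granted,
    StaleTerm,
    AlreadyVoted,
    BehindWatermark,
}

struct VoterSketch {
    /// Stands in for the persisted record; a real voter would reload it from disk.
    last_vote: LastVoteSketch,
    commit_watermark: Lsn,
}

impl VoterSketch {
    fn consider(
        &mut self,
        candidate: &str,
        term: u64,
        candidate_lsn: Lsn,
        dry_run: bool,
    ) -> DecisionSketch {
        // A request for a term older than the recorded one is refused.
        if term < self.last_vote.term {
            return DecisionSketch::StaleTerm;
        }
        // Requirement 2: at most one grant per term.
        if term == self.last_vote.term {
            if let Some(previous) = self.last_vote.voted_for.as_deref() {
                if previous != candidate {
                    return DecisionSketch::AlreadyVoted;
                }
            }
        }
        // Requirement 3: the candidate's log must cover the commit watermark.
        if candidate_lsn < self.commit_watermark {
            return DecisionSketch::BehindWatermark;
        }
        // Requirement 1: a dry-run probe records nothing; a real grant is
        // recorded before the caller sees it.
        if !dry_run {
            self.last_vote = LastVoteSketch {
                term,
                voted_for: Some(candidate.to_string()),
            };
        }
        DecisionSketch::Granted
    }
}
```

In this sketch a probe for term 1 leaves the record at its default, while a real grant to one candidate in term 1 causes a later request from a different candidate in term 1 to be refused as AlreadyVoted.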

§No two primaries in a term

This invariant is structural, not probabilistic:

  • a win requires a strict majority of voting members, and two strict majorities of the same set always intersect; and
  • the shared voter in any two majorities votes at most once per term (durable last-vote), so it cannot grant two different candidates the same term.

Therefore at most one candidate can collect a majority in a given term, even under an arbitrary network partition. The partition tests exercise exactly this.
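
The intersection argument can be checked by brute force for small clusters. The sketch below is not part of the module; the function name and the bitmask encoding of member sets are assumptions. It enumerates every pair of subsets of n members and reports whether two disjoint subsets can both reach a given threshold.

```rust
/// Returns true if two disjoint subsets of `n` members can each contain at
/// least `threshold` members. Subsets are encoded as bitmasks, so `n` must be
/// below 32; the search is exhaustive and only meant for small `n`.
fn disjoint_quorums_exist(n: u32, threshold: u32) -> bool {
    let subsets = 1u32 << n;
    (0..subsets)
        .filter(|a| a.count_ones() >= threshold)
        .any(|a| (0..subsets).any(|b| b.count_ones() >= threshold && a & b == 0))
}
```

With a strict-majority threshold of n / 2 + 1 the function returns false for every n, which is the structural guarantee described above. Relaxing the threshold to exactly half of an even-sized cluster (for example 2 of 4) makes it return true, which is why the majority must be strict.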

§Module shape

Like super::failover, the candidate-side ElectionCoordinator::run is a pure state machine: the clock, the per-peer vote RPC, the durable term bump, and the promotion are injected behind ElectionTransport, so the whole election is exercised deterministically with a scripted fake — no clock, no network, no engine. The voter-side Voter wraps a LastVoteStore (durable on disk in production, in-memory in tests) and applies the vote rule.
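
To show what injecting the side effects buys, here is a heavily simplified sketch of a candidate loop driven through a trait, plus a scripted fake. The trait, its four methods, OutcomeSketch, run_election, and ScriptedFake are all hypothetical names; the real ElectionTransport also covers the clock and the dry-run probe, which are omitted here.

```rust
use std::collections::BTreeMap;

/// Hypothetical, reduced stand-in for the injected transport.
trait TransportSketch {
    fn voters(&self) -> Vec<String>;
    fn bump_term(&mut self) -> u64;
    fn request_vote(&mut self, peer: &str, term: u64) -> bool;
    fn promote(&mut self, term: u64);
}

#[derive(Debug, PartialEq, Eq)]
enum OutcomeSketch {
    Won { term: u64 },
    Lost { term: u64, votes: usize },
}

/// Pure candidate logic: every side effect goes through the transport.
fn run_election<T: TransportSketch>(transport: &mut T) -> OutcomeSketch {
    let voters = transport.voters();
    let needed = voters.len() / 2 + 1; // strict majority of voting members
    let term = transport.bump_term();
    let mut votes = 0;
    for peer in &voters {
        if transport.request_vote(peer, term) {
            votes += 1;
        }
    }
    if votes >= needed {
        transport.promote(term);
        OutcomeSketch::Won { term }
    } else {
        OutcomeSketch::Lost { term, votes }
    }
}

/// Scripted fake: each peer's answer is fixed up front, so a run is deterministic.
struct ScriptedFake {
    term: u64,
    answers: BTreeMap<String, bool>,
    promoted_in: Option<u64>,
}

fn scripted(term: u64, answers: &[(&str, bool)]) -> ScriptedFake {
    ScriptedFake {
        term,
        answers: answers.iter().map(|(peer, grant)| (peer.to_string(), *grant)).collect(),
        promoted_in: None,
    }
}

impl TransportSketch for ScriptedFake {
    fn voters(&self) -> Vec<String> {
        self.answers.keys().cloned().collect()
    }
    fn bump_term(&mut self) -> u64 {
        self.term += 1;
        self.term
    }
    fn request_vote(&mut self, peer: &str, _term: u64) -> bool {
        self.answers.get(peer).copied().unwrap_or(false)
    }
    fn promote(&mut self, term: u64) {
        self.promoted_in = Some(term);
    }
}
```

A test then scripts the peers' answers, calls run_election, and asserts on both the returned outcome and the fake's promoted_in field, with no clock, network, or engine involved.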

Structs§

ElectionCoordinator
The quorum-gated election state machine.
ElectionRequest
A request to run an election on behalf of candidate.
FileLastVoteStore
LastVote
A node’s durable voting record: the highest term it has participated in and the candidate, if any, to which it granted its vote in that term. Persisted so a restart cannot erase the fact that a vote was already cast (requirement 2).
Member
A cluster member as seen by the supervisor’s membership view.
MemoryLastVoteStore
In-memory last-vote store for tests, and for witnesses in setups that do not need cross-restart durability. A production witness should still persist its vote; the file store is used there.
VoteRequest
A request for a vote, sent by a candidate to a voter.
Voter
A voting member. Wraps the durable LastVoteStore and applies the vote rule. The voter is the seat of correctness: the watermark rule and the durable double-vote guard both live here.

Enums§

ElectionOutcome
The result of an election attempt.
LastVoteError
MemberKind
Whether a member holds data (and can therefore be promoted to primary) or is a vote-only witness (ADR 0030 — “a node that runs only the supervisor module”).
RefusalReason
Why a voter refused a candidate.
VoteDecision
The outcome of a voter considering a VoteRequest.
VotingState
Whether a member currently participates in voting.

Traits§

ElectionTransport
Cluster operations the candidate drives, injected so the state machine stays pure and deterministically testable. Production backs these onto the membership view, the per-peer vote RPC, the durable term store, and the FAILOVER handover; tests back them onto a scripted fake.
LastVoteStore
Durable store for a node’s last vote. The contract is narrow on purpose: load returns the persisted record (or the default term 0, voted_for None when nothing was ever written), and persist makes a record durable before the caller acknowledges a grant.
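
The load/persist contract might be satisfied by a file-backed store along the following lines. This is a sketch under stated assumptions, not the actual FileLastVoteStore: the trait and struct names, the two-line text encoding, and the write-to-temp-then-rename strategy are all illustrative choices.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::PathBuf;

#[derive(Debug, Clone, PartialEq, Eq, Default)]
struct LastVoteSketch {
    term: u64,
    voted_for: Option<String>,
}

/// Hypothetical reduction of the store contract to its two operations.
trait LastVoteStoreSketch {
    fn load(&self) -> io::Result<LastVoteSketch>;
    fn persist(&mut self, vote: &LastVoteSketch) -> io::Result<()>;
}

struct FileStoreSketch {
    path: PathBuf,
}

impl LastVoteStoreSketch for FileStoreSketch {
    fn load(&self) -> io::Result<LastVoteSketch> {
        let text = match fs::read_to_string(&self.path) {
            Ok(text) => text,
            // Nothing was ever written: return the default (term 0, no vote).
            Err(e) if e.kind() == io::ErrorKind::NotFound => {
                return Ok(LastVoteSketch::default());
            }
            Err(e) => return Err(e),
        };
        let mut lines = text.lines();
        let term = lines
            .next()
            .and_then(|line| line.parse::<u64>().ok())
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "bad term"))?;
        // An empty second line encodes "no vote granted".
        let voted_for = lines.next().filter(|line| !line.is_empty()).map(str::to_string);
        Ok(LastVoteSketch { term, voted_for })
    }

    fn persist(&mut self, vote: &LastVoteSketch) -> io::Result<()> {
        // Write a temporary file, flush it to disk, then rename it over the
        // old record so a crash leaves either the old or the new record.
        // A production store would likely also sync the parent directory.
        let tmp = self.path.with_extension("tmp");
        let mut file = File::create(&tmp)?;
        writeln!(file, "{}", vote.term)?;
        writeln!(file, "{}", vote.voted_for.as_deref().unwrap_or(""))?;
        file.sync_all()?;
        fs::rename(&tmp, &self.path)
    }
}
```

The caller is expected to acknowledge a grant only after persist has returned Ok, which is what makes the record survive a crash between the grant and the next request.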

Functions§

quorum_threshold
Quorum threshold for a set of members: a strict majority of the voting members. Witnesses count; catching-up replicas do not.
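
A minimal sketch of this computation, assuming hypothetical KindSketch, StateSketch, and MemberSketch types in place of the real MemberKind, VotingState, and Member (whose fields are not shown on this page):

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum KindSketch {
    Data,
    Witness,
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum StateSketch {
    Voting,
    CatchingUp,
}

struct MemberSketch {
    kind: KindSketch,
    state: StateSketch,
}

/// Strict majority of the members that currently vote.
/// Witnesses count; catching-up replicas do not.
fn quorum_threshold_sketch(members: &[MemberSketch]) -> usize {
    let voters = members
        .iter()
        .filter(|m| m.state == StateSketch::Voting)
        .count();
    voters / 2 + 1
}

/// Only a voting data member may stand; a witness votes but holds no data.
fn can_stand(member: &MemberSketch) -> bool {
    member.kind == KindSketch::Data && member.state == StateSketch::Voting
}
```

For 2 data members plus 1 witness, all voting, the threshold is 2. Adding a catching-up replica leaves it at 2, because that replica is not counted until it becomes healthy.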
randomized_election_timeout
A randomized election timeout in [base, base + jitter).
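
One way to map a random value into the half-open interval [base, base + jitter) is sketched below. The function name and signature are assumptions: the random value is passed in by the caller so the sketch needs no random-number crate and stays deterministic under test, and returning base for a zero jitter is an illustrative choice for the empty interval.

```rust
use std::time::Duration;

/// Maps `random` into [base, base + jitter) with nanosecond resolution.
/// The modulo mapping is slightly biased for very large jitter values,
/// which does not matter for spreading election timeouts.
fn election_timeout_sketch(base: Duration, jitter: Duration, random: u64) -> Duration {
    let jitter_nanos = jitter.as_nanos();
    if jitter_nanos == 0 {
        // Empty interval: fall back to the base timeout.
        return base;
    }
    // The remainder is below jitter_nanos and never exceeds `random`,
    // so it always fits in a u64.
    let offset = (random as u128 % jitter_nanos) as u64;
    base + Duration::from_nanos(offset)
}
```

Because each candidate draws a different offset, two candidates that notice the primary loss at the same moment are unlikely to stand at the same instant, and a split vote is retried with fresh offsets.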