Bootstrap probe: deterministic “who should bootstrap” decision and the live Ping probe used to confirm the elected bootstrapper is up.
§The rule
To eliminate the should_bootstrap race where multiple seeds each
saw “no other seed is up” and all bootstrapped disjoint clusters,
we use a deterministic elected-bootstrapper rule:
The seed whose SocketAddr is lexicographically smallest is the
designated bootstrapper. Every other seed calls join().
SocketAddr has a total ordering (every IPv4 address sorts before
every IPv6 address, addresses within a family compare by IP, and the
port breaks ties), so every node given the same seed list agrees on
the same bootstrapper without any network round-trips, and no race
is possible.
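
As a minimal sketch of the rule, the election can be written as a plain minimum over the seed list. The function names designated_bootstrapper and is_designated_bootstrapper are hypothetical and chosen for illustration; the crate's actual helpers may be named and shaped differently.

```rust
use std::net::SocketAddr;

/// Returns the seed that sorts first under `SocketAddr`'s `Ord` impl,
/// or `None` if the seed list is empty.
/// (Hypothetical helper, not taken from the crate.)
fn designated_bootstrapper(seeds: &[SocketAddr]) -> Option<SocketAddr> {
    // `min` depends only on the set of addresses, not on their order in
    // the list, so every node computes the same answer locally.
    seeds.iter().copied().min()
}

/// True when `local` is the elected bootstrapper for this seed list.
fn is_designated_bootstrapper(local: SocketAddr, seeds: &[SocketAddr]) -> bool {
    designated_bootstrapper(seeds) == Some(local)
}
```

The comparison runs on parsed addresses rather than on their string form, so 10.0.0.2:7000 sorts before 10.0.0.10:7000 even though the strings would sort the other way.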
§The Ping probe
When this node is not the designated bootstrapper, we still
want to give the elected seed a short window to come up before
entering the retry-backoff loop in join(). ping_probe sends a
cheap, side-effect-free RaftRpc::Ping to the elected seed up to
MAX_PROBE_ATTEMPTS times at PROBE_INTERVAL spacing. Any
successful Pong response means the bootstrapper is alive — we
immediately return false so the caller falls through to join().
If every attempt fails we still return false (the caller’s join
loop has its own retry schedule and will handle the slow-start
case).
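
The probe loop described above might look roughly like the following. This is a synchronous sketch under stated assumptions: the send_ping closure stands in for sending RaftRpc::Ping and reports whether a Pong came back, and the attempt count and interval are passed as parameters where the real function uses MAX_PROBE_ATTEMPTS and PROBE_INTERVAL. The real ping_probe is likely async and its signature is not shown on this page.

```rust
use std::thread;
use std::time::Duration;

/// Probes the elected seed until it answers or the attempts run out.
///
/// `send_ping` is a hypothetical stand-in for one `RaftRpc::Ping` round
/// trip; it returns true when a Pong was received.
///
/// Always returns false: whether or not the seed answered, this node is
/// not the bootstrapper and the caller should go on to `join()`.
fn ping_probe<F>(mut send_ping: F, max_attempts: u32, interval: Duration) -> bool
where
    F: FnMut() -> bool,
{
    for attempt in 1..=max_attempts {
        if send_ping() {
            // The bootstrapper is alive: stop probing right away.
            return false;
        }
        // Wait between attempts, but not after the final one.
        if attempt < max_attempts {
            thread::sleep(interval);
        }
    }
    // Every attempt failed; the join loop's own retries take over.
    false
}
```

Success only shortens the wait. The return value is the same on both paths, so the caller never needs to distinguish them.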
§The force flag
ClusterConfig.force_bootstrap is an operator escape hatch for
disaster recovery — the designated bootstrapper has been lost
permanently and the operator wants a different seed to take over.
When set, should_bootstrap returns true without probing.
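
Putting the three pieces together, the decision could be sketched as below. The ClusterConfig fields seeds and local_addr, the probe closure parameter, and the handling of an empty seed list are all assumptions made for illustration; only force_bootstrap and the name should_bootstrap come from the text above.

```rust
use std::net::SocketAddr;

/// Hypothetical subset of the cluster configuration.
struct ClusterConfig {
    seeds: Vec<SocketAddr>, // assumed field name
    local_addr: SocketAddr, // assumed field name
    force_bootstrap: bool,
}

/// Decides whether this node should bootstrap a new cluster.
/// `probe` stands in for `ping_probe` against the elected seed.
fn should_bootstrap<P>(config: &ClusterConfig, probe: P) -> bool
where
    P: FnOnce(SocketAddr) -> bool,
{
    // Operator override: skip the election and the probe entirely.
    if config.force_bootstrap {
        return true;
    }
    match config.seeds.iter().copied().min() {
        // This node is the elected bootstrapper.
        Some(elected) if elected == config.local_addr => true,
        // Another seed is elected: give it a short window, then join.
        Some(elected) => probe(elected),
        // Empty seed list: returning false here is an arbitrary choice
        // for this sketch, not documented behaviour.
        None => false,
    }
}
```

Checking the flag first matches the statement that a forced bootstrap happens without probing, which matters when the elected seed is permanently gone and a probe would only add delay.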