Module single_node_expansion


§Scale from Single-Node to 3-Node Cluster

Goal: Dynamically expand your running d-engine from 1 node to 3 nodes without downtime.


§Why Scale to 3 Nodes?

| Single Node | 3-Node Cluster |
|---|---|
| No fault tolerance | Tolerates 1 node failure |
| 1 server | 3 servers |
| Node crash → data unavailable | Node crash → automatic leader re-election |
| No replication | Data replicated across nodes |

Key principle: a 3-node cluster tolerates 1 node failure (quorum = 2 out of 3).
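
The quorum arithmetic is worth spelling out. The sketch below is plain Rust with no d-engine dependency; it just evaluates the majority rule quorum(n) = n/2 + 1 for the cluster sizes discussed in this guide.

/// Majority quorum for a cluster of `n` voters: floor(n / 2) + 1.
fn quorum(n: usize) -> usize {
    n / 2 + 1
}

fn main() {
    for n in [1, 2, 3, 5] {
        // Failures tolerated = voters that can be lost while a majority remains.
        let tolerated = n - quorum(n);
        println!("{n} nodes: quorum = {}, tolerates {tolerated} failure(s)", quorum(n));
    }
    // The output shows why 2 nodes add no safety over 1: quorum = 2, tolerates 0.
}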


§Dynamic Expansion (Zero Downtime)

Target: 1 node → 3 nodes (skip the 2-node stage, which has no fault tolerance)

What happens:

  1. Node 1 running as single-node cluster (already has data)
  2. Start Node 2 → joins as Learner → syncs data → auto-promotes to Voter
  3. Start Node 3 immediately → joins and syncs (do NOT stop at 2 nodes)
  4. Result: 3-node cluster, Node 1 never restarted

Why not 2 nodes? A 2-node cluster has zero fault tolerance (quorum = 2, so any single failure takes the cluster down). Always use an odd number of nodes: 1, 3, or 5.

Example: examples/single-node-expansion/


§Prerequisites

  • Node 1 running in single-node mode
  • 2 additional servers (or terminals for local testing)
  • Network connectivity between nodes

§Step 1: Start Node 1 (Single-Node)

Node 1 config (config/n1.toml):

[cluster]
node_id = 1
listen_address = "0.0.0.0:9081"
initial_cluster = [
    { id = 1, address = "0.0.0.0:9081", role = 2, status = 2 }
]
db_root_dir = "./db"

Config field reference:

  • role = 2: Leader (NodeRole: 0=Follower, 1=Candidate, 2=Leader, 3=Learner)
  • status = 2: ACTIVE (NodeStatus: 0=JOINING, 1=SYNCING, 2=ACTIVE)
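
As a reference, the numeric codes above can be read as the following Rust enums. This is a sketch based on the values listed in this guide; the actual d-engine type definitions may differ in naming or derived traits.

// Sketch of the numeric codes used in the TOML configs of this guide.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeRole {
    Follower = 0,
    Candidate = 1,
    Leader = 2,
    Learner = 3,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NodeStatus {
    Joining = 0,
    Syncing = 1,
    Active = 2,
}

fn main() {
    // n1.toml uses role = 2, status = 2, i.e. Leader + ACTIVE.
    assert_eq!(NodeRole::Leader as u8, 2);
    assert_eq!(NodeStatus::Active as u8, 2);
}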

Start:

cd examples/single-node-expansion
make build
make start-node1

Expected log:

[Node 1] Follower → Candidate (term 1)
[Node 1] Candidate → Leader (term 2)

Node 1 is now leader, accepting writes.


§Step 2: Join Node 2

Node 2 config (config/n2.toml):

[cluster]
node_id = 2
listen_address = "0.0.0.0:9082"
initial_cluster = [
    { id = 1, address = "0.0.0.0:9081", role = 2, status = 2 },  # Existing leader
    { id = 2, address = "0.0.0.0:9082", role = 3, status = 0 },  # Self: Learner
]
db_root_dir = "./db"

Key fields:

  • role = 3: Learner (will auto-promote to Voter)
  • status = 0: JOINING (new node catching up with logs)

Note: status = 0 (JOINING) means this node is new and needs to sync data.
status = 2 (ACTIVE) means the node is already a formal member (like Node 1).

Why join as Learner (not Follower)?

| Join Method | Safety | Quorum Impact |
|---|---|---|
| Learner (role=3) ✅ | Safe: does not affect quorum while syncing | None: promoted only after catching up |
| Follower ⚠️ | Risky: participates in quorum immediately | High: can slow down writes if unstable |

IMPORTANT: Always join new nodes as Learner. Joining as Follower can impact cluster availability if the new node is slow or unstable.
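
A minimal sketch (plain Rust, not d-engine code) of the arithmetic behind this rule: a learner replicates data but is not counted as a voter, so it cannot stall writes while it catches up.

// The write quorum is computed over voters only; learners are excluded.
fn write_quorum(voters: usize) -> usize {
    voters / 2 + 1
}

fn main() {
    // Node 2 joins as a Learner: quorum is still computed over the single existing voter.
    println!("1 voter + 1 learner -> each write needs {} ack(s)", write_quorum(1)); // 1
    // Node 2 joins directly as a voter: every write now waits on both nodes.
    println!("2 voters            -> each write needs {} ack(s)", write_quorum(2)); // 2
}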

Start:

make join-node2

Expected log:

[Node 2] Learner → Follower (term 2)
🎊 NODE 2 PROMOTED TO VOTER!

Sync mechanism: the new node first receives a snapshot (InstallSnapshot, bulk data), then catches up through AppendEntries (incremental logs), and is auto-promoted to Voter once caught up.
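
The promotion decision is, in essence, "upgrade the learner once its log is close enough to the leader's". The sketch below only illustrates that idea; the threshold, names, and placement of this check are assumptions, not d-engine's actual implementation.

// Illustrative promotion check (names and threshold are hypothetical).
fn ready_to_promote(leader_last_index: u64, learner_match_index: u64, max_lag: u64) -> bool {
    leader_last_index.saturating_sub(learner_match_index) <= max_lag
}

fn main() {
    // After the snapshot plus incremental AppendEntries, the learner is nearly
    // caught up, so the leader can propose the config change that makes it a voter.
    assert!(ready_to_promote(10_000, 9_998, 16));
    assert!(!ready_to_promote(10_000, 2_000, 16));
}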


§Step 3: Join Node 3 (Immediately After Node 2)

IMPORTANT: Do NOT stop at 2 nodes. Start Node 3 right after Node 2.

Node 3 config (config/n3.toml):

[cluster]
node_id = 3
listen_address = "0.0.0.0:9083"
initial_cluster = [
    { id = 1, address = "0.0.0.0:9081", role = 2, status = 2 },  # Leader, ACTIVE
    { id = 2, address = "0.0.0.0:9082", role = 1, status = 2 },  # Follower (promoted), ACTIVE
    { id = 3, address = "0.0.0.0:9083", role = 3, status = 0 },  # Self: Learner, JOINING
]
db_root_dir = "./db"

Key: listing Node 2 with role = 1, status = 2 assumes it has already been promoted to Follower and is ACTIVE.

Alternative (safer): if you are unsure about Node 2’s promotion status, use the conservative entry
{ id = 2, ..., role = 3, status = 0 } — the system will auto-correct it if Node 2 has already been promoted.

Start:

make join-node3

Result: 3-node cluster with 1-failure tolerance. Node 1 never restarted.


§Verify Cluster

Check cluster status:

# All 3 nodes should be running
ps aux | grep demo
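
As an extra liveness check, a small plain-Rust probe (assuming local testing, so each node is reachable on localhost at the ports configured above) can confirm that every node accepts connections:

use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

fn main() {
    // Ports from config/n1.toml .. n3.toml; adjust for your deployment.
    for addr in ["127.0.0.1:9081", "127.0.0.1:9082", "127.0.0.1:9083"] {
        let sock: SocketAddr = addr.parse().unwrap();
        match TcpStream::connect_timeout(&sock, Duration::from_secs(1)) {
            Ok(_) => println!("{addr}: reachable"),
            Err(e) => println!("{addr}: NOT reachable ({e})"),
        }
    }
}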

Test replication:

  1. Write data via Node 1 (leader)
  2. Read from Node 2 or Node 3
  3. Data should be replicated

§Test Failover (Optional)

Kill current leader:

# Find leader process (check which node is leader)
ps aux | grep demo | grep 908[1-3]
kill <PID>

Expected behavior (Raft guarantees):

  • Remaining 2 nodes detect leader failure (~1s)
  • New leader elected via majority vote (2/3 quorum)
  • Cluster continues accepting writes
  • ~1-2s downtime during re-election

Restart killed node:

# If Node 1 was killed
make start-node1
# If Node 2 or 3 was killed, it will auto-rejoin after a restart

The restarted node rejoins as a follower and syncs the missing data from the new leader.


§Troubleshooting

“Node won’t join”:

  • Verify Node 1 is running and is leader
  • Check network connectivity: nc -zv 0.0.0.0 9081
  • Check logs for errors

“No leader elected”:

  • Ensure at least 2 nodes are running (quorum)
  • Check logs for errors
  • Verify addresses in configs match actual IPs

§Production Deployment

For production servers, update addresses:

# Node 2 on server 192.168.1.11
[cluster]
node_id = 2
listen_address = "192.168.1.11:9082"
initial_cluster = [
    { id = 1, address = "192.168.1.10:9081", role = 2, status = 2 },
    { id = 2, address = "192.168.1.11:9082", role = 3, status = 0 },
]
db_root_dir = "./db"

Network requirements:

  • < 10ms latency between nodes
  • Allow TCP ports 9081-9083
  • Deploy across availability zones for fault tolerance
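
A rough way to sanity-check both requirements from another node is a TCP connect probe. This is a plain-Rust sketch using the example addresses from the config above; connect time is only an upper bound on network round-trip, but values far above 10 ms warrant investigation.

use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

fn main() {
    // Example production addresses from the config above; add Node 3's address analogously.
    for addr in ["192.168.1.10:9081", "192.168.1.11:9082"] {
        let sock: SocketAddr = addr.parse().unwrap();
        let started = Instant::now();
        match TcpStream::connect_timeout(&sock, Duration::from_secs(2)) {
            Ok(_) => println!("{addr}: connected in {:?}", started.elapsed()),
            Err(e) => println!("{addr}: unreachable ({e})"),
        }
    }
}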

§Next Steps

  • See examples/single-node-expansion/README.md for detailed architecture
  • Review Quick Start Guide for embedded mode basics
  • Check examples/three-nodes-standalone/ for direct 3-node deployment

Created: 2025-12-03
Updated: 2025-12-25
Example: examples/single-node-expansion/