# amaters-cluster
Consensus layer for AmateRS (Ukehi - The Sacred Pledge)
## Overview
amaters-cluster implements distributed consensus and cluster management for AmateRS using the Ukehi component. It provides a complete Raft consensus implementation with joint consensus membership changes, a batch-apply state machine with snapshotting, consistent hashing for data partitioning, and full node lifecycle management.
Status: Alpha — 257 tests, 245 public items.
## Implemented Features
### Raft Consensus
A complete, from-scratch Raft consensus implementation:
- Leader election — randomized election timeouts, vote request and grant logic, term management
- Log replication — AppendEntries RPC, log consistency checks via prev_log_index/term, quorum-based commit index advancement
- Joint consensus — safe cluster membership changes using the two-phase joint consensus protocol (C_old → C_old,new → C_new)
- Safety guarantees — election safety (at most one leader per term), leader append-only, log matching, state machine safety
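To make the quorum-based commit rule concrete, here is a minimal sketch of how a leader can derive its commit index from replication progress. This is illustrative only, not this crate's internals; `match_index` and `advance_commit_index` are hypothetical names.

```rust
/// Sketch of quorum-based commit index advancement. `match_index` holds
/// the highest log index known to be replicated on each node, leader
/// included (so the slice is non-empty).
fn advance_commit_index(mut match_index: Vec<u64>, commit_index: u64) -> u64 {
    // Sort descending; the value at the quorum position (floor(N / 2),
    // zero-based) is replicated on a majority of nodes.
    match_index.sort_unstable_by(|a, b| b.cmp(a));
    let candidate = match_index[match_index.len() / 2];
    // A real implementation must also check that the entry at `candidate`
    // belongs to the current term before committing it.
    candidate.max(commit_index)
}
```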
### State Machine
- Batch apply of committed log entries for throughput efficiency
- Pluggable state machine interface for application-defined command execution
- Snapshotting support: create, store, and restore snapshots to compact the Raft log
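A pluggable state machine might look roughly like the sketch below; the trait and method names are assumptions for illustration, not the crate's exact interface.

```rust
/// Illustrative state machine interface; names are assumptions, not the
/// crate's actual API.
pub trait StateMachine: Send + Sync {
    /// Apply a batch of committed log entries in log order. Batching
    /// amortizes per-entry overhead for higher apply throughput.
    fn apply_batch(&mut self, entries: &[Vec<u8>]);

    /// Serialize the current state so the Raft log can be compacted.
    fn create_snapshot(&self) -> Vec<u8>;

    /// Replace the current state with a previously stored snapshot.
    fn restore_snapshot(&mut self, snapshot: &[u8]);
}
```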
### Consistent Hashing Partitioner
- Virtual node (vnodes) consistent hash ring for even key distribution
- Minimal key movement when adding or removing nodes
- Configurable replication factor
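The idea behind the vnode ring can be sketched in a few lines using standard-library hashing; the crate's actual `ConsistentHashPartitioner` implementation and hash function may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Illustrative vnode hash ring; not the crate's actual implementation.
struct Ring {
    /// Maps each vnode's hash point to the physical node that owns it.
    points: BTreeMap<u64, String>,
    vnodes_per_node: usize,
}

impl Ring {
    fn hash_of(value: &impl Hash) -> u64 {
        let mut h = DefaultHasher::new();
        value.hash(&mut h);
        h.finish()
    }

    fn add_node(&mut self, node: &str) {
        // Many points per physical node even out the key distribution and
        // keep key movement minimal when membership changes.
        for i in 0..self.vnodes_per_node {
            self.points
                .insert(Self::hash_of(&format!("{node}:{i}")), node.to_string());
        }
    }

    fn get_node(&self, key: &str) -> Option<&String> {
        // First vnode clockwise from the key's hash, wrapping to the start.
        self.points
            .range(Self::hash_of(&key)..)
            .next()
            .or_else(|| self.points.iter().next())
            .map(|(_, node)| node)
    }
}
```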
### Snapshot Management
- Snapshot creation triggered by configurable log size thresholds
- Snapshot storage and retrieval
- Snapshot transfer to joining or lagging followers
- Log truncation after successful snapshot
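A threshold trigger of this kind reduces to a simple comparison; the sketch below uses hypothetical field and method names.

```rust
/// Illustrative threshold check for snapshot creation.
struct SnapshotPolicy {
    /// Snapshot once this many entries accumulate past the last snapshot.
    log_size_threshold: u64,
}

impl SnapshotPolicy {
    fn should_snapshot(&self, last_snapshot_index: u64, last_applied: u64) -> bool {
        last_applied.saturating_sub(last_snapshot_index) >= self.log_size_threshold
    }
}
```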
### Write-Ahead Log (WAL v2)
- WAL v2 format (magic `0x57414C32`) with a per-entry 8-byte fencing token in the entry header
- Backward-compatible WAL v1 read path for rolling upgrades
- CRC32 integrity verification on every entry read
- `CorruptionPolicy` applied on CRC mismatch: `TruncateToLastGood` (default), `RefuseStart`, or `AlertAndContinue`
- WAL replay on `Node::start()` — committed entries are replayed into the state machine before RPCs are accepted; RPC handlers reject requests while `is_recovering` is set
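To make the framing concrete, here is a sketch of how a v2 entry could be encoded. It assumes the `crc32fast` crate; the actual on-disk field order and widths are not specified here.

```rust
use crc32fast::Hasher; // assumes the `crc32fast` crate

/// Illustrative WAL v2 entry framing; the real layout may order or size
/// fields differently.
fn encode_entry(fencing_token: u64, payload: &[u8]) -> Vec<u8> {
    let mut crc = Hasher::new();
    crc.update(payload);

    let mut buf = Vec::with_capacity(16 + payload.len());
    buf.extend_from_slice(&fencing_token.to_le_bytes()); // 8-byte fencing token
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes()); // payload length
    buf.extend_from_slice(&crc.finalize().to_le_bytes()); // CRC32, checked on read
    buf.extend_from_slice(payload);
    buf
}
```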
### Fencing Tokens
- `FencingToken` — packed `u64` with the term in the high 32 bits and the sequence in the low 32 bits, backed by `AtomicU64` for lock-free access
- `new(term, seq)`, `term()`, `seq()`, `bump_seq()`, `new_leader_term()` constructors/helpers
- `FencingTokenState` in the cluster state — `issue_token()` stamps each write; `bump_term_token()` resets the sequence on leadership change
- Storage layer rejects writes carrying a stale token, preventing split-brain writes
- Token embedded in every WAL v2 entry header so it survives restarts
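The packed layout makes staleness checks a single integer comparison, since the term occupies the high bits. A minimal sketch under that assumption, mirroring the names above (method bodies here are illustrative):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Illustrative packed fencing token: term in the high 32 bits, sequence
/// in the low 32 bits.
pub struct FencingToken(AtomicU64);

impl FencingToken {
    pub fn new(term: u32, seq: u32) -> Self {
        Self(AtomicU64::new(((term as u64) << 32) | seq as u64))
    }

    pub fn new_leader_term(term: u32) -> Self {
        // Leadership change resets the sequence.
        Self::new(term, 0)
    }

    pub fn term(&self) -> u32 {
        (self.0.load(Ordering::Acquire) >> 32) as u32
    }

    pub fn seq(&self) -> u32 {
        self.0.load(Ordering::Acquire) as u32
    }

    /// Lock-free increment of the low 32 bits; returns the new packed
    /// value. A real implementation would guard against the sequence
    /// overflowing into the term bits.
    pub fn bump_seq(&self) -> u64 {
        self.0.fetch_add(1, Ordering::AcqRel) + 1
    }
}
```

Because the term dominates the comparison, the storage layer can reject a stale write simply when its token's packed value is below the last one accepted.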
### Node Management and Membership Changes
- Node lifecycle: start, stop, step-down, transfer leadership
- Dynamic membership changes via joint consensus
- Add and remove peers without cluster downtime
- Membership configuration persisted in the Raft log
## Architecture

```text
Cluster (Ukehi)
┌────────────────────────────────────┐
│ Raft Consensus Engine │
│ ├── Leader Election │
│ ├── Log Replication │
│ ├── Joint Consensus Membership │
│ └── Snapshot Management │
│ │
┌────┴────┐ ┌────────┐ ┌────────────┴─┐
│ Leader │ │Follower│ │ Follower │
│ Node 1 │←──→│ Node 2 │←──→│ Node 3 │
└────┬────┘ └────────┘ └──────────────┘
│
┌────▼──────────────────────────────────────┐
│ State Machine (Batch Apply) │
│ ├── Command Execution │
│ ├── Snapshot Creation / Restoration │
│ └── Consistent Hash Partitioner │
└───────────────────────────────────────────┘
```
## Raft Properties
### Safety
- Election Safety: At most one leader elected per term
- Leader Append-Only: A leader never overwrites or deletes entries in its own log; it only appends
- Log Matching: If two logs have an entry with the same index and term, all preceding entries are identical
- State Machine Safety: All nodes apply the same commands in the same order
### Liveness
- Eventual Leader Election: After a leader failure, a new leader is eventually elected, typically within a few randomized election timeouts
- Progress: The cluster makes progress when a majority of nodes are available
### Fault Tolerance
| Cluster Size | Max Node Failures | Quorum Required |
|---|---|---|
| 3 nodes | 1 | 2 |
| 5 nodes | 2 | 3 |
| 7 nodes | 3 | 4 |
Formula: Quorum = floor(N / 2) + 1
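In code, integer division performs the floor:

```rust
/// Quorum for an N-node cluster: floor(N / 2) + 1.
fn quorum(n: usize) -> usize {
    n / 2 + 1
}
// quorum(3) == 2, quorum(5) == 3, quorum(7) == 4
```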
## Usage
```rust
// Illustrative usage; exact paths, type names, and signatures are
// reconstructions and may differ from the crate's actual API.
use amaters_cluster::{RaftConfig, RaftNode};

let config = RaftConfig::default();
let state_machine = MyStateMachine::new(); // your StateMachine implementation
let mut node = RaftNode::new(config, state_machine).await?;
node.start().await?;

// Propose a command (leader only)
if node.is_leader().await {
    node.propose(command).await?;
}

// Membership change
node.add_peer(peer_addr).await?;
```
### Consistent Hashing
```rust
// Illustrative usage; the constructor and argument types are reconstructions.
use amaters_cluster::ConsistentHashPartitioner;

let mut ring = ConsistentHashPartitioner::new(150); // 150 virtual nodes per peer
ring.add_node("node-1");
ring.add_node("node-2");
ring.add_node("node-3");

let responsible_node = ring.get_node("some-key")?;
```
## Testing
```bash
# Run all tests (257 total)
cargo test

# Unit tests only
cargo test --lib
```
## Dependencies
- `amaters-core` — core types and storage interfaces
- `amaters-net` — network communication for Raft RPCs
- `tokio` — async runtime
## License
Licensed under Apache-2.0
## Authors
COOLJAPAN OU (Team KitaSan)