amaters-cluster

Consensus layer for AmateRS (Ukehi - The Sacred Pledge)

Status: Alpha | License: Apache-2.0 | Version: 0.2.0

Overview

amaters-cluster implements distributed consensus and cluster management for AmateRS using the Ukehi component. It provides a complete Raft consensus implementation with joint consensus membership changes, a batch-apply state machine with snapshotting, consistent hashing for data partitioning, and full node lifecycle management.

Status: Alpha — 257 tests, 245 public items.

Implemented Features

Raft Consensus

A complete, from-scratch Raft consensus implementation:

  • Leader election — randomized election timeouts, vote request and grant logic, term management
  • Log replication — AppendEntries RPC, log consistency checks via prev_log_index/term, quorum-based commit index advancement
  • Joint consensus — safe cluster membership changes using the two-phase joint consensus protocol (C_old,new → C_new); the joint quorum rule is sketched after this list
  • Safety guarantees — election safety (at most one leader per term), leader append-only, log matching, state machine safety
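
During the joint phase C_old,new, every election and every commit must be approved by a majority of C_old and, independently, a majority of C_new; only once the C_new entry is committed does the cluster fall back to a single majority. A minimal sketch of that quorum rule (illustrative only, not the crate's internal representation):

use std::collections::HashSet;

// Illustrative membership configuration during a joint-consensus change;
// `old` and `new` are the voter sets of C_old and C_new.
struct JointConfig {
    old: HashSet<String>,
    new: HashSet<String>,
}

impl JointConfig {
    // While C_old,new is in effect, a vote or commit needs a majority of
    // C_old AND a majority of C_new, counted independently.
    fn has_quorum(&self, acks: &HashSet<String>) -> bool {
        let majority = |set: &HashSet<String>| acks.intersection(set).count() > set.len() / 2;
        majority(&self.old) && majority(&self.new)
    }
}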

State Machine

  • Batch apply of committed log entries for throughput efficiency
  • Pluggable state machine interface for application-defined command execution (an illustrative implementation follows this list)
  • Snapshotting support: create, store, and restore snapshots to compact the Raft log
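
As a rough illustration of how an application-defined state machine plugs into the batch-apply and snapshot flow described above. The struct and method names here (KvStateMachine, apply_batch, create_snapshot, restore_snapshot) are assumptions made for the example, not the crate's published StateMachine trait:

use std::collections::HashMap;

// Hypothetical key-value state machine for illustration only.
struct KvStateMachine {
    data: HashMap<String, String>,
}

impl KvStateMachine {
    // Committed entries arrive in batches for throughput efficiency.
    // Each entry is assumed to be a UTF-8 "key=value" command.
    fn apply_batch(&mut self, entries: &[Vec<u8>]) {
        for entry in entries {
            if let Ok(text) = std::str::from_utf8(entry) {
                if let Some((key, value)) = text.split_once('=') {
                    self.data.insert(key.to_string(), value.to_string());
                }
            }
        }
    }

    // Serialize the full state so the Raft log can be compacted.
    fn create_snapshot(&self) -> Vec<u8> {
        self.data
            .iter()
            .map(|(k, v)| format!("{k}={v}\n"))
            .collect::<String>()
            .into_bytes()
    }

    // Rebuild state from a snapshot received from the leader.
    fn restore_snapshot(&mut self, snapshot: &[u8]) {
        self.data.clear();
        for line in String::from_utf8_lossy(snapshot).lines() {
            if let Some((key, value)) = line.split_once('=') {
                self.data.insert(key.to_string(), value.to_string());
            }
        }
    }
}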

Consistent Hashing Partitioner

  • Virtual node (vnodes) consistent hash ring for even key distribution (the technique is sketched after this list)
  • Minimal key movement when adding or removing nodes
  • Configurable replication factor
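
The virtual-node technique itself fits in a few lines. The sketch below is a generic illustration of the approach (hash each node into many ring positions, route a key to the first position at or after its hash), not the crate's implementation; amaters-cluster's own ConsistentHashPartitioner API is shown later under Consistent Hashing:

use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

// Generic virtual-node hash ring: each physical node is hashed into
// `vnodes` positions, and a key maps to the first position at or after
// the key's hash, wrapping around the ring.
struct Ring {
    vnodes: u32,
    ring: BTreeMap<u64, String>,
}

impl Ring {
    fn new(vnodes: u32) -> Self {
        Ring { vnodes, ring: BTreeMap::new() }
    }

    fn add_node(&mut self, node: &str) {
        for i in 0..self.vnodes {
            self.ring.insert(hash_of(&format!("{node}#{i}")), node.to_string());
        }
    }

    fn remove_node(&mut self, node: &str) {
        // Only keys owned by the removed node move to a new owner.
        self.ring.retain(|_, owner| owner.as_str() != node);
    }

    fn get_node(&self, key: &[u8]) -> Option<&str> {
        let h = hash_of(key);
        self.ring
            .range(h..)
            .next()
            .or_else(|| self.ring.iter().next())
            .map(|(_, node)| node.as_str())
    }
}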

Snapshot Management

  • Snapshot creation triggered by configurable log size thresholds (see the config sketch after this list)
  • Snapshot storage and retrieval
  • Snapshot transfer to joining or lagging followers
  • Log truncation after successful snapshot
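
The threshold is set through RaftConfig (see Usage below). A minimal sketch, assuming snapshot_threshold counts committed log entries as the Usage example suggests:

use amaters_cluster::RaftConfig;

// Once the Raft log grows past snapshot_threshold entries, a snapshot is
// created and the log prefix it covers is truncated.
let config = RaftConfig {
    snapshot_threshold: 5_000,
    ..Default::default()
};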

Write-Ahead Log (WAL v2)

  • WAL v2 format (magic 0x57414C32) with per-entry 8-byte fencing token in the entry header
  • Backward-compatible WAL v1 read path for rolling upgrades
  • CRC32 integrity verification on every entry read
  • CorruptionPolicy applied on CRC mismatch: TruncateToLastGood (default), RefuseStart, or AlertAndContinue (decision sketched after this list)
  • WAL replay on Node::start() — committed entries replayed into the state machine before accepting RPCs; RPC handlers reject requests while is_recovering is set
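
How a CRC32 mismatch maps onto the three policies can be sketched as follows. The CorruptionPolicy variants are the ones listed above; the WalEntry struct and the replay function are hypothetical, written only to make the decision explicit:

// Variants as listed above; everything else in this sketch is illustrative.
enum CorruptionPolicy {
    TruncateToLastGood, // default: drop the corrupt suffix and keep going
    RefuseStart,        // fail fast and require operator intervention
    AlertAndContinue,   // report the corruption, skip the entry, keep replaying
}

// Hypothetical decoded WAL entry with its stored and recomputed CRC32 values.
struct WalEntry {
    payload: Vec<u8>,
    stored_crc: u32,
    computed_crc: u32,
}

fn replay(entries: Vec<WalEntry>, policy: &CorruptionPolicy) -> Result<Vec<Vec<u8>>, String> {
    let mut good = Vec::new();
    for (i, entry) in entries.into_iter().enumerate() {
        if entry.stored_crc == entry.computed_crc {
            good.push(entry.payload);
            continue;
        }
        match policy {
            // Everything after the last good entry is discarded.
            CorruptionPolicy::TruncateToLastGood => break,
            // Refuse to start the node at all.
            CorruptionPolicy::RefuseStart => {
                return Err(format!("WAL entry {i} failed CRC32 check"));
            }
            // Record the problem and move on to the next entry.
            CorruptionPolicy::AlertAndContinue => {
                eprintln!("WAL entry {i} failed CRC32 check; skipping");
            }
        }
    }
    Ok(good)
}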

Fencing Tokens

  • FencingToken — packed u64 with term in high 32 bits and sequence in low 32 bits, backed by AtomicU64 for lock-free access (layout sketched after this list)
  • new(term, seq), term(), seq(), bump_seq(), new_leader_term() constructors/helpers
  • FencingTokenState in the cluster state — issue_token() stamps each write; bump_term_token() resets sequence on leadership change
  • Storage layer rejects writes carrying a stale token, preventing split-brain writes
  • Token embedded in every WAL v2 entry header so it survives restarts
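
The layout described above (term in the high 32 bits, per-term sequence in the low 32 bits, stored in an AtomicU64) can be reproduced directly. This is an illustrative re-implementation of that packing, not the crate's FencingToken source:

use std::sync::atomic::{AtomicU64, Ordering};

// Term in the high 32 bits, sequence in the low 32 bits, packed into one
// AtomicU64 so readers and writers never take a lock. A token with a higher
// term always compares greater, so stale writes are easy to reject.
struct Token(AtomicU64);

impl Token {
    fn new(term: u32, seq: u32) -> Self {
        Token(AtomicU64::new(((term as u64) << 32) | seq as u64))
    }

    fn term(&self) -> u32 {
        (self.0.load(Ordering::Acquire) >> 32) as u32
    }

    fn seq(&self) -> u32 {
        self.0.load(Ordering::Acquire) as u32
    }

    // Each write on the current leader gets the next sequence number;
    // returns the newly packed token value.
    fn bump_seq(&self) -> u64 {
        self.0.fetch_add(1, Ordering::AcqRel) + 1
    }

    // A new leader term resets the sequence to zero, so any token issued
    // under an older term compares as stale and is rejected by storage.
    fn new_leader_term(&self, term: u32) -> u64 {
        let packed = (term as u64) << 32;
        self.0.store(packed, Ordering::Release);
        packed
    }
}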

Node Management and Membership Changes

  • Node lifecycle: start, stop, step-down, transfer leadership
  • Dynamic membership changes via joint consensus
  • Add and remove peers without cluster downtime
  • Membership configuration persisted in the Raft log

Architecture

                    Cluster (Ukehi)
         ┌────────────────────────────────────┐
         │  Raft Consensus Engine             │
         │  ├── Leader Election               │
         │  ├── Log Replication               │
         │  ├── Joint Consensus Membership    │
         │  └── Snapshot Management           │
         │                                    │
    ┌────┴────┐    ┌────────┐    ┌────────────┴─┐
    │ Leader  │    │Follower│    │  Follower    │
    │ Node 1  │←──→│ Node 2 │←──→│  Node 3      │
    └────┬────┘    └────────┘    └──────────────┘
         │
     ┌────▼──────────────────────────────────┐
     │  State Machine (Batch Apply)          │
     │  ├── Command Execution                │
     │  ├── Snapshot Creation / Restoration  │
     │  └── Consistent Hash Partitioner      │
     └───────────────────────────────────────┘

Raft Properties

Safety

  • Election Safety: At most one leader elected per term
  • Leader Append-Only: A leader never overwrites or deletes entries in its own log; it only appends new entries
  • Log Matching: If two logs have an entry with the same index and term, all preceding entries are identical
  • State Machine Safety: All nodes apply the same commands in the same order

Liveness

  • Eventual Leader Election: A new leader is eventually elected once a majority of nodes can communicate; with randomized timeouts this typically completes within a few election timeouts
  • Progress: The cluster makes progress when a majority of nodes are available

Fault Tolerance

Cluster Size   Max Node Failures   Quorum Required
3 nodes        1                   2
5 nodes        2                   3
7 nodes        3                   4

Formula: Quorum = floor(N / 2) + 1
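
The same formula in code, matching the table above:

// Quorum = floor(N / 2) + 1: the smallest strict majority of an N-node cluster.
fn quorum(cluster_size: usize) -> usize {
    cluster_size / 2 + 1
}

assert_eq!(quorum(3), 2); // tolerates 1 failure
assert_eq!(quorum(5), 3); // tolerates 2 failures
assert_eq!(quorum(7), 4); // tolerates 3 failures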

Usage

use amaters_cluster::{RaftNode, RaftConfig, StateMachine};

let config = RaftConfig {
    node_id: "node-1".into(),
    peers: vec!["node-2:7878".into(), "node-3:7878".into()],
    election_timeout_min_ms: 150,
    election_timeout_max_ms: 300,
    heartbeat_interval_ms: 50,
    snapshot_threshold: 10_000,
    ..Default::default()
};

// MyStateMachine is an application-defined type implementing the StateMachine trait
let state_machine = MyStateMachine::new();
let node = RaftNode::new(config, state_machine).await?;
node.start().await?;

// Propose a command (leader only); command_bytes is the application's serialized command
if node.is_leader().await {
    node.propose(command_bytes).await?;
}

// Membership change
node.add_peer("node-4:7878").await?;

Consistent Hashing

use amaters_cluster::ConsistentHashPartitioner;

let mut ring = ConsistentHashPartitioner::new(150); // 150 virtual nodes per peer
ring.add_node("node-1");
ring.add_node("node-2");
ring.add_node("node-3");

let responsible_node = ring.get_node(b"my-document-key")?;

Testing

# Run all tests (257 total)
cargo nextest run --all-features

# Unit tests only
cargo test

Dependencies

  • amaters-core — core types and storage interfaces
  • amaters-net — network communication for Raft RPCs
  • tokio — async runtime

License

Licensed under Apache-2.0

Authors

COOLJAPAN OU (Team KitaSan)