# amaters-cluster
Consensus layer for AmateRS (Ukehi - The Sacred Pledge)
[GitHub](https://github.com/cool-japan/amaters) · [License](LICENSE) · [Cargo.toml](Cargo.toml)
## Overview
`amaters-cluster` implements distributed consensus and cluster management for AmateRS using the **Ukehi** component. It provides a complete Raft consensus implementation with joint consensus membership changes, a batch-apply state machine with snapshotting, consistent hashing for data partitioning, and full node lifecycle management.
**Status**: Alpha — 257 tests, 245 public items.
## Implemented Features
### Raft Consensus
A complete, from-scratch Raft consensus implementation:
- **Leader election** — randomized election timeouts (timeout selection is sketched after this list), vote request and grant logic, term management
- **Log replication** — AppendEntries RPC, log consistency checks via prev_log_index/term, quorum-based commit index advancement
- **Joint consensus** — safe cluster membership changes using the two-phase joint consensus protocol (C_old → C_old,new → C_new)
- **Safety guarantees** — election safety (at most one leader per term), leader append-only, log matching, state machine safety
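To make the timeout randomization concrete, here is a minimal sketch; `random_election_timeout` is an illustrative helper, not this crate's API, and it assumes the `rand` crate is available.

```rust
use rand::Rng;
use std::time::Duration;

/// Pick a fresh election timeout uniformly from [min_ms, max_ms).
/// Randomizing the timeout makes it unlikely that two followers
/// start competing elections at the same instant.
fn random_election_timeout(min_ms: u64, max_ms: u64) -> Duration {
    let ms = rand::thread_rng().gen_range(min_ms..max_ms);
    Duration::from_millis(ms)
}

fn main() {
    // The same 150–300 ms window used in the Usage section below.
    println!("next timeout: {:?}", random_election_timeout(150, 300));
}
```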
### State Machine
- Batch apply of committed log entries for throughput efficiency
- Pluggable state machine interface for application-defined command execution (a toy implementation is sketched after this list)
- Snapshotting support: create, store, and restore snapshots to compact the Raft log
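The following sketch shows what a pluggable state machine could look like. The trait shape here (`apply_batch`, `snapshot`, `restore`) is an assumption for illustration; the crate's actual `StateMachine` trait may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical trait shape; the real `amaters_cluster::StateMachine`
/// may differ.
trait StateMachine {
    /// Apply a batch of committed entries in log order.
    fn apply_batch(&mut self, entries: &[Vec<u8>]);
    /// Serialize current state so the Raft log can be compacted.
    fn snapshot(&self) -> Vec<u8>;
    /// Rebuild state from a snapshot (e.g. on a lagging follower).
    fn restore(&mut self, snapshot: &[u8]);
}

/// Toy key-value store; each command is UTF-8 "key=value" bytes.
#[derive(Default)]
struct KvStore {
    map: BTreeMap<String, String>,
}

impl StateMachine for KvStore {
    fn apply_batch(&mut self, entries: &[Vec<u8>]) {
        for entry in entries {
            if let Some((k, v)) = std::str::from_utf8(entry).ok().and_then(|s| s.split_once('=')) {
                self.map.insert(k.to_string(), v.to_string());
            }
        }
    }

    fn snapshot(&self) -> Vec<u8> {
        // A real implementation would use a stable binary format.
        self.map.iter().map(|(k, v)| format!("{k}={v}\n")).collect::<String>().into_bytes()
    }

    fn restore(&mut self, snapshot: &[u8]) {
        self.map.clear();
        for line in String::from_utf8_lossy(snapshot).lines() {
            if let Some((k, v)) = line.split_once('=') {
                self.map.insert(k.to_string(), v.to_string());
            }
        }
    }
}
```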
### Consistent Hashing Partitioner
- Virtual node (vnodes) consistent hash ring for even key distribution
- Minimal key movement when adding or removing nodes
- Configurable replication factor
### Snapshot Management
- Snapshot creation triggered by configurable log size thresholds (trigger and truncation are sketched after this list)
- Snapshot storage and retrieval
- Snapshot transfer to joining or lagging followers
- Log truncation after successful snapshot
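A minimal sketch of the trigger and the truncation step; `should_snapshot`, `Log`, and `truncate_prefix` are illustrative names, not this crate's API.

```rust
/// Snapshot once the log has grown past the configured threshold.
fn should_snapshot(entries_since_snapshot: u64, snapshot_threshold: u64) -> bool {
    entries_since_snapshot >= snapshot_threshold
}

/// In-memory suffix of the Raft log; entries[i] has index first_index + i.
struct Log {
    first_index: u64,
    entries: Vec<Vec<u8>>,
}

impl Log {
    /// After a snapshot covering everything up to `last_included_index`
    /// has been durably stored, drop the covered prefix.
    fn truncate_prefix(&mut self, last_included_index: u64) {
        let keep_from = (last_included_index + 1)
            .saturating_sub(self.first_index)
            .min(self.entries.len() as u64) as usize;
        self.entries.drain(..keep_from);
        self.first_index = last_included_index + 1;
    }
}
```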
### Write-Ahead Log (WAL v2)
- WAL v2 format (magic `0x57414C32`) with a per-entry 8-byte fencing token in the entry header (a possible entry layout is sketched after this list)
- Backward-compatible WAL v1 read path for rolling upgrades
- CRC32 integrity verification on every entry read
- `CorruptionPolicy` applied on CRC mismatch: `TruncateToLastGood` (default), `RefuseStart`, or `AlertAndContinue`
- WAL replay on `Node::start()` — committed entries replayed into the state machine before accepting RPCs; RPC handlers reject requests while `is_recovering` is set
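As a rough illustration, a v2 entry might pair its 8-byte token with a CRC32 over the payload as below; the field order and widths are assumptions, and `encode_entry`/`decode_entry` are not this crate's API (the checksum uses the `crc32fast` crate).

```rust
use crc32fast::Hasher;

/// Hypothetical on-disk layout for one WAL v2 entry:
/// [fencing token: u64 LE][payload len: u32 LE][crc32 of payload: u32 LE][payload]
fn encode_entry(token: u64, payload: &[u8]) -> Vec<u8> {
    let mut hasher = Hasher::new();
    hasher.update(payload);
    let crc = hasher.finalize();

    let mut buf = Vec::with_capacity(16 + payload.len());
    buf.extend_from_slice(&token.to_le_bytes());
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(&crc.to_le_bytes());
    buf.extend_from_slice(payload);
    buf
}

/// Returns (token, payload), or None on a CRC mismatch. A None here is
/// exactly the point where the configured `CorruptionPolicy` decides
/// whether to truncate, refuse to start, or alert and continue.
fn decode_entry(buf: &[u8]) -> Option<(u64, &[u8])> {
    let token = u64::from_le_bytes(buf.get(0..8)?.try_into().ok()?);
    let len = u32::from_le_bytes(buf.get(8..12)?.try_into().ok()?) as usize;
    let stored_crc = u32::from_le_bytes(buf.get(12..16)?.try_into().ok()?);
    let payload = buf.get(16..16 + len)?;

    let mut hasher = Hasher::new();
    hasher.update(payload);
    (hasher.finalize() == stored_crc).then_some((token, payload))
}
```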
### Fencing Tokens
- `FencingToken` — packed `u64` with the term in the high 32 bits and the sequence in the low 32 bits, backed by `AtomicU64` for lock-free access (sketched after this list)
- `new(term, seq)`, `term()`, `seq()`, `bump_seq()`, `new_leader_term()` constructors/helpers
- `FencingTokenState` in the cluster state — `issue_token()` stamps each write; `bump_term_token()` resets sequence on leadership change
- Storage layer rejects writes carrying a stale token, preventing split-brain writes
- Token embedded in every WAL v2 entry header so it survives restarts
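The packing above can be sketched directly. This mirrors the documented layout (term in the high 32 bits, sequence in the low 32), but it is an illustrative re-implementation, not the crate's source:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Term in the high 32 bits, per-write sequence in the low 32 bits.
struct FencingToken(AtomicU64);

impl FencingToken {
    fn new(term: u32, seq: u32) -> Self {
        Self(AtomicU64::new(((term as u64) << 32) | seq as u64))
    }

    fn term(&self) -> u32 {
        (self.0.load(Ordering::Acquire) >> 32) as u32
    }

    fn seq(&self) -> u32 {
        self.0.load(Ordering::Acquire) as u32
    }

    /// Lock-free: stamp the next write with a fresh sequence number.
    /// (A real implementation must also guard against the 32-bit
    /// sequence overflowing into the term bits.)
    fn bump_seq(&self) -> u64 {
        self.0.fetch_add(1, Ordering::AcqRel) + 1
    }

    /// On leadership change: new term, sequence reset to zero.
    fn new_leader_term(&self, term: u32) -> u64 {
        let packed = (term as u64) << 32;
        self.0.store(packed, Ordering::Release);
        packed
    }
}
```

Because the term occupies the high bits, two packed tokens compare correctly as plain integers, which is what lets the storage layer reject a write carrying a token older than the last one it accepted.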
### Node Management and Membership Changes
- Node lifecycle: start, stop, step-down, transfer leadership
- Dynamic membership changes via joint consensus (the joint quorum rule is sketched after this list)
- Add and remove peers without cluster downtime
- Membership configuration persisted in the Raft log
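The quorum rule that makes joint consensus safe can be sketched as follows; `Membership` and `has_quorum` are illustrative names, not this crate's API.

```rust
use std::collections::HashSet;

/// Two-phase joint consensus configuration.
enum Membership {
    /// Normal operation: one voter set.
    Stable(HashSet<String>),
    /// Transitional C_old,new: decisions need a majority of BOTH sets.
    Joint { old: HashSet<String>, new: HashSet<String> },
}

impl Membership {
    fn has_quorum(&self, votes: &HashSet<String>) -> bool {
        fn majority(set: &HashSet<String>, votes: &HashSet<String>) -> bool {
            set.intersection(votes).count() >= set.len() / 2 + 1
        }
        match self {
            Membership::Stable(nodes) => majority(nodes, votes),
            // The joint phase is what makes the change safe: no decision
            // can be made by the old or new configuration alone.
            Membership::Joint { old, new } => majority(old, votes) && majority(new, votes),
        }
    }
}
```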
## Architecture
```
           Cluster (Ukehi)
┌────────────────────────────────────┐
│       Raft Consensus Engine        │
│   ├── Leader Election              │
│   ├── Log Replication              │
│   ├── Joint Consensus Membership   │
│   └── Snapshot Management          │
└────┬──────────────────────────┬────┘
     │                          │
┌────┴────┐    ┌────────┐    ┌──┴───────────┐
│ Leader  │    │Follower│    │   Follower   │
│ Node 1  │←──→│ Node 2 │←──→│   Node 3     │
└────┬────┘    └────────┘    └──────────────┘
     │
┌────▼──────────────────────────────────┐
│      State Machine (Batch Apply)      │
│   ├── Command Execution               │
│   ├── Snapshot Creation / Restoration │
│   └── Consistent Hash Partitioner     │
└───────────────────────────────────────┘
```
## Raft Properties
### Safety
- **Election Safety**: At most one leader elected per term
- **Leader Append-Only**: Log entries are never deleted from a leader
- **Log Matching**: If two logs have an entry with the same index and term, all preceding entries are identical
- **State Machine Safety**: All nodes apply the same commands in the same order
### Liveness
- **Eventual Leader Election**: A new leader is elected within the configured election timeout
- **Progress**: The cluster makes progress when a majority of nodes are available
### Fault Tolerance
| Cluster size | Failures tolerated | Quorum |
|--------------|--------------------|--------|
| 3 nodes      | 1                  | 2      |
| 5 nodes      | 2                  | 3      |
| 7 nodes      | 3                  | 4      |
Formula: Quorum = floor(N / 2) + 1
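The same rule in code:

```rust
/// Quorum = floor(N / 2) + 1; integer division floors for us.
fn quorum(cluster_size: usize) -> usize {
    cluster_size / 2 + 1
}

fn main() {
    assert_eq!(quorum(3), 2); // tolerates 1 failure
    assert_eq!(quorum(5), 3); // tolerates 2 failures
    assert_eq!(quorum(7), 4); // tolerates 3 failures
}
```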
## Usage
```rust
use amaters_cluster::{RaftNode, RaftConfig, StateMachine};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = RaftConfig {
        node_id: "node-1".into(),
        peers: vec!["node-2:7878".into(), "node-3:7878".into()],
        election_timeout_min_ms: 150,
        election_timeout_max_ms: 300,
        heartbeat_interval_ms: 50,
        snapshot_threshold: 10_000,
        ..Default::default()
    };

    // Any application-defined `StateMachine` implementation
    let state_machine = MyStateMachine::new();
    let node = RaftNode::new(config, state_machine).await?;
    node.start().await?;

    // Propose a command (leader only)
    let command_bytes = b"set key=value".to_vec(); // application-encoded command
    if node.is_leader().await {
        node.propose(command_bytes).await?;
    }

    // Membership change
    node.add_peer("node-4:7878").await?;
    Ok(())
}
```
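`MyStateMachine` and `command_bytes` above stand in for an application-defined `StateMachine` implementation and an application-encoded command; a toy state machine is sketched in the State Machine section above.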
## Consistent Hashing
```rust
use amaters_cluster::ConsistentHashPartitioner;
let mut ring = ConsistentHashPartitioner::new(150); // 150 virtual nodes per peer
ring.add_node("node-1");
ring.add_node("node-2");
ring.add_node("node-3");
let responsible_node = ring.get_node(b"my-document-key")?;
```
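With 150 virtual nodes per peer, keys spread close to evenly across the ring, and adding or removing a peer remaps only about 1/N of the key space, as the feature list above notes.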
## Testing
```bash
# Run all tests (257 total)
cargo nextest run --all-features
# Unit tests only
cargo test
```
## Dependencies
- `amaters-core` — core types and storage interfaces
- `amaters-net` — network communication for Raft RPCs
- `tokio` — async runtime
## License
Licensed under Apache-2.0
## Authors
**COOLJAPAN OU (Team KitaSan)**