raft-io 0.8.0

Raft consensus and replicated-log engine for Rust. Leader election, log replication, membership changes, and snapshotting over a pluggable transport and a pluggable log store. The consensus layer above wal-db and the coordination substrate for Hive DB clustering.

Features

  • Leader election — randomized-timeout election with term and vote safety; one leader per term (live)
  • Log replication — batched append-entries with per-follower progress, optimistic pipelining, conflict-hint backtracking, and commit on a quorum (live in v0.3)
  • Deterministic core — the state machine is pure and step-driven, so the whole protocol is testable without time or I/O (live)
  • Pluggable transport — RaftTransport trait; in-memory for tests, real net for production (live)
  • Pluggable log store — RaftLog trait; wal-db-backed WalLog under the persistence feature (live in v0.4)
  • Crash recovery — term, vote, and log persisted before each RPC; a restarted node recovers and rejoins without violating safety (live in v0.4)
  • Snapshotting — install-snapshot for log compaction and fast follower catch-up, driven by a snapshot-policy hint (live in v0.5)
  • Typed framing — pack-io wire encoding for messages under the framing feature (live in v0.5)
  • Membership changes — single-server add/remove with safe sequencing, plus leadership transfer (live in v0.6)
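
The "commit on a quorum" rule from the log replication bullet can be sketched as below. This is a minimal illustration under stated assumptions, not raft-io's implementation: `quorum_match_index`, `advance_commit`, and the `term_at` lookup are hypothetical names that do not come from the crate. The sketch takes the highest log index known to be stored on each voting member (leader included), finds the highest index held by a majority, and only advances the commit index when that entry belongs to the leader's current term, as the Raft paper requires.

```rust
/// Highest log index stored on a majority of voters.
/// `match_index` holds one value per voting member, leader included.
/// Hypothetical helper for illustration; not part of the raft-io API.
fn quorum_match_index(match_index: &[u64]) -> u64 {
    if match_index.is_empty() {
        return 0;
    }
    let mut sorted = match_index.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // descending order
    let quorum = sorted.len() / 2 + 1;
    // The quorum-th largest value is present on at least `quorum` members.
    sorted[quorum - 1]
}

/// Advance the commit index if a majority holds a newer entry from the
/// leader's current term. `term_at` maps a log index to that entry's term
/// (an assumed lookup into the leader's own log).
fn advance_commit(
    commit_index: u64,
    current_term: u64,
    match_index: &[u64],
    term_at: impl Fn(u64) -> Option<u64>,
) -> u64 {
    let candidate = quorum_match_index(match_index);
    if candidate > commit_index && term_at(candidate) == Some(current_term) {
        candidate
    } else {
        commit_index
    }
}
```

Checking only the highest majority-held index is enough here because terms never decrease along a log: if that entry is from an older term, every earlier entry is too.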

Installation

[dependencies]
raft-io = "0.8"

# Optional features (use one of these in place of the line above):
raft-io = { version = "0.8", features = ["persistence"] } # durable wal-db-backed `WalLog`
raft-io = { version = "0.8", features = ["framing"] }     # pack-io wire framing for messages

Quick Start

A node is a deterministic state machine. You hand it events with step and it hands back actions to carry out. The single-node path needs nothing else — no transport, no storage to wire up:

use raft_io::{Action, Event, RaftConfig, RaftNode};

// One node, no peers: it reaches quorum (itself) the moment it times out.
let mut node = RaftNode::new(RaftConfig::single(1));

// Drive logical ticks until it elects itself leader.
while !node.is_leader() {
    let _ = node.step(Event::Tick).expect("tick never fails in memory");
}
assert_eq!(node.leader(), Some(1));

// A leader commits its own proposals immediately (quorum of one).
for action in node.step(Event::Propose(b"set x = 1".to_vec())).unwrap() {
    if let Action::Apply { index, command, .. } = action {
        // hand `command` to your state machine, in log order
        assert_eq!(index, 1);
        assert_eq!(command, b"set x = 1");
    }
}
assert_eq!(node.commit_index(), 1);

A multi-node cluster works the same way: you route each Action::Send to the target node's step through a transport of your choosing, and feed every node logical ticks. The protocol is sans-I/O — when to tick and how to deliver messages are yours to decide, which is what makes the whole thing testable without a clock or a network.
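
The routing loop described above can be sketched with a toy protocol. Everything in this block is hypothetical and deliberately much simpler than Raft: `ToyNode`, `ToyEvent`, `ToyAction`, `ToyMsg`, and `run_cluster` are illustration-only names, not raft-io types, and the "first node to time out claims leadership" rule has none of Raft's voting or term safety. What it does show is the sans-I/O shape: `step` is a plain state transition, and the caller decides when to tick and how to deliver each send.

```rust
use std::collections::VecDeque;

// Toy message, event, and action types (hypothetical, not raft-io's).
#[derive(Debug, Clone, PartialEq)]
enum ToyMsg {
    Claim, // "I am the leader"
}

#[derive(Debug, Clone, PartialEq)]
enum ToyEvent {
    Tick,
    Receive { from: u64, msg: ToyMsg },
}

#[derive(Debug, Clone, PartialEq)]
enum ToyAction {
    Send { to: u64, msg: ToyMsg },
}

struct ToyNode {
    id: u64,
    peers: Vec<u64>,
    timeout_ticks: u32,
    elapsed: u32,
    leader: Option<u64>,
}

impl ToyNode {
    fn new(id: u64, peers: Vec<u64>, timeout_ticks: u32) -> Self {
        ToyNode { id, peers, timeout_ticks, elapsed: 0, leader: None }
    }

    /// State transition with no clock and no sockets: the caller supplies
    /// events and carries out the returned actions.
    fn step(&mut self, event: ToyEvent) -> Vec<ToyAction> {
        match event {
            ToyEvent::Tick => {
                self.elapsed += 1;
                if self.leader.is_none() && self.elapsed >= self.timeout_ticks {
                    // Timed out without hearing from a leader: claim the role.
                    self.leader = Some(self.id);
                    return self
                        .peers
                        .iter()
                        .map(|&to| ToyAction::Send { to, msg: ToyMsg::Claim })
                        .collect();
                }
                Vec::new()
            }
            ToyEvent::Receive { from, msg: ToyMsg::Claim } => {
                self.leader = Some(from);
                self.elapsed = 0;
                Vec::new()
            }
        }
    }
}

/// In-memory "transport": tick every node once per round, then deliver
/// every queued message by feeding it to the target node's `step`.
fn run_cluster(nodes: &mut [ToyNode], rounds: u32) {
    let mut in_flight: VecDeque<(u64, ToyAction)> = VecDeque::new();
    for _ in 0..rounds {
        for node in nodes.iter_mut() {
            let sender = node.id;
            for action in node.step(ToyEvent::Tick) {
                in_flight.push_back((sender, action));
            }
        }
        while let Some((from, ToyAction::Send { to, msg })) = in_flight.pop_front() {
            if let Some(target) = nodes.iter_mut().find(|n| n.id == to) {
                for action in target.step(ToyEvent::Receive { from, msg }) {
                    in_flight.push_back((to, action));
                }
            }
        }
    }
}
```

Because the driver owns the tick schedule and the message queue, a test can replay the same run exactly, or drop and reorder queued messages to simulate a faulty network.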

Runnable examples show each path end to end:

cargo run --example single_node         # elect + propose + apply, one node
cargo run --example in_memory_cluster   # a 3-node cluster electing a leader
cargo run --example replicated_log      # propose + replicate; all nodes agree
cargo run --example partition_recovery  # minority stalls, majority commits, heal
cargo run --example snapshot_catchup    # leader compacts; lagging node catches up via snapshot
cargo run --example membership          # add a node, remove a node, transfer leadership
cargo run --example kv_store            # a replicated key-value store, end to end
cargo run --example persistent_node --features persistence  # log survives a restart

Status

This is v0.8.0: alpha (feature complete, hardened, in consumer integration). The full protocol is in place: election (with pre-vote disruption protection), replication, durable crash recovery, snapshots, membership changes, and leadership transfer.

A kitchen-sink adversarial test suite drives clusters through combined partitions, message loss/reorder/duplication, membership churn, and snapshotting, and asserts all five Raft safety properties (Election Safety, Leader Append-Only, Log Matching, Leader Completeness, State Machine Safety) continuously over sustained runs. A companion suite drives a replicated key-value store, the library's first real consumer, to identical state on every node under the same faults. The decode path is fuzzed.

The public traits and the wire and WAL formats are frozen; see the normative docs/PROTOCOL.md. Additions in the 0.x line stay MINOR-compatible (the pre-vote messages are new #[non_exhaustive] enum variants that change no existing encoding). Beta and RC hardening follow toward the 1.0 freeze, per the ROADMAP (development copy). The full public surface is documented in docs/API.md.
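
To make the safety properties named above concrete, here is a rough sketch of how a test harness might check three of them over observed cluster state. It is not the crate's test suite: `Entry`, `entry`, and the three `*_holds` functions are hypothetical names chosen for illustration, and the checks follow the property definitions from the Raft paper.

```rust
use std::collections::HashMap;

/// A log entry as a test harness might record it (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry {
    term: u64,
    command: Vec<u8>,
}

/// Small constructor to keep test data readable.
fn entry(term: u64, command: &[u8]) -> Entry {
    Entry { term, command: command.to_vec() }
}

/// Election Safety: at most one leader per term.
/// `leaders` lists every (term, node_id) leadership observation in a run.
fn election_safety_holds(leaders: &[(u64, u64)]) -> bool {
    let mut seen: HashMap<u64, u64> = HashMap::new();
    for &(term, node) in leaders {
        if let Some(prev) = seen.insert(term, node) {
            if prev != node {
                return false; // two different leaders in the same term
            }
        }
    }
    true
}

/// Log Matching: if two logs hold an entry with the same index and term,
/// they are identical in all entries up through that index.
fn log_matching_holds(a: &[Entry], b: &[Entry]) -> bool {
    let shared = a.len().min(b.len());
    // Checking the last position where the terms agree covers every
    // earlier agreeing position, since those prefixes are sub-prefixes.
    match (0..shared).rev().find(|&i| a[i].term == b[i].term) {
        Some(i) => a[..=i] == b[..=i],
        None => true,
    }
}

/// State Machine Safety: no two nodes apply different entries at the same
/// index, so one applied sequence must be a prefix of the other.
fn state_machine_safety_holds(applied_a: &[Entry], applied_b: &[Entry]) -> bool {
    let shared = applied_a.len().min(applied_b.len());
    applied_a[..shared] == applied_b[..shared]
}
```

A harness in this style would run the pairwise checks over every pair of nodes after each simulated step, so a violation is caught at the step that introduced it.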

Where It Fits

raft-io is the consensus engine. It builds on wal-db and pack-io through optional features, and Hive DB builds on it:

  • wal-db — durable Raft log persistence (under persistence)
  • pack-io — typed RPC message framing (under framing)
  • Hive DB — cluster coordination and replicated metadata

It stays foreign-compatible: usable standalone in any system that needs replicated, fault-tolerant state.

Cross-Platform Support

Tier 1 Support:

  • Linux (x86_64, aarch64)
  • macOS (x86_64, Apple Silicon)
  • Windows (x86_64)

Behavior is verified on each target by the CI matrix.

Contributing

Before opening a PR, cargo fmt --all, cargo clippy --all-targets --all-features -- -D warnings, and cargo test --all-features must be clean. Hot-path changes require a criterion benchmark; correctness-critical paths require property and/or loom tests.