yggr
A Raft library in Rust. Four crates: a pure protocol engine, a deterministic simulator, a tokio runtime, and an example KV service.
use ;
let config = new;
let storage = open.await?;
let transport = start.await?;
let node = start.await?;
// Replicate a command. Returns when it has committed and applied.
let response = node.propose.await?;
// Linearizable read (ReadIndex). No log append, no fsync.
let value = node.read_linearizable.await?;
Crates
- yggr-core is the protocol. One type, Engine<C>, one method: step(Event<C>) -> Vec<Action<C>>. No sockets, no disk, no async.
- yggr-sim is a deterministic cluster simulator. It drives drops, reorderings, partitions, crashes, and partial fsync against the engine and checks safety invariants after every step.
- yggr is the tokio runtime. It provides Node, DiskStorage, TcpTransport, and a length-prefixed protobuf wire format.
- yggr-examples has a three-node replicated KV service.
The split exists so the engine is usable without the runtime, and so the simulator can run the engine deterministically without tokio or real I/O.
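To make the step(Event<C>) -> Vec<Action<C>> shape concrete, here is a minimal sketch of the sans-I/O pattern, assuming a one-node toy cluster. ToyEngine and its Event and Action variants are hypothetical and far simpler than yggr-core's real types; the point is that the engine only turns events into actions, and the caller (the driver) performs every side effect.

```rust
/// Inputs the driver feeds in. Hypothetical; not yggr-core's real Event<C>.
#[derive(Debug, Clone, PartialEq)]
pub enum Event<C> {
    /// A client proposed a command.
    Propose(C),
    /// Storage reports that everything up to `index` is durable.
    Persisted { index: u64 },
    /// One logical clock tick.
    Tick,
}

/// Side effects the engine asks the driver to perform.
#[derive(Debug, Clone, PartialEq)]
pub enum Action<C> {
    /// Append this entry to stable storage.
    Persist { index: u64, command: C },
    /// Hand this committed entry to the state machine.
    Apply { index: u64, command: C },
    /// Send a heartbeat to peers.
    Heartbeat,
}

/// A single-node toy engine: no sockets, no disk, no async.
pub struct ToyEngine<C> {
    log: Vec<C>,  // log[i] holds the entry at index i + 1
    durable: u64, // highest index storage has confirmed
    applied: u64, // highest index handed to the state machine
    ticks: u32,
    heartbeat_every: u32,
}

impl<C: Clone> ToyEngine<C> {
    pub fn new(heartbeat_every: u32) -> Self {
        Self { log: Vec::new(), durable: 0, applied: 0, ticks: 0, heartbeat_every }
    }

    /// The only entry point: consume one event, return the actions it causes.
    pub fn step(&mut self, event: Event<C>) -> Vec<Action<C>> {
        match event {
            Event::Propose(command) => {
                self.log.push(command.clone());
                let index = self.log.len() as u64;
                vec![Action::Persist { index, command }]
            }
            Event::Persisted { index } => {
                // With one voter, durable on that voter means committed.
                self.durable = self.durable.max(index.min(self.log.len() as u64));
                let mut out = Vec::new();
                while self.applied < self.durable {
                    self.applied += 1;
                    let command = self.log[(self.applied - 1) as usize].clone();
                    out.push(Action::Apply { index: self.applied, command });
                }
                out
            }
            Event::Tick => {
                self.ticks += 1;
                if self.ticks % self.heartbeat_every.max(1) == 0 {
                    vec![Action::Heartbeat]
                } else {
                    Vec::new()
                }
            }
        }
    }
}
```

Because step is a plain function of state and input, replaying the same event sequence yields the same actions, which is what lets a simulator drive an engine deterministically without tokio or real I/O.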
What's implemented
- Leader election, log replication, single-server membership changes (§4.3)
- §9.6 Pre-vote, on by default: flapping nodes can't force the rest of the cluster to step down
- Linearizable reads via ReadIndex (§8), with an opt-in §9 leader-lease fast path that skips the heartbeat round when the leader has a fresh majority ack
- Leadership transfer via TimeoutNow
- Snapshotting: chunked InstallSnapshot, applied-entries and live-log guardrails (max_log_entries), and a non-blocking snapshot(), so a slow serializer cannot stall the driver
- Fallible StateMachine::snapshot, so transient disk-space errors retry cleanly
- Segmented on-disk log with atomic file writes and crash-mid-rename cleanup
- State machine apply on its own task; a slow apply() does not stall heartbeats
- Opt-in proposal batching on the leader
- Pull-model metrics via Node::metrics(): counters for elections, replication, reads, and snapshots, plus the usual gauges
- Structured tracing spans and events with stable field names. OpenTelemetry wiring is a few lines in your main; see the observability guide.
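Both the commit rule and the lease fast path reduce to quorum arithmetic: take one value per voter and find the largest value a majority has reached. The sketch below assumes millisecond timestamps and hypothetical helpers (quorum_value, advance_commit, lease_valid); it is not yggr's implementation. As a conservative choice, each voter's entry in the lease check is the send time of the newest heartbeat it acknowledged rather than the ack's arrival time, so the lease is measured from a moment the follower had definitely not yet passed.

```rust
/// Largest value that a majority of voters have reached.
/// `values` has one entry per voter, the leader included.
pub fn quorum_value(values: &[u64]) -> u64 {
    assert!(!values.is_empty(), "a cluster needs at least one voter");
    let mut sorted = values.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // highest first
    let majority = values.len() / 2 + 1;
    sorted[majority - 1] // the majority-th highest value
}

/// Commit rule: advance to the quorum match index, but only if that entry
/// was written in the leader's current term (Raft §5.4.2).
pub fn advance_commit(
    match_index: &[u64],
    term_at: impl Fn(u64) -> u64,
    current_term: u64,
    commit: u64,
) -> u64 {
    let candidate = quorum_value(match_index);
    if candidate > commit && term_at(candidate) == current_term {
        candidate
    } else {
        commit
    }
}

/// Lease check. `acked_send_ms[i]` is the send time of the newest heartbeat
/// voter i acknowledged (for the leader itself, the send time of its latest round).
/// Assumes `lease_ms` is shorter than the election timeout minus the maximum
/// clock drift between nodes.
pub fn lease_valid(acked_send_ms: &[u64], now_ms: u64, lease_ms: u64) -> bool {
    let majority_since = quorum_value(acked_send_ms);
    now_ms < majority_since.saturating_add(lease_ms)
}
```

When lease_valid is false, a leader falls back to the ReadIndex path: record the current commit index, confirm leadership with a heartbeat round, and serve the read once that index has been applied.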
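Pre-vote adds a trial round before a real election. A sketch of the voter side, assuming hypothetical PreVoteRequest and Voter types and one common set of checks (exact rules differ between implementations): grant only if no leader has been heard from within the minimum election timeout, the proposed term is ahead of the voter's own, and the candidate's log is at least as up to date. Granting a pre-vote changes no persistent state, which is why a node that keeps timing out cannot drive up everyone's term.

```rust
/// Hypothetical pre-vote request. `next_term` is the term the candidate would
/// use in a real election; voters do not adopt it.
#[derive(Debug, Clone, Copy)]
pub struct PreVoteRequest {
    pub next_term: u64,
    pub last_log_index: u64,
    pub last_log_term: u64,
}

/// The parts of a voter's state that the pre-vote decision reads.
#[derive(Debug, Clone, Copy)]
pub struct Voter {
    pub current_term: u64,
    pub last_log_index: u64,
    pub last_log_term: u64,
    /// Time since this node last heard from a valid leader.
    pub ms_since_leader_contact: u64,
    pub min_election_timeout_ms: u64,
}

impl Voter {
    /// Read-only decision: granting a pre-vote neither bumps `current_term`
    /// nor records a vote.
    pub fn grant_pre_vote(&self, req: &PreVoteRequest) -> bool {
        // A leader was heard from recently, so refuse to help replace it.
        let leader_alive = self.ms_since_leader_contact < self.min_election_timeout_ms;
        // Only grant for a term ahead of this voter's current term (a simplification).
        let term_ok = req.next_term > self.current_term;
        // Raft's up-to-date check: compare last log term first, then last index.
        let log_ok = (req.last_log_term, req.last_log_index)
            >= (self.last_log_term, self.last_log_index);
        !leader_alive && term_ok && log_ok
    }
}

/// The candidate starts a real election only with a majority of pre-votes,
/// counting its own.
pub fn pre_vote_wins(grants: usize, voters: usize) -> bool {
    grants >= voters / 2 + 1
}
```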
Quick start
Requires Rust 1.85+ and protoc.
# Three-node KV demo.
# Workspace checks.
Testing
The library is tested in four layers.
- Unit tests. Pure-engine tests covering every Event/Action path.
- Property tests. 1024-case proptests on term and commit monotonicity, vote uniqueness, and recover idempotence. Scheduled CI dials the engine invariant cases up further. Wire-format decoders are fuzzed through Message::try_from with arbitrary bytes.
- Sim chaos. 128 seeds, 1500 steps, on 3-, 5-, and 7-node clusters, under drops, reorderings, partitions, crashes, and partial fsync. Election Safety, Log Matching, Leader Completeness, and State Machine Safety are checked after every step.
- Runtime chaos. Real Node instances (driver task, apply task, storage) connected through an in-process chaos transport. PRs run a 16-case smoke sweep; scheduled CI runs a heavier seeded sweep. The same safety invariants are asserted at the full-stack level.
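Seeded sim chaos is deterministic because every random choice goes through one seeded generator. A minimal sketch of that idea, assuming a hypothetical SimNet that models only drops and reordering (no partitions, crashes, or fsync faults): the same seed always produces the same delivery trace, so a failing seed can be replayed exactly.

```rust
/// SplitMix64: a tiny deterministic PRNG, adequate for a test harness.
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self(seed)
    }
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
    /// Value in 0..n; the slight modulo bias does not matter here.
    pub fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct Msg {
    pub from: u8,
    pub to: u8,
    pub payload: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Delivery {
    Delivered(Msg),
    Dropped(Msg),
}

/// Hypothetical simulated network: every choice comes from the seeded RNG.
pub struct SimNet {
    rng: SplitMix64,
    in_flight: Vec<Msg>,
    drop_percent: u64,
}

impl SimNet {
    pub fn new(seed: u64, drop_percent: u64) -> Self {
        Self { rng: SplitMix64::new(seed), in_flight: Vec::new(), drop_percent }
    }

    pub fn send(&mut self, msg: Msg) {
        self.in_flight.push(msg);
    }

    /// Takes one in-flight message, chosen at random (which reorders delivery),
    /// and either delivers or drops it. Returns None when nothing is in flight.
    pub fn step(&mut self) -> Option<Delivery> {
        if self.in_flight.is_empty() {
            return None;
        }
        let i = self.rng.below(self.in_flight.len() as u64) as usize;
        let msg = self.in_flight.swap_remove(i);
        if self.rng.below(100) < self.drop_percent {
            Some(Delivery::Dropped(msg))
        } else {
            Some(Delivery::Delivered(msg))
        }
    }
}

/// Runs a fixed workload under one seed and records what happened.
pub fn run_trace(seed: u64) -> Vec<Delivery> {
    let mut net = SimNet::new(seed, 20);
    for i in 0..10u64 {
        net.send(Msg { from: (i % 3) as u8, to: ((i + 1) % 3) as u8, payload: i });
    }
    let mut trace = Vec::new();
    while let Some(d) = net.step() {
        trace.push(d);
    }
    trace
}
```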
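The per-step safety checks can be written as small predicates over observed cluster state. A sketch of two of them, assuming leaders are observed as (term, node id) pairs and logs are modeled as vectors of entry terms; the function names are hypothetical, not yggr-sim's.

```rust
use std::collections::HashMap;

/// Election Safety: at most one leader per term.
/// `observed` lists (term, node id) for every node seen acting as leader.
pub fn election_safety(observed: &[(u64, u8)]) -> Result<(), String> {
    let mut leader_of: HashMap<u64, u8> = HashMap::new();
    for &(term, node) in observed {
        let first = *leader_of.entry(term).or_insert(node);
        if first != node {
            return Err(format!("term {term} has two leaders: {first} and {node}"));
        }
    }
    Ok(())
}

/// Log Matching: if two logs hold an entry with the same index and term,
/// they are identical up to that index. Logs are modeled as the term of each
/// entry (index 1 is element 0); a real checker would compare commands too.
pub fn log_matching(a: &[u64], b: &[u64]) -> Result<(), String> {
    let common = a.len().min(b.len());
    // Checking the highest agreeing index covers every lower one as well.
    if let Some(last) = (0..common).rev().find(|&i| a[i] == b[i]) {
        if a[..=last] != b[..=last] {
            return Err(format!("logs agree at index {} but differ before it", last + 1));
        }
    }
    Ok(())
}
```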
CI also enforces a workspace line-coverage floor, runs a diff-scoped cargo-mutants smoke pass on PRs with full scheduled sweeps on the correctness-critical crates, and runs scheduled cargo-fuzz corpus-building jobs for the wire codec, the engine, and disk recovery.
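The property a wire-codec fuzzer checks is that arbitrary bytes produce an error, never a panic or an out-of-bounds read. A sketch, assuming a 4-byte big-endian length prefix and a hypothetical MAX_FRAME cap (yggr's actual framing details are not specified here); smoke_fuzz is a plain random loop, not a substitute for coverage-guided cargo-fuzz.

```rust
#[derive(Debug, PartialEq)]
pub enum FrameError {
    /// Not enough bytes yet for a header or the full payload.
    Incomplete,
    /// The header announces a frame larger than we accept.
    TooLarge(usize),
}

/// Hypothetical cap on a single frame (16 MiB).
pub const MAX_FRAME: usize = 16 * 1024 * 1024;

/// Appends `payload` with a 4-byte big-endian length prefix (assumed layout).
pub fn encode_frame(payload: &[u8], out: &mut Vec<u8>) {
    assert!(payload.len() <= MAX_FRAME, "payload exceeds MAX_FRAME");
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
}

/// Splits one frame off the front of `buf`, returning the payload and the
/// number of bytes consumed. Never indexes out of bounds.
pub fn decode_frame(buf: &[u8]) -> Result<(&[u8], usize), FrameError> {
    let Some(h) = buf.get(..4) else {
        return Err(FrameError::Incomplete);
    };
    let len = u32::from_be_bytes([h[0], h[1], h[2], h[3]]) as usize;
    // Reject oversized lengths up front, so a corrupt or hostile header is
    // refused immediately instead of waiting for gigabytes of payload.
    if len > MAX_FRAME {
        return Err(FrameError::TooLarge(len));
    }
    let payload = buf.get(4..4 + len).ok_or(FrameError::Incomplete)?;
    Ok((payload, 4 + len))
}

/// Random-input smoke loop: every input must yield Ok or Err without panicking.
pub fn smoke_fuzz(seed: u64, iterations: usize) {
    let mut x = seed | 1; // xorshift64 state must be nonzero
    let mut next = move || {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        x
    };
    for _ in 0..iterations {
        let len = (next() % 64) as usize;
        let bytes: Vec<u8> = (0..len).map(|_| next() as u8).collect();
        if let Ok((payload, used)) = decode_frame(&bytes) {
            assert!(used <= bytes.len());
            assert_eq!(payload.len() + 4, used);
        }
    }
}
```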
Status
Pre-1.0. The public API is stable enough to build on, but expect changes.
License
MIT. See LICENSE.md.