Enum PersistenceStrategy

pub enum PersistenceStrategy {
    DiskFirst,
    MemFirst,
}

Defines how Raft log entries are persisted and accessed.

All strategies use a configurable FlushPolicy to control when memory contents are flushed to disk, affecting write latency and durability guarantees.

Note: Both strategies now fully load all log entries from disk into memory at startup. The in-memory SkipMap serves as the primary data structure for reads in all modes.

§Raft Log Persistence Architecture

This document outlines the design and behavior of the Raft log storage engine. It explains how logs are persisted, how the system handles reads and writes under different strategies, and how consistency is guaranteed across different configurations.


§Overview

Raft logs record the sequence of operations that must be replicated across all nodes in a Raft cluster. Correct and reliable storage of these logs is essential to maintaining the linearizability and safety guarantees of the protocol.

Our log engine supports two persistence strategies:

  • DiskFirst: Prioritizes durability.
  • MemFirst: Prioritizes performance.

Both strategies support configurable flush policies to control how memory contents are persisted to disk.


§Persistence Strategies

§DiskFirst

  • Write Path: On append, entries are first synchronously written to disk. Only after the disk write succeeds are they acknowledged and inserted into the in-memory SkipMap.
  • Read Path: Reads are served from the in-memory SkipMap.
  • Startup Behavior: Loads all log entries from disk into memory during startup.
  • Durability: Ensures strong durability. An entry is never considered accepted until it is safely written to disk.
  • Memory Use: Memory holds the complete set of log entries and serves all reads.
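
The DiskFirst write ordering above can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's actual code: DiskFirstLog, its methods, and the record layout are hypothetical, and a BTreeMap stands in for the SkipMap.

```rust
use std::collections::BTreeMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical disk-first log. `BTreeMap` stands in for the SkipMap.
pub struct DiskFirstLog {
    file: File,
    mem: BTreeMap<u64, Vec<u8>>,
}

impl DiskFirstLog {
    /// Opens (or creates) the log file in append mode with an empty cache.
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { file, mem: BTreeMap::new() })
    }

    /// Appends an entry: disk write and sync first, memory insert second.
    /// An error from the disk leaves the in-memory map untouched.
    pub fn append(&mut self, index: u64, payload: &[u8]) -> io::Result<()> {
        // Assumed record layout: index (8 bytes LE), length (4 bytes LE), payload.
        self.file.write_all(&index.to_le_bytes())?;
        self.file.write_all(&(payload.len() as u32).to_le_bytes())?;
        self.file.write_all(payload)?;
        // Ask the OS to push the file data to the storage device.
        self.file.sync_data()?;
        // Only now does the entry become visible to readers.
        self.mem.insert(index, payload.to_vec());
        Ok(())
    }

    /// Reads an entry from memory.
    pub fn get(&self, index: u64) -> Option<&[u8]> {
        self.mem.get(&index).map(Vec::as_slice)
    }
}
```

Because the memory insert comes last, a failed disk write never leaves an entry visible to readers that is absent from disk.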

§MemFirst

  • Write Path: Entries are first written to memory and acknowledged immediately. Disk persistence is handled asynchronously in the background.
  • Read Path: Reads are served from memory only. If an entry is not present in memory, it is considered nonexistent.
  • Startup Behavior: Loads all log entries from disk into memory during startup.
  • Durability: Durability is best-effort and depends on the flush policy. Recent entries may be lost if a crash occurs before flushing.
  • Memory Use: Memory holds the complete working set of logs.
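
The MemFirst write path above can be sketched with a channel feeding a background thread. All names here (MemFirstLog, shutdown) are hypothetical, a BTreeMap stands in for the SkipMap, and the worker only records which indexes it would have persisted instead of writing to disk.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical memory-first log. `BTreeMap` stands in for the SkipMap.
pub struct MemFirstLog {
    mem: BTreeMap<u64, Vec<u8>>,
    tx: Option<Sender<(u64, Vec<u8>)>>,
    worker: Option<JoinHandle<Vec<u64>>>,
}

impl MemFirstLog {
    pub fn new() -> Self {
        let (tx, rx) = mpsc::channel::<(u64, Vec<u8>)>();
        // Background task. A real one would write and sync according to the
        // flush policy; this placeholder only collects the indexes it receives.
        let worker = thread::spawn(move || {
            let mut persisted = Vec::new();
            for (index, _payload) in rx {
                persisted.push(index);
            }
            persisted
        });
        Self { mem: BTreeMap::new(), tx: Some(tx), worker: Some(worker) }
    }

    /// Appends an entry: memory first, then queue it for persistence.
    /// Returns without waiting for any disk I/O.
    pub fn append(&mut self, index: u64, payload: Vec<u8>) {
        self.mem.insert(index, payload.clone());
        if let Some(tx) = &self.tx {
            // A send error means the worker is gone; the entry stays memory-only.
            let _ = tx.send((index, payload));
        }
    }

    /// Reads from memory only; a missing entry is treated as nonexistent.
    pub fn get(&self, index: u64) -> Option<&[u8]> {
        self.mem.get(&index).map(Vec::as_slice)
    }

    /// Closes the queue, waits for the worker, and returns the indexes it handled.
    pub fn shutdown(mut self) -> Vec<u64> {
        self.tx.take(); // dropping the sender ends the worker's receive loop
        self.worker
            .take()
            .map(|handle| handle.join().unwrap_or_default())
            .unwrap_or_default()
    }
}
```

Entries queued but not yet handled by the worker are the ones at risk if the process crashes, which is the window the flush policy is meant to bound.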

§Flush Policies

Flush policies control how and when in-memory data is persisted to disk. They matter most in MemFirst mode, but also apply in DiskFirst, where they control how in-memory state such as snapshots and metadata is flushed.

§Types

  • Immediate

    • Flush to disk immediately after every log write.
    • Ensures maximum durability at the cost of higher I/O latency.
  • Batch { threshold, interval }

    • Flush to disk when:
      • The number of unflushed entries exceeds threshold, or
      • The elapsed time since last flush exceeds interval milliseconds.
    • Balances performance and durability.
    • May lose recent entries on crash.
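
The two policy types above can be sketched as a small decision function. The enum below is a hypothetical mirror of FlushPolicy; the field types and the should_flush helper are assumptions made for illustration, with interval expressed in milliseconds as described above.

```rust
use std::time::Duration;

/// Hypothetical mirror of the flush policies described above.
/// Field types are assumptions; `interval` is in milliseconds.
#[derive(Debug, Clone, PartialEq)]
pub enum FlushPolicy {
    Immediate,
    Batch { threshold: usize, interval: u64 },
}

/// Decides, after a log write, whether a flush is due.
/// `unflushed` is the number of entries not yet on disk and
/// `since_last_flush` is the time elapsed since the previous flush.
pub fn should_flush(policy: &FlushPolicy, unflushed: usize, since_last_flush: Duration) -> bool {
    match policy {
        // Immediate: every log write is followed by a flush.
        FlushPolicy::Immediate => true,
        // Batch: flush when either limit is exceeded.
        FlushPolicy::Batch { threshold, interval } => {
            unflushed > *threshold || since_last_flush > Duration::from_millis(*interval)
        }
    }
}
```

With Batch { threshold: 100, interval: 50 }, a write that leaves 10 unflushed entries 10 ms after the last flush does not trigger a flush, while the 101st unflushed entry or a gap of more than 50 ms does.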

§Read & Write Semantics

| Operation | DiskFirst | MemFirst |
|---|---|---|
| Write | Write to disk → cache in memory | Write to memory → async flush |
| Read | From memory | From memory; missing = absent |
| Startup | Preload all entries into memory | Preload all entries into memory |
| Flush | Controlled via flush_policy | Controlled via flush_policy |
| Data loss on crash | No (after disk fsync) | Possible if not flushed |

§Consistency Guarantees

| Property | DiskFirst | MemFirst |
|---|---|---|
| Linearizability | ✅ (strict) | ✅ (with quorum + sync on commit) |
| Durability (Post-Commit) | ✅ Always | ❌ Depends on flush policy |
| Availability (Under Load) | ❌ Lower | ✅ Higher |
| Crash Recovery | ✅ Strong | ❌ Recent entries may be lost |
| Startup Readiness | ❌ Slower (full load) | ❌ Slower (full load) |

| Strategy | Best For |
|---|---|
| DiskFirst | Systems that require strong durability and consistent recovery (e.g., databases, distributed ledgers) |
| MemFirst | Systems that favor latency and availability, and can tolerate recovery from snapshots or re-election (e.g., in-memory caches, ephemeral workloads) |

§Developer Notes

  • Log Truncation & Compaction: Logs should be truncated after snapshotting, regardless of strategy.
  • Backpressure: In MemFirst, developers should implement backpressure if memory usage exceeds thresholds.
  • Lazy Loading: If DiskFirst entries are ever loaded on demand rather than preloaded at startup, avoid head-of-line blocking by prefetching subsequent entries when a cache miss occurs.
  • Flush Daemon: Use a background task to monitor and enforce flush policy under MemFirst.
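
As an illustration of the truncation note above, the following sketch drops every entry covered by a snapshot from an in-memory log. The truncate_through function is hypothetical and a BTreeMap stands in for the SkipMap; removing the corresponding on-disk records is not shown.

```rust
use std::collections::BTreeMap;

/// Removes every entry with index <= `snapshot_index` and returns how many
/// entries were dropped. Entries after the snapshot are kept.
pub fn truncate_through(log: &mut BTreeMap<u64, Vec<u8>>, snapshot_index: u64) -> usize {
    let before = log.len();
    match snapshot_index.checked_add(1) {
        Some(first_kept) => {
            // `split_off` returns the entries with keys >= `first_kept`;
            // keeping only that part discards everything the snapshot covers.
            let kept = log.split_off(&first_kept);
            *log = kept;
        }
        // The snapshot covers the largest possible index: nothing remains.
        None => log.clear(),
    }
    before - log.len()
}
```

For a log holding indexes 1 through 5, truncating through a snapshot at index 3 removes three entries and leaves indexes 4 and 5.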

§Future Improvements

  • Snapshot-aware recovery to reduce startup times for MemFirst.
  • Tiered storage support (e.g., WAL on SSD, archival on HDD or cloud).
  • Intelligent adaptive flush control based on workload.

Variants§

DiskFirst

Disk-first persistence strategy.

  • Write path: On append, the log entry is first written to disk. Only after a successful disk write is it acknowledged and stored in the in-memory SkipMap.

  • Read path: Reads are always served from the in-memory SkipMap.

  • Startup behavior: All log entries are loaded from disk into memory at startup, ensuring consistent access speed regardless of disk state.

  • Suitable for systems prioritizing strong durability while still providing in-memory performance for reads.

MemFirst

Memory-first persistence strategy.

  • Write path: On append, the log entry is first written to the in-memory SkipMap and acknowledged immediately. Disk persistence happens asynchronously in the background, governed by FlushPolicy.

  • Read path: Reads are always served from the in-memory SkipMap.

  • Startup behavior: All log entries are loaded from disk into memory at startup, the same as DiskFirst.

  • Suitable for systems that favor lower write latency and faster failover, while still retaining a disk-backed log for crash recovery.
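
The difference between the two variants comes down to when an append is acknowledged, which a caller could express with a match. The enum below is a local copy of the definition shown at the top of this page so that the snippet compiles on its own; ack_requires_disk_write is a hypothetical helper, not part of the crate.

```rust
/// Local copy of the enum documented on this page.
#[derive(Debug, Clone, PartialEq)]
pub enum PersistenceStrategy {
    DiskFirst,
    MemFirst,
}

/// Hypothetical helper: does an append have to reach disk before it is
/// acknowledged under the given strategy?
pub fn ack_requires_disk_write(strategy: &PersistenceStrategy) -> bool {
    match strategy {
        // DiskFirst acknowledges only after a successful disk write.
        PersistenceStrategy::DiskFirst => true,
        // MemFirst acknowledges after the in-memory insert; disk persistence
        // follows asynchronously under the flush policy.
        PersistenceStrategy::MemFirst => false,
    }
}
```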

Trait Implementations§

impl Clone for PersistenceStrategy

fn clone(&self) -> PersistenceStrategy

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for PersistenceStrategy

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'de> Deserialize<'de> for PersistenceStrategy

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl PartialEq for PersistenceStrategy

fn eq(&self, other: &PersistenceStrategy) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Serialize for PersistenceStrategy

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

impl StructuralPartialEq for PersistenceStrategy
