Crate buffer_ring

§Flush Buffer — Latch-Free I/O Buffer Ring

This module is intended to serve all of LLAMA’s in-memory write-staging layers. It is a fixed-size ring of 4 KB-aligned FlushBuffers that amortises individual page-state writes into larger, sequential I/O operations before they are dispatched to the LogStructuredStore.

§Design Goals

| Goal | Mechanism |
|---|---|
| Latch-free writes | Single packed AtomicUsize state word per buffer |
| O_DIRECT compatibility | 4 KB-aligned allocation via Buffer::new_aligned |
| Amortised I/O | Multiple threads fill one buffer before it is flushed |
| All threads participate | Any thread may seal or initiate a flush |
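
The O_DIRECT row depends on the backing allocation starting on a 4 KB boundary. As a rough illustration of what that involves, here is a minimal sketch built only on std::alloc. The type AlignedBuf and its methods are hypothetical names for this example; the crate's actual Buffer::new_aligned may be implemented differently.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;

/// Alignment assumed for O_DIRECT in this sketch (4 KB).
const ALIGN: usize = 4096;

/// Hypothetical stand-in for an aligned, heap-allocated byte buffer.
pub struct AlignedBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl AlignedBuf {
    /// Allocates `size` zeroed bytes starting on a 4 KB boundary.
    /// `size` must be a non-zero multiple of 4 KB.
    pub fn new(size: usize) -> Self {
        assert!(size > 0 && size % ALIGN == 0, "size must be a non-zero multiple of 4 KB");
        let layout = Layout::from_size_align(size, ALIGN).expect("valid layout");
        // SAFETY: the layout has a non-zero size (checked above).
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        Self { ptr, layout }
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: ptr is valid for layout.size() initialised (zeroed) bytes.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }

    pub fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: as above, and &mut self guarantees exclusive access.
        unsafe { std::slice::from_raw_parts_mut(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated with exactly this layout.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

A buffer created with AlignedBuf::new(8192) can be handed to a file opened with O_DIRECT, provided the write length and file offset also meet the device's alignment requirements.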

§Flush Protocol

Adapted from the LLAMA paper; all steps are performed without global locks:

  1. Identify the page state to be written.
  2. Seize space in the active FlushBuffer via reserve_space — an atomic fetch-and-add on the packed state word claims a non-overlapping byte range.
  3. Check atomically whether the reservation succeeded. If the buffer was already sealed, or the request does not fit in the remaining space, the reservation fails: the buffer is sealed (if it was not already) and the ring rotates to the next available slot.
  4. Write the payload into the reserved range while the flush-in-progress bit prevents the buffer from being dispatched to stable storage prematurely.
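
Steps 2 and 3 can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: SketchBuffer, Reserve, WRITER_ONE and finish_write are hypothetical names, the real reserve_space signature may differ, a 64-bit usize is assumed, and overflow of the 24-bit writer count is ignored.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Field positions follow the State Word Layout section (64-bit usize assumed).
const SEALED_BIT: usize = 1 << 0;
const WRITER_ONE: usize = 1 << 8; // +1 in the writer-count field (hypothetical name)
const WRITER_MASK: usize = 0x00FF_FFFF << 8; // bits 31..8
const OFFSET_SHIFT: u32 = 32; // write offset lives in bits 63..32

/// Hypothetical stand-in for the header of one FlushBuffer.
pub struct SketchBuffer {
    state: AtomicUsize,
    capacity: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Reserve {
    /// Bytes [start, start + len) now belong to the caller.
    Granted { start: usize },
    /// The buffer is sealed; the caller should rotate to the next slot.
    Sealed,
    /// The payload can never fit in a buffer of this capacity.
    TooLarge,
}

impl SketchBuffer {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity <= u32::MAX as usize, "offset field is 32 bits wide");
        Self { state: AtomicUsize::new(0), capacity }
    }

    /// Step 2: a single fetch-and-add bumps the offset and the writer count.
    /// Step 3: the previous value tells us whether the claim was valid.
    pub fn reserve_space(&self, len: usize) -> Reserve {
        if len > self.capacity {
            return Reserve::TooLarge;
        }
        let delta = (len << OFFSET_SHIFT) + WRITER_ONE;
        let prev = self.state.fetch_add(delta, Ordering::AcqRel);
        let start = prev >> OFFSET_SHIFT;
        if prev & SEALED_BIT == 0 && start + len <= self.capacity {
            return Reserve::Granted { start };
        }
        // Failed claim: seal first so no later claim can succeed, then undo
        // our own increment so the offset again reflects granted bytes only.
        self.state.fetch_or(SEALED_BIT, Ordering::AcqRel);
        self.state.fetch_sub(delta, Ordering::AcqRel);
        Reserve::Sealed
    }

    /// Called after the payload has been copied into the granted range.
    pub fn finish_write(&self) {
        self.state.fetch_sub(WRITER_ONE, Ordering::AcqRel);
    }

    pub fn offset(&self) -> usize {
        self.state.load(Ordering::Acquire) >> OFFSET_SHIFT
    }

    pub fn writers(&self) -> usize {
        (self.state.load(Ordering::Acquire) & WRITER_MASK) >> 8
    }

    pub fn is_sealed(&self) -> bool {
        self.state.load(Ordering::Acquire) & SEALED_BIT != 0
    }
}
```

Sealing before undoing the failed increment matters in this sketch: once the sealed bit is set no reservation can succeed, so rolling the offset back cannot hand the same bytes to two writers.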

Although the current implementation delegates the handling of all erroneous and invalid states to the caller, the flush procedure should lend itself well to the LLAMA flushing protocol.

§State Word Layout

All per-buffer metadata is packed into a single AtomicUsize, making every state snapshot self-consistent and eliminating TOCTOU (time of check/time of use) races between the fields:

┌────────────────┬────────────────┬──────────────────┬───────────────────┬──────────┐
│  Bits 63..32   │  Bits 31..8    │  Bits 7..2       │  Bit 1            │  Bit 0   │
│  write offset  │  writer count  │  (reserved)      │  flush-in-prog    │  sealed  │
└────────────────┴────────────────┴──────────────────┴───────────────────┴──────────┘
  • write offset — next free byte position inside the backing allocation.
  • writer count — number of threads that have reserved space but not yet finished copying their payload.
  • flush-in-progress — set by whichever thread wins the CAS race to own the flush; prevents a second flush from being fired while the first is in flight.
  • sealed — set when the buffer is full or explicitly closed; prevents new reservations.

Bits 7..2 are reserved and currently unused.
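
To illustrate how the flush-in-progress bit can elect a single flusher, here is a minimal compare-and-swap sketch. The function try_claim_flush is a hypothetical name, and the precondition it checks (buffer sealed, no active writers) is an assumption of this example rather than documented crate behaviour.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Field positions follow the layout diagram above (64-bit usize assumed).
const SEALED_BIT: usize = 1 << 0;
const FLUSH_IN_PROGRESS_BIT: usize = 1 << 1;
const WRITER_MASK: usize = 0x00FF_FFFF << 8; // bits 31..8

/// Returns true for exactly one caller: the thread whose CAS sets the
/// flush-in-progress bit. Assumes a flush may start only once the buffer
/// is sealed and every writer has finished copying.
pub fn try_claim_flush(state: &AtomicUsize) -> bool {
    let mut cur = state.load(Ordering::Acquire);
    loop {
        let sealed = cur & SEALED_BIT != 0;
        let writers = (cur & WRITER_MASK) >> 8;
        let flushing = cur & FLUSH_IN_PROGRESS_BIT != 0;
        if !sealed || writers != 0 || flushing {
            return false;
        }
        match state.compare_exchange_weak(
            cur,
            cur | FLUSH_IN_PROGRESS_BIT,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => return true,
            // Another thread changed the word (or the weak CAS failed
            // spuriously): re-evaluate against the fresh value.
            Err(actual) => cur = actual,
        }
    }
}
```

Because all fields share one word, the check and the bit flip are a single atomic decision: a writer that reserves space between the load and the CAS changes the word and forces a retry.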

Re-exports§

pub use crate::flush_behaviour::QuickIO;
pub use crate::flush_behaviour::SharedAsyncFileWriter;
pub use crate::flush_behaviour::WriteMode;
pub use crate::state::State;

Modules§

flush_behaviour
QuickIO — io_uring-backed Write Dispatchers
flush_buffer_api
state

Structs§

Buffer
A 4 KB-aligned, heap-allocated byte buffer suitable for O_DIRECT I/O.
BufferRing
A fixed-size ring of FlushBuffers that amortises writes into batched sequential I/O.
FlushBuffer
A single latch-free I/O buffer.
FlushRingOptions
Options for creating BufferRing instances with custom configurations.

Enums§

BufferError
Errors that may be returned by buffer and ring operations.
BufferMsg
Successful outcomes returned by buffer and ring operations.

Constants§

FLUSH_IN_PROGRESS_BIT
Bit 1 of the state word — set while a flush is in progress.
OFFSET_SHIFT
The write-offset field occupies the top 32 bits of the state word.
ONE_MEGABYTE_BLOCK
The size of a 1 MB page.
RING_SIZE
Default number of buffers in a BufferRing.
SEALED_BIT
Bit 0 of the state word — set when the buffer is closed to new writers.
WRITER_MASK
Mask covering the writer-count field (bits 8..32).

Functions§

state_offset
Extracts the current write offset from the state word.
state_sealed
Returns the sealed bit of the state word.
state_writers
Extracts the current number of writers from the state word.
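
As an illustration of what these accessors compute, the sketch below decodes a state word using the bit positions from the State Word Layout section. The signatures, the constant types, WRITER_SHIFT and the pack helper are assumptions of this example; consult the item pages for the real definitions.

```rust
// Field positions taken from the layout diagram (64-bit usize assumed).
pub const SEALED_BIT: usize = 1 << 0;
pub const FLUSH_IN_PROGRESS_BIT: usize = 1 << 1;
pub const WRITER_SHIFT: u32 = 8; // hypothetical name, not listed in the crate docs
pub const WRITER_MASK: usize = 0x00FF_FFFF << WRITER_SHIFT; // bits 31..8
pub const OFFSET_SHIFT: u32 = 32; // write offset lives in bits 63..32

/// Write offset: the top 32 bits of the word.
pub fn state_offset(state: usize) -> usize {
    state >> OFFSET_SHIFT
}

/// Writer count: the 24-bit field in bits 31..8.
pub fn state_writers(state: usize) -> usize {
    (state & WRITER_MASK) >> WRITER_SHIFT
}

/// Sealed flag: bit 0.
pub fn state_sealed(state: usize) -> bool {
    state & SEALED_BIT != 0
}

/// Hypothetical helper that builds a state word, used here for illustration.
/// Values wider than their field are truncated to the field width.
pub fn pack(offset: usize, writers: usize, flushing: bool, sealed: bool) -> usize {
    let mut word = (offset & 0xFFFF_FFFF) << OFFSET_SHIFT;
    word |= (writers << WRITER_SHIFT) & WRITER_MASK;
    if flushing {
        word |= FLUSH_IN_PROGRESS_BIT;
    }
    if sealed {
        word |= SEALED_BIT;
    }
    word
}
```

For example, pack(4096, 3, false, true) yields a word from which state_offset recovers 4096, state_writers recovers 3, and state_sealed reports true.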