§Flush Buffer — Latch-Free I/O Buffer Ring
This module is intended to suit the needs of all of LLAMA’s in-memory write-staging layers.
It is a fixed-size ring of one MB-aligned FlushBuffers that amortises individual page-state
writes into larger, sequential I/O operations before they are dispatched to
the LogStructuredStore.
§Design Goals
| Goal | Mechanism |
|---|---|
| Latch-free writes | Single packed AtomicUsize state word per buffer |
| O_DIRECT compatibility | 4 KB-aligned allocation via Buffer::new_aligned |
| Amortised I/O | Multiple threads fill one buffer before it is flushed |
| All threads participate | Any thread may seal or initiate a flush |
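As an illustration of the O_DIRECT row, here is a minimal sketch of a 4 KB-aligned heap allocation built on the standard allocator. The type AlignedBuf and its methods are hypothetical stand-ins, not the crate's actual Buffer or Buffer::new_aligned, whose signatures are not shown on this page.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::NonNull;

/// Assumed O_DIRECT alignment (4 KB), matching the table above.
const ALIGN: usize = 4096;

/// Hypothetical stand-in for the crate's `Buffer`; not its real definition.
pub struct AlignedBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl AlignedBuf {
    /// Allocates `len` zeroed bytes at a 4 KB-aligned address.
    /// Returns `None` if `len` is zero, the layout is invalid,
    /// or the allocator reports failure.
    pub fn new_aligned(len: usize) -> Option<Self> {
        if len == 0 {
            return None;
        }
        let layout = Layout::from_size_align(len, ALIGN).ok()?;
        // SAFETY: `layout` has a non-zero size, as checked above.
        let raw = unsafe { alloc_zeroed(layout) };
        NonNull::new(raw).map(|ptr| AlignedBuf { ptr, layout })
    }

    /// Raw pointer to the start of the allocation.
    pub fn as_ptr(&self) -> *mut u8 {
        self.ptr.as_ptr()
    }

    /// Size of the allocation in bytes.
    pub fn len(&self) -> usize {
        self.layout.size()
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated by `alloc_zeroed` with exactly `layout`.
        unsafe { dealloc(self.ptr.as_ptr(), self.layout) }
    }
}
```

O_DIRECT generally also expects the I/O length and file offset to be multiples of the device block size, so a real buffer would likely round its length up as well; that detail is left out of this sketch.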
§Flush Protocol
Adapted from the LLAMA paper; all steps are performed without global locks:
- Identify the page state to be written.
- Seize space in the active FlushBuffer via reserve_space: an atomic fetch-and-add on the packed state word claims a non-overlapping byte range.
- Check atomically whether the reservation succeeded. If the buffer is already sealed or the space is exhausted, the buffer is sealed and the ring rotates to the next available slot.
- Write the payload into the reserved range while the flush-in-progress bit prevents the buffer from being dispatched to stable storage prematurely.
Although the current implementation delegates the handling of all erroneous and invalid states to the caller, the flush procedure should lend itself well to the LLAMA flushing protocol.
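The reservation and rotation steps can be sketched roughly as follows. This is an illustrative model, not the crate's code: SketchBuffer, SketchRing, Reserve, WRITER_ONE and finish_write are hypothetical names, the constant values are inferred from the documented state word layout, and a 64-bit target is assumed. Resetting a buffer after its flush completes is not modelled.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Assumed values, inferred from the documented layout (64-bit usize).
const SEALED_BIT: usize = 1 << 0;
const OFFSET_SHIFT: u32 = 32;
const WRITER_ONE: usize = 1 << 8; // hypothetical: one unit of the writer count

#[derive(Debug, PartialEq)]
enum Reserve {
    Granted { start: usize },
    Sealed,
}

struct SketchBuffer {
    state: AtomicUsize,
    capacity: usize,
}

impl SketchBuffer {
    fn reserve_space(&self, len: usize) -> Reserve {
        // Cheap pre-check so a sealed buffer is not bumped again and again.
        if self.state.load(Ordering::Acquire) & SEALED_BIT != 0 {
            return Reserve::Sealed;
        }
        // One fetch-add advances the offset by `len` and registers a writer.
        let delta = (len << OFFSET_SHIFT) + WRITER_ONE;
        let prev = self.state.fetch_add(delta, Ordering::AcqRel);
        let start = prev >> OFFSET_SHIFT;
        if prev & SEALED_BIT != 0 || start + len > self.capacity {
            // Reservation failed: seal the buffer and withdraw our writer slot.
            // The offset stays advanced; that is harmless once sealed.
            self.state.fetch_or(SEALED_BIT, Ordering::AcqRel);
            self.state.fetch_sub(WRITER_ONE, Ordering::AcqRel);
            return Reserve::Sealed;
        }
        Reserve::Granted { start }
    }

    /// Called after the payload has been copied into the reserved range.
    fn finish_write(&self) {
        self.state.fetch_sub(WRITER_ONE, Ordering::AcqRel);
    }
}

struct SketchRing {
    buffers: Vec<SketchBuffer>,
    current: AtomicUsize,
}

impl SketchRing {
    fn new(slots: usize, capacity: usize) -> Self {
        let buffers = (0..slots)
            .map(|_| SketchBuffer { state: AtomicUsize::new(0), capacity })
            .collect();
        SketchRing { buffers, current: AtomicUsize::new(0) }
    }

    /// Tries each slot at most once; returns (slot index, byte offset).
    fn reserve(&self, len: usize) -> Option<(usize, usize)> {
        let n = self.buffers.len();
        for _ in 0..n {
            let idx = self.current.load(Ordering::Acquire);
            match self.buffers[idx].reserve_space(len) {
                Reserve::Granted { start } => return Some((idx, start)),
                Reserve::Sealed => {
                    // Rotate; losing this race means another thread rotated.
                    let _ = self.current.compare_exchange(
                        idx, (idx + 1) % n, Ordering::AcqRel, Ordering::Acquire);
                }
            }
        }
        None // every slot is sealed; the caller decides what to do next
    }
}
```

Returning None when every slot is sealed mirrors the note above that error handling is left to the caller.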
§State Word Layout
All per-buffer metadata is packed into a single AtomicUsize, making every
state snapshot self-consistent and eliminating TOCTOU (time of check/time of use) races between the
fields:
┌────────────────┬────────────────┬──────────────────┬───────────────────┬──────────┐
│ Bits 63..32 │ Bits 31..8 │ Bits 7..2 │ Bit 1 │ Bit 0 │
│ write offset │ writer count │ (reserved) │ flush-in-prog │ sealed │
└────────────────┴────────────────┴──────────────────┴───────────────────┴──────────┘
- write offset: next free byte position inside the backing allocation.
- writer count: number of threads that have reserved space but not yet finished copying their payload.
- flush-in-progress: set by whichever thread wins the CAS race to own the flush; prevents a second flush from being fired while the first is in flight.
- sealed: set when the buffer is full or explicitly closed; prevents new reservations.
Bits 7..2 are reserved and currently unused.
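To make the layout concrete, the sketch below decodes a state word and shows one way a thread might win the flush with a compare-and-swap. The constant values are inferred from the layout above and may differ from the crate's definitions. WRITER_SHIFT and try_claim_flush are hypothetical names, the helper signatures are assumed, and the readiness rule (sealed, no active writers, no flush in flight) is an assumption rather than something this page states. A 64-bit target is assumed.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Inferred from the documented layout; the crate's own values may differ.
pub const SEALED_BIT: usize = 1 << 0;
pub const FLUSH_IN_PROGRESS_BIT: usize = 1 << 1;
pub const WRITER_SHIFT: u32 = 8; // hypothetical helper constant
pub const WRITER_MASK: usize = 0x00FF_FFFF << WRITER_SHIFT; // bits 8..32
pub const OFFSET_SHIFT: u32 = 32;

/// Write offset: the top 32 bits of the state word.
pub fn state_offset(state: usize) -> usize {
    state >> OFFSET_SHIFT
}

/// Number of writers that reserved space but have not finished copying.
pub fn state_writers(state: usize) -> usize {
    (state & WRITER_MASK) >> WRITER_SHIFT
}

/// Whether the buffer is closed to new reservations.
pub fn state_sealed(state: usize) -> bool {
    state & SEALED_BIT != 0
}

/// Tries to become the single flusher of a buffer.
/// Assumed rule: the buffer must be sealed, have no active writers,
/// and have no flush already in flight.
pub fn try_claim_flush(word: &AtomicUsize) -> bool {
    let mut cur = word.load(Ordering::Acquire);
    loop {
        let ready = state_sealed(cur)
            && state_writers(cur) == 0
            && cur & FLUSH_IN_PROGRESS_BIT == 0;
        if !ready {
            return false;
        }
        match word.compare_exchange_weak(
            cur,
            cur | FLUSH_IN_PROGRESS_BIT,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => return true,         // this thread owns the flush
            Err(actual) => cur = actual,  // state changed; re-evaluate
        }
    }
}
```

Because all fields live in one word, the readiness check and the bit flip are validated against the same snapshot: if any field changes between the load and the CAS, the CAS fails and the check is repeated.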
Re-exports§
pub use crate::flush_behaviour::QuickIO;
pub use crate::flush_behaviour::WriteMode;
pub use crate::state::State;
Modules§
- flush_behaviour: QuickIO (io_uring-backed write dispatchers)
- flush_buffer_api
- state
Structs§
- Buffer: A 4 KB-aligned, heap-allocated byte buffer suitable for O_DIRECT I/O.
- BufferRing: A fixed-size ring of FlushBuffers that amortises writes into batched sequential I/O.
- FlushBuffer: A single latch-free I/O buffer.
- FlushRingOptions: Options for creating BufferRing instances with custom configurations.
Enums§
- BufferError: Errors that may be returned by buffer and ring operations.
- BufferMsg: Successful outcomes returned by buffer and ring operations.
Constants§
- FLUSH_IN_PROGRESS_BIT: Bit 1 of the state word; set while a flush is in progress.
- OFFSET_SHIFT: The write-offset field occupies the top 32 bits of the state word.
- ONE_MEGABYTE_BLOCK: The size of a 1 MB page.
- RING_SIZE: Default number of buffers in a BufferRing.
- SEALED_BIT: Bit 0 of the state word; set when the buffer is closed to new writers.
- WRITER_MASK: Mask covering the writer-count field (bits 8..32).
Functions§
- state_offset: Extracts the current offset from the state variable.
- state_sealed: Returns the sealed bit of the state variable.
- state_writers: Extracts the current number of writers from the state variable.