Struct DurableStorage 

pub struct DurableStorage { /* private fields */ }

Durable storage engine with full ACID support

Implementations§

impl DurableStorage

pub fn open<P: AsRef<Path>>(path: P) -> Result<Self>

Open or create durable storage at path

pub fn open_with_config<P: AsRef<Path>>( path: P, enable_ordered_index: bool, ) -> Result<Self>

Open with configurable ordered index

Setting enable_ordered_index to false saves roughly 134 ns per write, but scan_prefix then costs O(N) instead of O(log N + K), where N is the number of stored keys and K is the number of keys matching the prefix.
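
The complexity difference can be illustrated with standard-library maps. This is a minimal sketch, not the engine's actual index: `scan_prefix_ordered` and `scan_prefix_unordered` are hypothetical helpers, with a `BTreeMap` standing in for the ordered index and a `HashMap` for the unordered case.

```rust
use std::collections::{BTreeMap, HashMap};
use std::ops::Bound;

type Pairs = Vec<(Vec<u8>, Vec<u8>)>;

/// Ordered index: seek to the first key >= prefix (O(log N)), then walk
/// forward only while keys still carry the prefix (O(K)).
fn scan_prefix_ordered(index: &BTreeMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> Pairs {
    index
        .range::<[u8], _>((Bound::Included(prefix), Bound::Unbounded))
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}

/// No ordered index: every key must be inspected (O(N)); the matches are
/// sorted afterwards so both helpers return the same order.
fn scan_prefix_unordered(table: &HashMap<Vec<u8>, Vec<u8>>, prefix: &[u8]) -> Pairs {
    let mut hits: Pairs = table
        .iter()
        .filter(|(key, _)| key.starts_with(prefix))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect();
    hits.sort();
    hits
}
```

Both helpers return the same pairs; only the amount of work per scan differs, which is the tradeoff the flag controls.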

pub fn open_with_arena<P: AsRef<Path>>(path: P) -> Result<Self>

Open with arena-backed memtable for write-heavy workloads

Uses ArenaMvccMemTable, which reduces per-write allocations from 3 to 1. Best for workloads with:

  • High write throughput
  • Large keys (reduces allocation overhead)
  • Minimal concurrent reads during writes

pub fn open_with_full_config<P: AsRef<Path>>( path: P, enable_ordered_index: bool, memtable_type: MemTableType, ) -> Result<Self>

Open with full configuration options

§Arguments
  • path - Storage directory path
  • enable_ordered_index - Enable ordered index for O(log N) scans
  • memtable_type - Type of memtable to use (Standard or Arena)
§Locking

Acquires an exclusive advisory lock on the database directory. This prevents concurrent access from multiple processes, which would corrupt data. If another process already has the database open, returns Err(DatabaseLocked).

pub fn open_ephemeral() -> Result<EphemeralHandle>

Open an ephemeral DurableStorage backed by a temporary directory. It behaves like an in-memory store, but runs the on-disk engine.

Uses the full DurableStorage engine (WAL, MVCC, SSI) but writes to a temporary directory that is automatically cleaned up when the EphemeralHandle is dropped. Tests therefore exercise the same code paths as production, so a bug found in a test is a bug in the production engine.

§Returns

An EphemeralHandle that owns both the storage and the temp directory. Access the storage via handle.storage() or Deref coercion.

§Example
let handle = DurableStorage::open_ephemeral()?;
let txn = handle.begin_transaction()?;
handle.write(txn, b"key".to_vec(), b"value".to_vec())?;
handle.commit(txn)?;
// temp directory cleaned up when `handle` drops
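
The ownership pattern described above (a handle that owns both the engine and its directory, exposes the engine through Deref, and removes the directory on drop) can be sketched with the standard library alone. `Engine` and `TempHandle` are hypothetical stand-ins, not the crate's `DurableStorage` or `EphemeralHandle`.

```rust
use std::fs;
use std::io;
use std::ops::Deref;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the storage engine; it only remembers its directory.
struct Engine {
    dir: PathBuf,
}

impl Engine {
    fn dir(&self) -> &Path {
        &self.dir
    }
}

/// Hypothetical handle that owns the engine together with its temp directory.
struct TempHandle {
    engine: Engine,
}

/// Counter that keeps directory names unique within one process.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

impl TempHandle {
    fn open() -> io::Result<Self> {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let name = format!("ephemeral-sketch-{}-{}", std::process::id(), id);
        let dir = std::env::temp_dir().join(name);
        fs::create_dir_all(&dir)?;
        Ok(TempHandle { engine: Engine { dir } })
    }
}

/// Deref lets callers invoke engine methods directly on the handle.
impl Deref for TempHandle {
    type Target = Engine;
    fn deref(&self) -> &Engine {
        &self.engine
    }
}

impl Drop for TempHandle {
    fn drop(&mut self) {
        // Best-effort cleanup: drop cannot return an error, so failures are ignored.
        let _ = fs::remove_dir_all(&self.engine.dir);
    }
}
```

Because cleanup lives in Drop, the directory disappears on every exit path, including early returns through `?`.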

pub fn open_ephemeral_with_group_commit() -> Result<EphemeralHandle>

Open an ephemeral DurableStorage with group commit enabled.

Same as open_ephemeral() but with group commit for higher throughput.

pub fn open_with_group_commit<P: AsRef<Path>>(path: P) -> Result<Self>

Open with group commit enabled

pub fn open_with_group_commit_and_config<P: AsRef<Path>>( path: P, enable_ordered_index: bool, ) -> Result<Self>

Open with group commit and configurable ordered index

pub fn open_with_policy<P: AsRef<Path>>( path: P, policy: IndexPolicy, group_commit: bool, ) -> Result<Self>

Open with IndexPolicy for automatic memtable/index configuration

This is the recommended constructor for new code. The policy determines:

  • Whether to use ordered index (ScanOptimized only)
  • Whether to use arena-backed memtable (WriteOptimized, AppendOnly)
  • Default settings optimized for the workload pattern
§Arguments
  • path - Storage directory path
  • policy - Index policy determining write/scan tradeoffs
  • group_commit - Whether to enable group commit for throughput
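
The policy-to-configuration mapping listed above can be written as a small lookup. This is a sketch under assumptions: `PolicySketch` contains only the three variants named in this section (the real IndexPolicy may define others), and `DerivedConfig` and `derive_config` are hypothetical names.

```rust
/// Hypothetical mirror of the policy variants named in the docs above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PolicySketch {
    ScanOptimized,
    WriteOptimized,
    AppendOnly,
}

/// The two switches the docs say are derived from the policy.
#[derive(Debug, PartialEq)]
struct DerivedConfig {
    ordered_index: bool,
    arena_memtable: bool,
}

/// Ordered index only for ScanOptimized; arena-backed memtable for
/// WriteOptimized and AppendOnly, as stated in the bullet list.
fn derive_config(policy: PolicySketch) -> DerivedConfig {
    match policy {
        PolicySketch::ScanOptimized => DerivedConfig {
            ordered_index: true,
            arena_memtable: false,
        },
        PolicySketch::WriteOptimized | PolicySketch::AppendOnly => DerivedConfig {
            ordered_index: false,
            arena_memtable: true,
        },
    }
}
```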

pub fn open_for_concurrent<P: AsRef<Path>>( path: P, policy: IndexPolicy, ) -> Result<Self>

Open storage for concurrent mode (multi-reader, single-writer)

This method opens the storage WITHOUT acquiring the exclusive file lock. Coordination is handled by the concurrent MVCC layer instead.

§Safety

This must ONLY be called from Database::open_concurrent() which manages the concurrent MVCC coordination. Direct use will cause data corruption.

pub fn memtable_type(&self) -> MemTableType

Get the memtable type being used

pub fn recover(&self) -> Result<RecoveryStats>

Perform crash recovery

pub fn begin_transaction(&self) -> Result<u64>

Begin a new transaction

pub fn begin_with_mode(&self, mode: TransactionMode) -> Result<u64>

Begin a transaction with a specific mode (ReadOnly/WriteOnly/ReadWrite)

This enables mode-aware optimizations:

  • ReadOnly: Skip SSI tracking, 2.6x faster reads
  • WriteOnly: Skip read tracking, faster bulk inserts
  • ReadWrite: Full SSI for serializable isolation

pub fn begin_read_only_fast(&self) -> u64

Begin a read-only transaction without any WAL records.

This is a performance-critical optimization that eliminates two WAL mutex acquisitions per read (TxnBegin + TxnAbort). Since read-only transactions have no state to recover, WAL records are unnecessary.

Callers MUST use abort_read_only_fast() to clean up.

pub fn abort_read_only_fast(&self, txn_id: u64)

Abort a fast read-only transaction.

O(1) cleanup: only removes MVCC state. No WAL write, no memtable scan.
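
Since every begin_read_only_fast() must be paired with abort_read_only_fast(), one way to make the pairing hard to forget is a drop guard. The sketch below is not part of this API: the `FastReadOnly` trait, `ReadOnlyGuard`, and `MockStorage` are hypothetical, and only the two method names are taken from the signatures above.

```rust
use std::cell::Cell;

/// Minimal interface the guard needs; method names mirror the docs above.
trait FastReadOnly {
    fn begin_read_only_fast(&self) -> u64;
    fn abort_read_only_fast(&self, txn_id: u64);
}

/// Hypothetical guard: begins the transaction on construction, aborts on drop.
struct ReadOnlyGuard<'a, S: FastReadOnly> {
    storage: &'a S,
    txn_id: u64,
}

impl<'a, S: FastReadOnly> ReadOnlyGuard<'a, S> {
    fn new(storage: &'a S) -> Self {
        let txn_id = storage.begin_read_only_fast();
        ReadOnlyGuard { storage, txn_id }
    }

    fn txn_id(&self) -> u64 {
        self.txn_id
    }
}

impl<'a, S: FastReadOnly> Drop for ReadOnlyGuard<'a, S> {
    fn drop(&mut self) {
        // Runs on every exit path, including early returns and panics that unwind.
        self.storage.abort_read_only_fast(self.txn_id);
    }
}

/// Mock storage that only counts open transactions, used to exercise the guard.
struct MockStorage {
    next_id: Cell<u64>,
    open: Cell<usize>,
}

impl FastReadOnly for MockStorage {
    fn begin_read_only_fast(&self) -> u64 {
        let id = self.next_id.get() + 1;
        self.next_id.set(id);
        self.open.set(self.open.get() + 1);
        id
    }

    fn abort_read_only_fast(&self, _txn_id: u64) {
        self.open.set(self.open.get() - 1);
    }
}
```

With such a guard, reads would go through `guard.txn_id()` and cleanup happens when the guard leaves scope.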

pub fn read_latest(&self, key: &[u8]) -> Option<Vec<u8>>

Read a key WITHOUT any MVCC transaction tracking.

Uses the current global timestamp to see all committed writes. Bypasses: begin/abort, active_txns DashMap, record_read, stats. Only safe for single-threaded access (no concurrent writes).

pub fn scan_latest(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>

Scan keys with a prefix WITHOUT any MVCC transaction tracking.

Uses the current global timestamp. Only safe for single-threaded access.

pub fn read(&self, txn_id: u64, key: &[u8]) -> Result<Option<Vec<u8>>>

Read a key within a transaction

pub fn write(&self, txn_id: u64, key: Vec<u8>, value: Vec<u8>) -> Result<()>

Write a key-value pair within a transaction

Writes are buffered and only flushed to disk on commit. This provides ~10× better throughput for batched inserts.
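
The buffering behavior can be pictured as a per-transaction queue that is drained in one step at commit. This is a simplified model, not the engine's implementation: `BufferedLog` is a hypothetical type, and its `log` vector stands in for the on-disk WAL.

```rust
use std::collections::HashMap;

/// Simplified model of per-transaction write buffering.
struct BufferedLog {
    /// Writes held in memory, grouped by transaction ID.
    pending: HashMap<u64, Vec<(Vec<u8>, Vec<u8>)>>,
    /// Stand-in for the on-disk log.
    log: Vec<(Vec<u8>, Vec<u8>)>,
    /// Number of flushes performed; one per commit, not one per write.
    flushes: usize,
}

impl BufferedLog {
    fn new() -> Self {
        BufferedLog { pending: HashMap::new(), log: Vec::new(), flushes: 0 }
    }

    /// Buffer a write; nothing reaches the log yet.
    fn write(&mut self, txn_id: u64, key: Vec<u8>, value: Vec<u8>) {
        self.pending.entry(txn_id).or_default().push((key, value));
    }

    /// Move the whole batch to the log in one flush; returns the batch size.
    fn commit(&mut self, txn_id: u64) -> usize {
        match self.pending.remove(&txn_id) {
            Some(batch) => {
                let count = batch.len();
                self.log.extend(batch);
                self.flushes += 1;
                count
            }
            None => 0,
        }
    }
}
```

The throughput gain comes from paying the flush cost once per transaction rather than once per key.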

pub fn write_refs(&self, txn_id: u64, key: &[u8], value: &[u8]) -> Result<()>

Write from references: a hot path that avoids cloning

Avoids cloning the key and value by writing to the WAL directly from the borrowed slices; the only allocation is the single one needed for memtable storage.

pub fn delete(&self, txn_id: u64, key: Vec<u8>) -> Result<()>

Delete a key within a transaction

pub fn write_batch_refs( &self, txn_id: u64, writes: &[(&[u8], &[u8])], ) -> Result<()>

Batch write multiple key-value pairs with reduced overhead

This API amortizes fixed costs over the batch:

  • Single DashMap entry lookup for TxnWalBuffer
  • Single MVCC write set update
  • Batch memtable operations

Performance: ~2-3x faster than individual write_refs calls for batches of 100+ entries.

§Arguments
  • txn_id - Transaction ID
  • writes - Slice of (key, value) pairs

pub fn commit(&self, txn_id: u64) -> Result<u64>

Commit a transaction

Durability at commit depends on the configured sync mode:

  • 0 (OFF): No sync, risk of data loss
  • 1 (NORMAL): Adaptive sync using Little’s Law: W* = √(τ/λ)
  • 2 (FULL): Sync every commit (safest, slowest)
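
The expression quoted for NORMAL mode can be evaluated numerically. The docs do not define its symbols, so this sketch assumes τ is the cost of one fsync in seconds and λ is the commit arrival rate per second, which makes W* a wait window in seconds; `batch_window_secs` is a hypothetical helper, not part of this API.

```rust
/// Evaluates W* = sqrt(tau / lambda).
///
/// Assumptions (not stated in the docs): `tau_secs` is the fsync cost in
/// seconds and `lambda_per_sec` is the commit arrival rate per second.
/// Returns None for non-positive or NaN inputs.
fn batch_window_secs(tau_secs: f64, lambda_per_sec: f64) -> Option<f64> {
    if !(tau_secs > 0.0) || !(lambda_per_sec > 0.0) {
        return None;
    }
    Some((tau_secs / lambda_per_sec).sqrt())
}
```

Under these assumptions, a 4 ms fsync and 10,000 commits per second give a window of about 0.63 ms, since 0.004 / 10000 = 4e-7 and its square root is roughly 6.32e-4.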

pub fn set_sync_mode(&self, mode: u64)

Set synchronous mode

  • 0: OFF - No fsync (risk of data loss)
  • 1: NORMAL - Periodic fsync (balanced)
  • 2: FULL - Fsync every commit (safest)
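
Because set_sync_mode takes a raw u64, callers may prefer to name the three values. The enum below is a hypothetical caller-side wrapper, not a type exported by this crate; only the numeric values 0, 1, and 2 come from the list above.

```rust
use std::convert::TryFrom;

/// Hypothetical named form of the three documented sync modes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SyncMode {
    Off = 0,
    Normal = 1,
    Full = 2,
}

impl SyncMode {
    /// The raw value to pass to a setter that expects a u64.
    fn as_raw(self) -> u64 {
        self as u64
    }
}

impl TryFrom<u64> for SyncMode {
    /// The rejected raw value is handed back to the caller.
    type Error = u64;

    fn try_from(raw: u64) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(SyncMode::Off),
            1 => Ok(SyncMode::Normal),
            2 => Ok(SyncMode::Full),
            other => Err(other),
        }
    }
}
```

A call site would then read `storage.set_sync_mode(SyncMode::Full.as_raw())`, assuming `storage` is a DurableStorage.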

pub fn flush_group_commit(&self)

Force a group commit flush (useful for benchmarking or testing)

pub fn abort(&self, txn_id: u64) -> Result<()>

Abort a transaction

Performance: O(1) for read-only transactions (no writes to clean up). For write transactions, an O(N) memtable scan is required to remove uncommitted versions.

pub fn scan( &self, txn_id: u64, prefix: &[u8], ) -> Result<Vec<(Vec<u8>, Vec<u8>)>>

Scan keys with prefix

pub fn scan_range( &self, txn_id: u64, start: &[u8], end: &[u8], ) -> Result<Vec<(Vec<u8>, Vec<u8>)>>

Scan keys in range

pub fn scan_range_iter<'a>( &'a self, txn_id: u64, start: &'a [u8], end: &'a [u8], ) -> impl Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a

Streaming scan for very large result sets

Returns an iterator that yields (key, value) pairs without materializing the entire result set in memory.
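
The difference from the Vec-returning scans is that pairs are produced on demand. A minimal sketch over a `BTreeMap`, assuming a half-open range [start, end) (the docs do not say whether `end` is inclusive); `range_iter_sketch` is a hypothetical free function, not the method above.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Lazily yields (key, value) pairs with start <= key < end.
/// Each pair is cloned only when the consumer asks for it.
fn range_iter_sketch<'a>(
    index: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    start: &'a [u8],
    end: &'a [u8],
) -> impl Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a {
    // BTreeMap::range panics when start > end, so clamp to an empty range.
    let end = if end < start { start } else { end };
    index
        .range::<[u8], _>((Bound::Included(start), Bound::Excluded(end)))
        .map(|(key, value)| (key.clone(), value.clone()))
}
```

A consumer can stop early, for example with `.take(100)`, without paying for the rest of the range.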

pub fn flush_wal(&self) -> Result<()>

Flush the WAL’s in-memory buffer to the OS

This pushes all buffered writes from the BufWriter into the OS page cache. It does not by itself make the data durable: call fsync() afterwards to force it to disk.
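
The two stages can be shown with the standard library's own `BufWriter` and `File::sync_all`. This is a generic sketch of the flush-then-fsync order, not the engine's WAL code; `append_durably` is a hypothetical helper.

```rust
use std::fs::OpenOptions;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Appends one record and makes it durable in two explicit stages.
fn append_durably(path: &Path, record: &[u8]) -> io::Result<()> {
    let file = OpenOptions::new().create(true).append(true).open(path)?;
    let mut writer = BufWriter::new(file);

    // Stage 0: the bytes sit in BufWriter's user-space buffer.
    writer.write_all(record)?;

    // Stage 1: flush hands the bytes to the OS page cache.
    // A process crash is survivable from here on; a power loss is not.
    writer.flush()?;

    // Stage 2: fsync asks the OS to persist the file's data and metadata.
    writer.get_ref().sync_all()?;
    Ok(())
}
```

Skipping the flush before `sync_all` would sync a file that does not yet contain the buffered bytes, which is why the order matters.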

pub fn fsync(&self) -> Result<()>

Force sync the WAL to disk (fsync)

pub fn checkpoint(&self) -> Result<u64>

Write checkpoint

pub fn truncate_wal(&self) -> Result<()>

Truncate the WAL file after checkpoint.

This physically truncates the WAL file to 0 bytes, reclaiming disk space. The in-memory memtable retains all data for the current session, but a crash after truncation loses that data, because the WAL is the only persistence mechanism for DurableStorage.

Call after checkpoint() when WAL durability across restarts is not required (e.g. desktop telemetry viewers, caches).
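
Physical truncation to 0 bytes corresponds to `File::set_len(0)` in the standard library. The helper below is a generic sketch of that operation on an arbitrary log file, not the engine's implementation; `truncate_log` is a hypothetical name.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Truncates the file at `path` to 0 bytes and returns how many bytes were reclaimed.
fn truncate_log(path: &Path) -> io::Result<u64> {
    let file = OpenOptions::new().write(true).open(path)?;
    let reclaimed = file.metadata()?.len();

    // Drop the file's contents; the file itself stays in place.
    file.set_len(0)?;

    // Persist the new length so the truncation itself survives a crash.
    file.sync_all()?;
    Ok(reclaimed)
}
```

After this call nothing in the file can be replayed, which is the data-loss risk described above.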

pub fn stats(&self) -> StorageStats

Get storage statistics

pub fn gc(&self) -> usize

Garbage collect old versions

pub fn shutdown(&self) -> Result<()>

Clean shutdown

Trait Implementations§

impl Drop for DurableStorage

fn drop(&mut self)

Executes the destructor for this type.

fn pin_drop(self: Pin<&mut Self>)

🔬This is a nightly-only experimental API. (pin_ergonomics)
Executes the destructor for this type. Unlike Drop::drop, it requires self to be pinned.
