pub struct DurableStorage { /* private fields */ }
Durable storage engine with full ACID support
Implementations§
impl DurableStorage
pub fn open_with_config<P: AsRef<Path>>(
    path: P,
    enable_ordered_index: bool,
) -> Result<Self>
Open with configurable ordered index
When enable_ordered_index is false, saves ~134 ns/op on writes
but scan_prefix becomes O(N) instead of O(log N + K)
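The complexity difference can be illustrated with standard-library maps. This is only a sketch: `scan_prefix_unordered` and `scan_prefix_ordered` are hypothetical helpers, and `HashMap`/`BTreeMap` stand in for whatever index structures the engine actually uses.

```rust
use std::collections::{BTreeMap, HashMap};
use std::ops::Bound;

/// Prefix scan without an ordered index: every entry is inspected, O(N).
fn scan_prefix_unordered(
    map: &HashMap<Vec<u8>, Vec<u8>>,
    prefix: &[u8],
) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut out: Vec<_> = map
        .iter()
        .filter(|(k, _)| k.starts_with(prefix))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    out.sort(); // a hash map has no key order, so sort for a stable result
    out
}

/// Prefix scan with an ordered index: seek to the first candidate in
/// O(log N), then walk only the K matching entries.
fn scan_prefix_ordered(
    map: &BTreeMap<Vec<u8>, Vec<u8>>,
    prefix: &[u8],
) -> Vec<(Vec<u8>, Vec<u8>)> {
    map.range::<[u8], _>((Bound::Included(prefix), Bound::Unbounded))
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}
```

Both helpers return the same pairs; only the amount of work per scan differs.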
pub fn open_with_arena<P: AsRef<Path>>(path: P) -> Result<Self>
Open with arena-backed memtable for write-heavy workloads
Uses ArenaMvccMemTable which reduces per-write allocations from 3 to 1. Best for workloads with:
- High write throughput
- Large keys (reduces allocation overhead)
- Minimal concurrent reads during writes
pub fn open_with_full_config<P: AsRef<Path>>(
    path: P,
    enable_ordered_index: bool,
    memtable_type: MemTableType,
) -> Result<Self>
Open with full configuration options
§Arguments
- path - Storage directory path
- enable_ordered_index - Enable ordered index for O(log N) scans
- memtable_type - Type of memtable to use (Standard or Arena)
§Locking
Acquires an exclusive advisory lock on the database directory.
This prevents concurrent multi-process access which would corrupt data.
If another process has the database open, returns Err(DatabaseLocked).
pub fn open_ephemeral() -> Result<EphemeralHandle>
Open an ephemeral (in-memory-like) DurableStorage backed by a temp directory.
Uses the full DurableStorage engine (WAL, MVCC, SSI) but writes to a
temporary directory that is automatically cleaned up when the
EphemeralHandle is dropped. This ensures test and production code paths
are identical — bugs found in tests are guaranteed to reproduce in production.
§Returns
An EphemeralHandle that owns both the storage and the temp directory.
Access the storage via handle.storage() or Deref coercion.
§Example
let handle = DurableStorage::open_ephemeral()?;
let txn = handle.begin_transaction()?;
handle.write(txn, b"key".to_vec(), b"value".to_vec())?;
handle.commit(txn)?;
// temp directory cleaned up when `handle` drops
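The ownership pattern described above (a handle that owns both the storage and its temporary directory, exposes the storage through Deref, and cleans up on drop) can be sketched with the standard library alone. `Storage`, `TempHandle`, and the `create` constructor are hypothetical stand-ins for illustration, not the real `EphemeralHandle` internals.

```rust
use std::fs;
use std::io;
use std::ops::Deref;
use std::path::{Path, PathBuf};

/// Stand-in for the storage engine; it only remembers its directory.
struct Storage {
    dir: PathBuf,
}

impl Storage {
    fn dir(&self) -> &Path {
        &self.dir
    }
}

/// Hypothetical handle that owns the storage and its temporary directory.
struct TempHandle {
    storage: Storage,
}

impl TempHandle {
    /// `name` is an illustrative parameter used to build a unique directory.
    fn create(name: &str) -> io::Result<TempHandle> {
        let dir = std::env::temp_dir().join(format!("{}-{}", name, std::process::id()));
        fs::create_dir_all(&dir)?;
        Ok(TempHandle { storage: Storage { dir } })
    }

    /// Explicit accessor, mirroring `handle.storage()`.
    fn storage(&self) -> &Storage {
        &self.storage
    }
}

/// Deref lets callers invoke storage methods directly on the handle.
impl Deref for TempHandle {
    type Target = Storage;
    fn deref(&self) -> &Storage {
        &self.storage
    }
}

impl Drop for TempHandle {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored because drop cannot fail.
        let _ = fs::remove_dir_all(&self.storage.dir);
    }
}
```

With this shape, `handle.dir()` and `handle.storage().dir()` resolve to the same method, and the directory disappears once the handle goes out of scope.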
pub fn open_ephemeral_with_group_commit() -> Result<EphemeralHandle>
Open an ephemeral DurableStorage with group commit enabled.
Same as open_ephemeral() but with group commit for higher throughput.
pub fn open_with_group_commit<P: AsRef<Path>>(path: P) -> Result<Self>
Open with group commit enabled
pub fn open_with_group_commit_and_config<P: AsRef<Path>>(
    path: P,
    enable_ordered_index: bool,
) -> Result<Self>
Open with group commit and configurable ordered index
pub fn open_with_policy<P: AsRef<Path>>(
    path: P,
    policy: IndexPolicy,
    group_commit: bool,
) -> Result<Self>
Open with IndexPolicy for automatic memtable/index configuration
This is the recommended constructor for new code. The policy determines:
- Whether to use ordered index (ScanOptimized only)
- Whether to use arena-backed memtable (WriteOptimized, AppendOnly)
- Default settings optimized for the workload pattern
§Arguments
- path - Storage directory path
- policy - Index policy determining write/scan tradeoffs
- group_commit - Whether to enable group commit for throughput
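The policy-to-configuration mapping listed above might look roughly like the following. This is a sketch under stated assumptions: the enums are local stand-ins limited to the variants named in this page (the real `IndexPolicy` and `MemTableType` may have more), `resolve` is a hypothetical helper, and pairing ScanOptimized with the Standard memtable is an assumption rather than something the documentation states.

```rust
/// Stand-in for the real `IndexPolicy`; only the variants named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexPolicy {
    ScanOptimized,
    WriteOptimized,
    AppendOnly,
}

/// Stand-in for the real `MemTableType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemTableType {
    Standard,
    Arena,
}

/// Hypothetical helper: returns (enable_ordered_index, memtable_type).
fn resolve(policy: IndexPolicy) -> (bool, MemTableType) {
    // Ordered index: ScanOptimized only.
    let ordered = matches!(policy, IndexPolicy::ScanOptimized);
    // Arena-backed memtable: WriteOptimized and AppendOnly.
    let memtable = match policy {
        IndexPolicy::WriteOptimized | IndexPolicy::AppendOnly => MemTableType::Arena,
        // ASSUMPTION: the remaining policy uses the standard memtable.
        IndexPolicy::ScanOptimized => MemTableType::Standard,
    };
    (ordered, memtable)
}
```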
pub fn open_for_concurrent<P: AsRef<Path>>(
    path: P,
    policy: IndexPolicy,
) -> Result<Self>
Open storage for concurrent mode (multi-reader, single-writer)
This method opens the storage WITHOUT acquiring the exclusive file lock. Coordination is handled by the concurrent MVCC layer instead.
§Safety
This must ONLY be called from Database::open_concurrent() which
manages the concurrent MVCC coordination. Direct use will cause
data corruption.
pub fn memtable_type(&self) -> MemTableType
Get the memtable type being used
pub fn recover(&self) -> Result<RecoveryStats>
Perform crash recovery
pub fn begin_transaction(&self) -> Result<u64>
Begin a new transaction
pub fn begin_with_mode(&self, mode: TransactionMode) -> Result<u64>
Begin a transaction with a specific mode (ReadOnly/WriteOnly/ReadWrite)
This enables mode-aware optimizations:
- ReadOnly: Skip SSI tracking, 2.6x faster reads
- WriteOnly: Skip read tracking, faster bulk inserts
- ReadWrite: Full SSI for serializable isolation
pub fn begin_read_only_fast(&self) -> u64
Begin a read-only transaction without any WAL records.
This is a performance-critical optimization that eliminates two WAL mutex acquisitions per read (TxnBegin + TxnAbort). Since read-only transactions have no state to recover, WAL records are unnecessary.
Callers MUST use abort_read_only_fast() to clean up.
pub fn abort_read_only_fast(&self, txn_id: u64)
Abort a fast read-only transaction.
O(1) cleanup: only removes MVCC state. No WAL write, no memtable scan.
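Because every begin_read_only_fast() must be paired with abort_read_only_fast(), callers may want a scope guard that performs the abort on drop. The sketch below is one possible way to do that. `FastReadOnly`, `ReadGuard`, and `MockStore` are hypothetical names introduced here; only the two method signatures are taken from this page.

```rust
use std::cell::RefCell;
use std::collections::HashSet;

/// Minimal trait mirroring the two documented signatures.
trait FastReadOnly {
    fn begin_read_only_fast(&self) -> u64;
    fn abort_read_only_fast(&self, txn_id: u64);
}

/// Hypothetical guard: aborts the transaction when it leaves scope.
struct ReadGuard<'a, S: FastReadOnly> {
    store: &'a S,
    txn_id: u64,
}

impl<'a, S: FastReadOnly> ReadGuard<'a, S> {
    fn begin(store: &'a S) -> Self {
        let txn_id = store.begin_read_only_fast();
        ReadGuard { store, txn_id }
    }

    fn txn_id(&self) -> u64 {
        self.txn_id
    }
}

impl<'a, S: FastReadOnly> Drop for ReadGuard<'a, S> {
    fn drop(&mut self) {
        // Runs on every exit path, including early returns and panics.
        self.store.abort_read_only_fast(self.txn_id);
    }
}

/// Mock store that only tracks which transaction IDs are active.
struct MockStore {
    next_id: RefCell<u64>,
    active: RefCell<HashSet<u64>>,
}

impl MockStore {
    fn new() -> Self {
        MockStore { next_id: RefCell::new(1), active: RefCell::new(HashSet::new()) }
    }

    fn active_count(&self) -> usize {
        self.active.borrow().len()
    }
}

impl FastReadOnly for MockStore {
    fn begin_read_only_fast(&self) -> u64 {
        let mut next = self.next_id.borrow_mut();
        let id = *next;
        *next += 1;
        self.active.borrow_mut().insert(id);
        id
    }

    fn abort_read_only_fast(&self, txn_id: u64) {
        self.active.borrow_mut().remove(&txn_id);
    }
}
```

The mock exists only so the guard can be exercised in isolation; against the real engine the trait would be implemented by delegating to the two methods documented above.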
pub fn read_latest(&self, key: &[u8]) -> Option<Vec<u8>>
Read a key WITHOUT any MVCC transaction tracking.
Uses the current global timestamp to see all committed writes. Bypasses: begin/abort, active_txns DashMap, record_read, stats. Only safe for single-threaded access (no concurrent writes).
pub fn scan_latest(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>
Scan keys with a prefix WITHOUT any MVCC transaction tracking.
Uses the current global timestamp. Only safe for single-threaded access.
pub fn read(&self, txn_id: u64, key: &[u8]) -> Result<Option<Vec<u8>>>
Read a key within a transaction
pub fn write(&self, txn_id: u64, key: Vec<u8>, value: Vec<u8>) -> Result<()>
Write a key-value pair within a transaction
Writes are buffered and only flushed to disk on commit. This provides ~10× better throughput for batched inserts.
pub fn write_refs(&self, txn_id: u64, key: &[u8], value: &[u8]) -> Result<()>
Write from references, avoiding intermediate copies on the hot path
Avoids cloning key/value by writing to the WAL directly from the references, then allocating only once for memtable storage.
pub fn delete(&self, txn_id: u64, key: Vec<u8>) -> Result<()>
Delete a key within a transaction
pub fn write_batch_refs(
    &self,
    txn_id: u64,
    writes: &[(&[u8], &[u8])],
) -> Result<()>
Batch write multiple key-value pairs with reduced overhead
This API amortizes fixed costs over the batch:
- Single DashMap entry lookup for TxnWalBuffer
- Single MVCC write set update
- Batch memtable operations
Performance: ~2-3x faster than individual write_refs calls for batches of 100+ entries.
§Arguments
- txn_id - Transaction ID
- writes - Slice of (key, value) pairs
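The `writes` parameter is a slice of borrowed pairs, so callers holding owned data need to build a borrowed view first. A minimal sketch of that conversion follows. `as_batch` and `total_bytes` are hypothetical helpers, and `total_bytes` merely stands in for a consumer with the same parameter type.

```rust
/// Borrow owned pairs in the `&[(&[u8], &[u8])]` shape the batch API takes.
/// No key or value bytes are copied; only a Vec of slice pairs is allocated.
fn as_batch(owned: &[(Vec<u8>, Vec<u8>)]) -> Vec<(&[u8], &[u8])> {
    owned
        .iter()
        .map(|(k, v)| (k.as_slice(), v.as_slice()))
        .collect()
}

/// Stand-in consumer with the same `writes` parameter type as the batch API;
/// returns the total number of key and value bytes.
fn total_bytes(writes: &[(&[u8], &[u8])]) -> usize {
    writes.iter().map(|(k, v)| k.len() + v.len()).sum()
}
```

A `Vec<(&[u8], &[u8])>` built this way can be passed as `&batch` wherever a `&[(&[u8], &[u8])]` is expected.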
pub fn commit(&self, txn_id: u64) -> Result<u64>
Commit a transaction
With sync_mode:
- 0 (OFF): No sync, risk of data loss
- 1 (NORMAL): Adaptive sync using Little’s Law: W* = √(τ/λ)
- 2 (FULL): Sync every commit (safest, slowest)
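The NORMAL-mode formula can be evaluated directly. This page does not define τ or λ, so the sketch below assumes τ is a per-sync cost in seconds and λ a commit arrival rate per second, which makes W* a duration in seconds. `sync_window` is a hypothetical helper, not part of the API.

```rust
/// Evaluates W* = sqrt(tau / lambda).
///
/// ASSUMPTION: `tau` is a per-sync cost in seconds and `lambda` is a commit
/// arrival rate in commits per second; the source does not define either.
/// Returns None for inputs where the formula is not meaningful.
fn sync_window(tau: f64, lambda: f64) -> Option<f64> {
    // The negated comparisons also reject NaN inputs.
    if !(tau >= 0.0) || !(lambda > 0.0) {
        return None;
    }
    Some((tau / lambda).sqrt())
}
```

Under these assumptions, a 10 ms sync cost at 100 commits per second gives a window of about 10 ms, and the window shrinks as the arrival rate grows.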
pub fn set_sync_mode(&self, mode: u64)
Set synchronous mode
- 0: OFF - No fsync (risk of data loss)
- 1: NORMAL - Periodic fsync (balanced)
- 2: FULL - Fsync every commit (safest)
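Since the mode is passed as a raw u64, callers may prefer a small typed wrapper over magic numbers. The enum below is a hypothetical convenience, not part of this API. How the engine treats values above 2 is not documented here, so the sketch rejects them.

```rust
use std::convert::TryFrom;

/// Hypothetical typed view of the documented sync modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncMode {
    Off,    // 0: no fsync (risk of data loss)
    Normal, // 1: periodic fsync (balanced)
    Full,   // 2: fsync every commit (safest)
}

impl TryFrom<u64> for SyncMode {
    type Error = u64;

    /// Unknown values are returned unchanged as the error.
    fn try_from(raw: u64) -> Result<Self, u64> {
        match raw {
            0 => Ok(SyncMode::Off),
            1 => Ok(SyncMode::Normal),
            2 => Ok(SyncMode::Full),
            other => Err(other),
        }
    }
}

impl From<SyncMode> for u64 {
    fn from(mode: SyncMode) -> u64 {
        match mode {
            SyncMode::Off => 0,
            SyncMode::Normal => 1,
            SyncMode::Full => 2,
        }
    }
}
```

A call site could then write `storage.set_sync_mode(SyncMode::Full.into())`.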
pub fn flush_group_commit(&self)
Force a group commit flush (useful for benchmarking or testing)
pub fn abort(&self, txn_id: u64) -> Result<()>
Abort a transaction
Performance: O(1) for read-only transactions (no writes to clean up). For write transactions, an O(N) memtable scan is required to remove uncommitted versions.
pub fn scan(
    &self,
    txn_id: u64,
    prefix: &[u8],
) -> Result<Vec<(Vec<u8>, Vec<u8>)>>
Scan keys with prefix
pub fn scan_range(
    &self,
    txn_id: u64,
    start: &[u8],
    end: &[u8],
) -> Result<Vec<(Vec<u8>, Vec<u8>)>>
Scan keys in range
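A prefix scan can be expressed as a range scan by computing the smallest key that sorts after every key with the given prefix. The helper below is a sketch of that conversion. `prefix_upper_bound` is a hypothetical name, and the sketch assumes `end` is an exclusive bound, which this page does not state.

```rust
/// Returns the smallest byte string greater than every key starting with
/// `prefix`, or None when no finite bound exists (empty or all-0xFF prefix).
///
/// ASSUMPTION: the result is meant to be used as an exclusive upper bound.
fn prefix_upper_bound(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    while let Some(last) = end.pop() {
        if last < u8::MAX {
            // Increment the last byte that can still grow and stop there.
            end.push(last + 1);
            return Some(end);
        }
        // The byte was 0xFF: drop it and carry into the previous byte.
    }
    None
}
```

Under that assumption, scanning the prefix `user:` corresponds to the range from `user:` up to, but not including, `user;`.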
pub fn scan_range_iter<'a>(
    &'a self,
    txn_id: u64,
    start: &'a [u8],
    end: &'a [u8],
) -> impl Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a
Streaming scan for very large result sets
Returns an iterator that yields (key, value) pairs without materializing the entire result set in memory.
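The difference between a streaming and a materialized scan can be shown with a lazy iterator over a `BTreeMap`. This is an illustrative sketch only: `stream_range` is a hypothetical function with a similar lifetime shape, the map stands in for the engine's storage, and the half-open `[start, end)` bounds are an assumption.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Lazily yields (key, value) pairs in [start, end). Each pair is cloned
/// only when the consumer pulls it; nothing is collected up front.
/// Note: `BTreeMap::range` panics if `start > end`.
fn stream_range<'a>(
    map: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    start: &'a [u8],
    end: &'a [u8],
) -> impl Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a {
    map.range::<[u8], _>((Bound::Included(start), Bound::Excluded(end)))
        .map(|(k, v)| (k.clone(), v.clone()))
}
```

A consumer that stops early, for example with `.take(2)`, clones only the entries it actually reads, whereas a `Vec`-returning scan pays for the whole range.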
pub fn flush_wal(&self) -> Result<()>
Flush the WAL’s in-memory buffer to the OS
This pushes all buffered writes from the BufWriter
into the OS page cache; it does not itself fsync. Call this before fsync() so that
the sync covers all buffered data and makes it durable.
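The two-step ordering (flush the user-space buffer, then sync) is the same one the standard library requires for any `BufWriter<File>`. The sketch below shows it with std types only. `write_durably` is a hypothetical helper and does not reflect the WAL's actual file layout.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes `record` to a fresh file at `path` and makes it durable.
/// Returns the resulting file length in bytes.
fn write_durably(path: &Path, record: &[u8]) -> io::Result<u64> {
    let file = File::create(path)?;
    let mut writer = BufWriter::new(file);

    // The bytes may still sit in BufWriter's in-process buffer here.
    writer.write_all(record)?;

    // Step 1: push buffered bytes to the OS (page cache).
    writer.flush()?;

    // Step 2: ask the OS to persist file data and metadata to the device.
    writer.get_ref().sync_all()?;

    Ok(fs::metadata(path)?.len())
}
```

Syncing without flushing first would persist only what had already left the `BufWriter`, which is why the flush comes first.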
pub fn checkpoint(&self) -> Result<u64>
Write checkpoint
pub fn truncate_wal(&self) -> Result<()>
Truncate the WAL file after checkpoint.
This physically truncates the WAL file to 0 bytes, reclaiming disk space. The in-memory memtable retains all data for the current session, but a crash after truncation will result in data loss since the WAL is the only persistence mechanism for DurableStorage.
Call after checkpoint() when WAL durability across restarts is
not required (e.g. desktop telemetry viewers, caches).
pub fn stats(&self) -> StorageStats
Get storage statistics
Trait Implementations§
impl Drop for DurableStorage
Auto Trait Implementations§
impl !Freeze for DurableStorage
impl !RefUnwindSafe for DurableStorage
impl !UnwindSafe for DurableStorage
impl Send for DurableStorage
impl Sync for DurableStorage
impl Unpin for DurableStorage
impl UnsafeUnpin for DurableStorage
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.