Skip to main content

MvccMemTable

Struct MvccMemTable 

Source
pub struct MvccMemTable { /* private fields */ }
Expand description

MemTable with MVCC support

Uses DashMap for lock-free concurrent access per key. This eliminates the global write lock bottleneck.

Uses epoch-based dirty tracking for O(expired) GC instead of O(n) full scan. Maintains a deferred sorted index for efficient scans:

  • Writes: O(1) append to hot buffer
  • Scans: O(N log N) sort-on-demand (amortized across many writes)

Implementations§

Source§

impl MvccMemTable

Source

pub fn new() -> Self

Source

pub fn with_ordered_index(enable_ordered_index: bool) -> Self

Create memtable with optional ordered index

When enable_ordered_index is false, saves ~134 ns/op on writes but scan_prefix becomes O(N) instead of O(log N + K)

Uses deferred sorting by default for better write performance:

  • Writes: O(1) append to hot buffer
  • Scans: O(N log N) sort-on-demand
Source

pub fn with_index_mode(enable_ordered_index: bool, use_deferred: bool) -> Self

Create memtable with fine-grained control over indexing

§Arguments
  • enable_ordered_index - Whether to maintain an ordered index
  • use_deferred - If true, use deferred sorting (O(1) writes, sort-on-scan) If false, use SkipMap (O(log N) writes)
Source

pub fn write( &self, key: Vec<u8>, value: Option<Vec<u8>>, txn_id: u64, ) -> Result<()>

Write a key-value pair (uncommitted)

Source

pub fn write_batch( &self, writes: &[(Vec<u8>, Option<Vec<u8>>)], txn_id: u64, ) -> Result<()>

Write multiple key-value pairs (uncommitted) - more efficient than individual writes

Optimizations applied (Rec 3: MVCC Batching):

  • Batched dirty list tracking: single lock acquire for all keys
  • Deferred index: O(1) append per key
Source

pub fn read( &self, key: &[u8], snapshot_ts: u64, current_txn_id: Option<u64>, ) -> Option<Vec<u8>>

Read at snapshot timestamp, with optional current txn to see own writes

Source

pub fn commit( &self, txn_id: u64, commit_ts: u64, write_set: &HashSet<InlineKey>, )

Commit all versions for a transaction

Only updates the keys that were written by this transaction (tracked in write_set). Accepts InlineKey for zero-allocation MVCC tracking.

Source

pub fn commit_all(&self, txn_id: u64, commit_ts: u64)

Legacy commit method (iterates all keys) - kept for backward compatibility

Source

pub fn abort(&self, txn_id: u64)

Abort all versions for a transaction

Source

pub fn scan_prefix( &self, prefix: &[u8], snapshot_ts: u64, current_txn_id: Option<u64>, ) -> Vec<(Vec<u8>, Vec<u8>)>

Scan keys with prefix at snapshot (without seeing uncommitted from other txns)

§Performance

When ordered_index is enabled: O(log N + K) complexity

  • O(log N) to seek to the first key with prefix
  • O(K) to iterate matching keys

When ordered_index is disabled: O(N) full DashMap scan (fallback)

§Optimizations Applied
  • Pre-allocates result vector based on expected output size
  • Uses batch-friendly iteration patterns
  • Minimizes allocations during iteration
  • Deferred index: compacts hot buffer on first scan for sorted iteration
Source

pub fn scan_all( &self, snapshot_ts: u64, current_txn_id: Option<u64>, ) -> Vec<(Vec<u8>, Vec<u8>)>

Optimized full scan with batch allocation

For use when scanning entire tables/namespaces. Pre-allocates based on actual data size.

Source

pub fn scan_prefix_iter<'a>( &'a self, prefix: &'a [u8], snapshot_ts: u64, current_txn_id: Option<u64>, ) -> impl Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a

Streaming scan iterator for very large datasets

Returns an iterator that yields (key, value) pairs without materializing the entire result set in memory.

Source

pub fn scan_range( &self, start: &[u8], end: &[u8], snapshot_ts: u64, current_txn_id: Option<u64>, ) -> Vec<(Vec<u8>, Vec<u8>)>

Scan range

Source

pub fn scan_range_iter<'a>( &'a self, start: &'a [u8], end: &'a [u8], snapshot_ts: u64, current_txn_id: Option<u64>, ) -> impl Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a

Streaming range scan iterator for very large datasets

Returns an iterator that yields (key, value) pairs without materializing the entire result set in memory. Uses the ordered index when available for O(log N + K) complexity.

§Zero-Allocation Design

While the iterator itself cannot avoid allocations for returned values (since the caller needs ownership), it avoids:

  • Pre-materializing all results
  • Intermediate buffers
  • Repeated key comparisons for already-visited entries
§Usage
for (key, value) in memtable.scan_range_iter(b"start", b"end", ts, txn) {
    // Process each result as it arrives
    // Memory usage is O(1) per iteration, not O(N) total
}
Source

pub fn size(&self) -> u64

Get approximate size

Source

pub fn gc(&self, min_active_ts: u64) -> usize

Garbage collect old versions using epoch-based dirty list

O(expired_versions) instead of O(all_versions) Only visits keys that had versions created in the old epoch.

Source

pub fn gc_full_scan(&self, min_active_ts: u64) -> usize

Legacy full-scan GC (for testing or when epoch-based tracking isn’t available)

Trait Implementations§

Source§

impl Default for MvccMemTable

Source§

fn default() -> Self

Returns the “default value” for a type. Read more
Source§

impl MvccStore for MvccMemTable

Source§

fn mvcc_get( &self, key: &[u8], snapshot_ts: u64, txn_id: Option<u64>, ) -> Option<Vec<u8>>

Read the visible value at snapshot_ts, optionally seeing own writes from txn_id.
Source§

fn mvcc_put( &self, key: &[u8], value: Option<Vec<u8>>, txn_id: u64, ) -> Result<(), MvccStoreError>

Write a value (or tombstone None) as uncommitted for the given transaction.
Source§

fn mvcc_commit_key(&self, key: &[u8], txn_id: u64, commit_ts: u64) -> bool

Commit one key’s uncommitted write. Returns true if found and committed.
Source§

fn mvcc_abort_key(&self, key: &[u8], txn_id: u64)

Abort one key’s uncommitted write.
Source§

fn mvcc_has_conflict(&self, key: &[u8], txn_id: u64) -> bool

Check if there’s an uncommitted write conflict on a key.
Source§

fn mvcc_gc(&self, min_ts: u64) -> MvccGcStats

Run garbage collection. Returns statistics.
Source§

fn mvcc_key_count(&self) -> usize

Number of distinct keys in the store.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
where ST: ?Sized, DT: ?Sized,

Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> Read<Exclusive, BecauseExclusive> for T
where T: ?Sized,

Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more