pub struct MvccMemTable { /* private fields */ }Expand description
MemTable with MVCC support
Uses DashMap for lock-free concurrent access per key. This eliminates the global write lock bottleneck.
Uses epoch-based dirty tracking for O(expired) GC instead of O(n) full scan. Maintains a deferred sorted index for efficient scans:
- Writes: O(1) append to hot buffer
- Scans: O(N log N) sort-on-demand (amortized across many writes)
Implementations§
Source§impl MvccMemTable
impl MvccMemTable
pub fn new() -> Self
Sourcepub fn with_ordered_index(enable_ordered_index: bool) -> Self
pub fn with_ordered_index(enable_ordered_index: bool) -> Self
Create memtable with optional ordered index
When enable_ordered_index is false, saves ~134 ns/op on writes
but scan_prefix becomes O(N) instead of O(log N + K)
Uses deferred sorting by default for better write performance:
- Writes: O(1) append to hot buffer
- Scans: O(N log N) sort-on-demand
Sourcepub fn with_index_mode(enable_ordered_index: bool, use_deferred: bool) -> Self
pub fn with_index_mode(enable_ordered_index: bool, use_deferred: bool) -> Self
Create memtable with fine-grained control over indexing
§Arguments
enable_ordered_index- Whether to maintain an ordered indexuse_deferred- If true, use deferred sorting (O(1) writes, sort-on-scan) If false, use SkipMap (O(log N) writes)
Sourcepub fn write(
&self,
key: Vec<u8>,
value: Option<Vec<u8>>,
txn_id: u64,
) -> Result<()>
pub fn write( &self, key: Vec<u8>, value: Option<Vec<u8>>, txn_id: u64, ) -> Result<()>
Write a key-value pair (uncommitted)
Sourcepub fn write_batch(
&self,
writes: &[(Vec<u8>, Option<Vec<u8>>)],
txn_id: u64,
) -> Result<()>
pub fn write_batch( &self, writes: &[(Vec<u8>, Option<Vec<u8>>)], txn_id: u64, ) -> Result<()>
Write multiple key-value pairs (uncommitted) - more efficient than individual writes
Optimizations applied (Rec 3: MVCC Batching):
- Batched dirty list tracking: single lock acquire for all keys
- Deferred index: O(1) append per key
Sourcepub fn read(
&self,
key: &[u8],
snapshot_ts: u64,
current_txn_id: Option<u64>,
) -> Option<Vec<u8>>
pub fn read( &self, key: &[u8], snapshot_ts: u64, current_txn_id: Option<u64>, ) -> Option<Vec<u8>>
Read at snapshot timestamp, with optional current txn to see own writes
Sourcepub fn commit(
&self,
txn_id: u64,
commit_ts: u64,
write_set: &HashSet<InlineKey>,
)
pub fn commit( &self, txn_id: u64, commit_ts: u64, write_set: &HashSet<InlineKey>, )
Commit all versions for a transaction
Only updates the keys that were written by this transaction (tracked in write_set). Accepts InlineKey for zero-allocation MVCC tracking.
Sourcepub fn commit_all(&self, txn_id: u64, commit_ts: u64)
pub fn commit_all(&self, txn_id: u64, commit_ts: u64)
Legacy commit method (iterates all keys) - kept for backward compatibility
Sourcepub fn scan_prefix(
&self,
prefix: &[u8],
snapshot_ts: u64,
current_txn_id: Option<u64>,
) -> Vec<(Vec<u8>, Vec<u8>)>
pub fn scan_prefix( &self, prefix: &[u8], snapshot_ts: u64, current_txn_id: Option<u64>, ) -> Vec<(Vec<u8>, Vec<u8>)>
Scan keys with prefix at snapshot (without seeing uncommitted from other txns)
§Performance
When ordered_index is enabled: O(log N + K) complexity
- O(log N) to seek to the first key with prefix
- O(K) to iterate matching keys
When ordered_index is disabled: O(N) full DashMap scan (fallback)
§Optimizations Applied
- Pre-allocates result vector based on expected output size
- Uses batch-friendly iteration patterns
- Minimizes allocations during iteration
- Deferred index: compacts hot buffer on first scan for sorted iteration
Sourcepub fn scan_all(
&self,
snapshot_ts: u64,
current_txn_id: Option<u64>,
) -> Vec<(Vec<u8>, Vec<u8>)>
pub fn scan_all( &self, snapshot_ts: u64, current_txn_id: Option<u64>, ) -> Vec<(Vec<u8>, Vec<u8>)>
Optimized full scan with batch allocation
For use when scanning entire tables/namespaces. Pre-allocates based on actual data size.
Sourcepub fn scan_prefix_iter<'a>(
&'a self,
prefix: &'a [u8],
snapshot_ts: u64,
current_txn_id: Option<u64>,
) -> impl Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a
pub fn scan_prefix_iter<'a>( &'a self, prefix: &'a [u8], snapshot_ts: u64, current_txn_id: Option<u64>, ) -> impl Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a
Streaming scan iterator for very large datasets
Returns an iterator that yields (key, value) pairs without materializing the entire result set in memory.
Sourcepub fn scan_range(
&self,
start: &[u8],
end: &[u8],
snapshot_ts: u64,
current_txn_id: Option<u64>,
) -> Vec<(Vec<u8>, Vec<u8>)>
pub fn scan_range( &self, start: &[u8], end: &[u8], snapshot_ts: u64, current_txn_id: Option<u64>, ) -> Vec<(Vec<u8>, Vec<u8>)>
Scan range
Sourcepub fn scan_range_iter<'a>(
&'a self,
start: &'a [u8],
end: &'a [u8],
snapshot_ts: u64,
current_txn_id: Option<u64>,
) -> impl Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a
pub fn scan_range_iter<'a>( &'a self, start: &'a [u8], end: &'a [u8], snapshot_ts: u64, current_txn_id: Option<u64>, ) -> impl Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a
Streaming range scan iterator for very large datasets
Returns an iterator that yields (key, value) pairs without materializing the entire result set in memory. Uses the ordered index when available for O(log N + K) complexity.
§Zero-Allocation Design
While the iterator itself cannot avoid allocations for returned values (since the caller needs ownership), it avoids:
- Pre-materializing all results
- Intermediate buffers
- Repeated key comparisons for already-visited entries
§Usage
for (key, value) in memtable.scan_range_iter(b"start", b"end", ts, txn) {
// Process each result as it arrives
// Memory usage is O(1) per iteration, not O(N) total
}Sourcepub fn gc(&self, min_active_ts: u64) -> usize
pub fn gc(&self, min_active_ts: u64) -> usize
Garbage collect old versions using epoch-based dirty list
O(expired_versions) instead of O(all_versions) Only visits keys that had versions created in the old epoch.
Sourcepub fn gc_full_scan(&self, min_active_ts: u64) -> usize
pub fn gc_full_scan(&self, min_active_ts: u64) -> usize
Legacy full-scan GC (for testing or when epoch-based tracking isn’t available)
Trait Implementations§
Source§impl Default for MvccMemTable
impl Default for MvccMemTable
Source§impl MvccStore for MvccMemTable
impl MvccStore for MvccMemTable
Source§fn mvcc_get(
&self,
key: &[u8],
snapshot_ts: u64,
txn_id: Option<u64>,
) -> Option<Vec<u8>>
fn mvcc_get( &self, key: &[u8], snapshot_ts: u64, txn_id: Option<u64>, ) -> Option<Vec<u8>>
snapshot_ts, optionally seeing own writes
from txn_id.Source§fn mvcc_put(
&self,
key: &[u8],
value: Option<Vec<u8>>,
txn_id: u64,
) -> Result<(), MvccStoreError>
fn mvcc_put( &self, key: &[u8], value: Option<Vec<u8>>, txn_id: u64, ) -> Result<(), MvccStoreError>
None) as uncommitted for the given transaction.Source§fn mvcc_commit_key(&self, key: &[u8], txn_id: u64, commit_ts: u64) -> bool
fn mvcc_commit_key(&self, key: &[u8], txn_id: u64, commit_ts: u64) -> bool
true if found and committed.Source§fn mvcc_abort_key(&self, key: &[u8], txn_id: u64)
fn mvcc_abort_key(&self, key: &[u8], txn_id: u64)
Source§fn mvcc_has_conflict(&self, key: &[u8], txn_id: u64) -> bool
fn mvcc_has_conflict(&self, key: &[u8], txn_id: u64) -> bool
Source§fn mvcc_gc(&self, min_ts: u64) -> MvccGcStats
fn mvcc_gc(&self, min_ts: u64) -> MvccGcStats
Source§fn mvcc_key_count(&self) -> usize
fn mvcc_key_count(&self) -> usize
Auto Trait Implementations§
impl !Freeze for MvccMemTable
impl !RefUnwindSafe for MvccMemTable
impl Send for MvccMemTable
impl Sync for MvccMemTable
impl Unpin for MvccMemTable
impl UnsafeUnpin for MvccMemTable
impl !UnwindSafe for MvccMemTable
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> Instrument for T
impl<T> Instrument for T
Source§fn instrument(self, span: Span) -> Instrumented<Self>
fn instrument(self, span: Span) -> Instrumented<Self>
Source§fn in_current_span(self) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read more