# MMDB — A Modern LSM-Tree Storage Engine
A pure-Rust, synchronous LSM-Tree key-value storage engine. Designed as a native Rust alternative to RocksDB with competitive performance — scan throughput matches or exceeds RocksDB on equivalent configurations.
## Performance
Benchmarked via vsdb (500K entries, 128-byte values, 64MB write buffer, 256MB block cache):
| Operation | RocksDB | MMDB | Result |
|---|---|---|---|
| scan 100 entries | 14.1 us | 11.7 us | MMDB 17% faster |
| scan 1000 entries | 112 us | 108 us | MMDB 4% faster |
| scan 10000 entries | 1.10 ms | 1.09 ms | MMDB 1% faster |
| full scan 500K | 54.5 ms | 52.4 ms | MMDB 4% faster |
| point read | 405 ns | 168 ns | MMDB 2.4x faster |
## Features
| Feature | Status |
|---|---|
| Put / Get / Delete | Implemented |
| WriteBatch (atomic multi-key writes) | Implemented |
| WAL with group commit & crash recovery | Implemented |
| SST files with prefix-compressed blocks | Implemented |
| Bloom filters (per-key + prefix bloom) | Implemented |
| Block cache (moka LRU) with L0 pinning | Implemented |
| Table cache (open file handles) | Implemented |
| MANIFEST version tracking + compaction | Implemented |
| Leveled compaction with trivial move | Implemented |
| Multi-threaded background compaction | Implemented |
| Streaming compaction (O(block) memory) | Implemented |
| MVCC snapshots via sequence numbers | Implemented |
| Forward/backward/prefix/range iterators | Implemented |
| Compression: None, LZ4, Zstd (per-level) | Implemented |
| Write backpressure (slowdown/stop) | Implemented |
| DeleteRange (range tombstones) | Implemented |
| CompactRange API | Implemented |
| Compaction filter | Implemented |
| Rate limiter (token bucket) | Implemented |
| DB properties/statistics | Implemented |
| Configurable options (RocksDB parity) | Implemented |
## Architecture
```
+-------------+
|  DB (API)   |  get / put / delete / write_batch
+------+------+
       |
       +------------+------------+
       v            v            v
+----------+  +---------+  +-----------+
| MemTable |  |   WAL   |  | MANIFEST  |
| (active) |  | (append)|  | (versions)|
+----+-----+  +---------+  +-----------+
     | freeze
     v
+----------+
| MemTable |
| (immut.) |
+----+-----+
     | flush
     v
+------------------------------------+
|        SST Layer (L0 .. Ln)        |
|  L0: overlapping files             |
|  L1+: sorted, non-overlapping      |
+------------------------------------+
     ^
     | background compaction (N threads)
+----------+-----------+
|  Leveled Compaction  |
+----------------------+
```
### Write Path
- Encode the write as an `InternalKey` (`user_key + !pack(sequence, type)` — the packed value is inverted and stored big-endian so higher sequences sort first)
- Append to the WAL (group commit: a leader batches multiple writers into one fsync)
- Insert into the active MemTable (lock-free crossbeam-skiplist)
- When the MemTable exceeds `write_buffer_size`, freeze it and flush to an L0 SST
- Signal the background compaction threads if the L0 file count exceeds the threshold
### Read Path
- Active MemTable (newest writes)
- Immutable MemTables (newest first)
- L0 SST files (newest first, per-file bloom filter + range check)
- L1+ SST files (LevelIterator with lazy file opening, binary search by key range)
### Iterator Architecture (key performance innovation)
The scan path uses a MergingIterator (min-heap) that merges entries from:
- MemTable cursors (O(1) per entry via skiplist level-0 chain)
- L0 TableIterators (one per overlapping SST file)
- LevelIterators (one per L1+ level — lazy two-level iterator)
Key optimizations that match/exceed RocksDB:
- Zero-copy block cache: `Block` stores `Arc<Vec<u8>>` — cache hits avoid a memcpy
- Cursor-based block iteration: entries are decoded one at a time from the raw block data (`decode_entry_reuse`) — no per-block Vec allocation
- Deferred block read: the SST index stores `first_key` per block, so a seek can position without reading data blocks
- Buffer reuse: `IterSource` reuses key/value buffers; `next_into()` decodes directly into caller buffers
- Sequential readahead: `posix_fadvise(WILLNEED)` after detecting sequential block access
- L0 metadata pinning: index entries and first data blocks are pre-cached for L0 files
- Sweep-line range tombstone tracking: O(1) amortized instead of O(T) per key
- Lazy index loading: `TableIterator::new()` is O(1) — the index is parsed on first use
## SST Format
```
Data Block 0:        [entries with prefix compression + restart points]
Data Block 1:        ...
...
Filter Block:        [bloom filter bits]
Prefix Filter Block: [prefix bloom bits]
Metaindex Block:     ["filter.bloom" -> handle, "filter.prefix" -> handle]
Index Block:         [last_key -> BlockHandle + first_key per block]
Footer (48 bytes):   [metaindex_handle, index_handle, magic]
```
## Source Code Structure
```
src/
+-- lib.rs              # Public API re-exports
+-- db.rs               # DB: open/get/put/delete/write/flush/compact/close
+-- options.rs          # DbOptions, ReadOptions, WriteOptions (RocksDB-compatible)
+-- types.rs            # InternalKey, ValueType, SequenceNumber, WriteBatch
+-- error.rs            # Error types
+-- rate_limiter.rs     # Token-bucket rate limiter
+-- stats.rs            # Database statistics
+-- memtable/
|   +-- mod.rs          # MemTable (put/get/iter with approximate_size tracking)
|   +-- skiplist.rs     # OrdInternalKey + crossbeam-skiplist (O(1) iteration)
+-- wal/
|   +-- writer.rs       # WAL append with block-based fragmentation
|   +-- reader.rs       # WAL replay for crash recovery
|   +-- record.rs       # Record format: checksum + length + type + payload
+-- sst/
|   +-- block.rs        # Data block: prefix compression + restart points + seek
|   +-- block_builder.rs # Block construction
|   +-- table_builder.rs # SST writer (data + filter + index with first_key + footer)
|   +-- table_reader.rs # SST reader + TableIterator (cursor-based, deferred block read)
|   +-- filter.rs       # Bloom filter (double hashing)
|   +-- format.rs       # Footer, BlockHandle, CompressionType, IndexEntry encoding
+-- compaction/
|   +-- leveled.rs      # Leveled compaction: streaming merge, trivial move, filter
+-- manifest/
|   +-- version_edit.rs # VersionEdit encode/decode
|   +-- version.rs      # Version (immutable SST file set snapshot)
|   +-- version_set.rs  # VersionSet (MANIFEST management + version chain)
+-- cache/
|   +-- block_cache.rs  # Block LRU cache (moka) with L0 pinning support
|   +-- table_cache.rs  # SST reader cache
+-- iterator/
    +-- merge.rs        # Min-heap MergingIterator with buffer reuse + prefetch
    +-- db_iter.rs      # DBIterator: dedup, snapshot, tombstone/range-del filtering
    +-- level_iter.rs   # LevelIterator: lazy two-level iterator for L1+
    +-- range_del.rs    # RangeTombstoneTracker: sweep-line O(1) amortized
    +-- bidi_iter.rs    # BidiIterator: bidirectional iteration
```
## Public API
### Configuration
```rust
// Reconstructed example — exact constructor names may differ; see options.rs.
let opts = DbOptions::default();

// Or use a preset profile:
let opts = DbOptions::write_heavy(); // 128MB memtable, 4 compaction threads, LZ4
let opts = DbOptions::read_heavy();  // large cache, 14 bits/key bloom
```
## Build & Test
## License
MIT