Expand description
§lsmdb
A persistent key-value storage engine built on a Log-Structured Merge Tree (LSM-Tree).
§Why LSM-Tree?
Traditional B-Tree engines perform random in-place writes, which are slow on both HDDs (seek latency) and SSDs (write amplification from page-level overwriting). LSM-Trees turn all writes into sequential appends — first to the WAL, then to the MemTable, then batched to disk as immutable SSTables — making writes as fast as the underlying storage’s sequential throughput.
§Design Decisions Worth Knowing
- WAL before MemTable: Every write hits the WAL first. If the process crashes between the WAL append and the MemTable insert, the WAL replay at startup recovers the write. Without this ordering, acknowledged writes could vanish on crash.
- Immutable SSTables: Once a MemTable is flushed, its SSTable is never mutated. This means reads are always consistent and compaction can merge files without coordination with readers.
- Multi-level compaction: L0 files can overlap in key range (they are flushed sequentially). Higher levels are sorted and non-overlapping. Compacting L0→L1 and cascading upward keeps read amplification bounded — otherwise a read could scan every L0 file on a miss.
- Bloom Filters: A point query that misses everything in memory would otherwise read every SSTable. Bloom Filters give a definitive “not present” answer in O(k) hash operations with ~1% false positive rate, eliminating almost all unnecessary disk seeks.
- LRU Block Cache: SSTables are divided into 4 KB Data Blocks. Hot blocks (recent or repeated reads) are kept in an LRU cache so repeated reads don’t pay the mmap page fault cost.
Author: Nrishinghananda Roy
Modules§
- bloom_
filter 🔒 - constants
- Tuning knobs for the LSM-Tree engine.
- memtable 🔒
- sstable 🔒
- wal 🔒
Structs§
- Storage
Engine - The central coordinator of the LSM-Tree storage engine.