- Memtable — in-memory sorted write buffer; flushes to an immutable sorted run when full
- Multiple sorted runs — each flush appends a run; reads merge across all of them, newest first
- Background compaction — a dedicated thread merges runs to bound read amplification, concurrent with reads and writes
- Frozen on-disk format — block-structured runs with per-block CRC32C integrity; specified in docs/SSTABLE_FORMAT.md
- Crash recovery — a manifest records the live runs; a crash mid-flush or mid-compaction recovers to a consistent state
- Tombstone deletes — deletes mask older values and resolve away during compaction
- Range scans — merge the buffer and every run into one sorted stream
- Grouped writes — apply a batch atomically with respect to concurrent readers
- Crash-safe writes — under the durability feature, every write hits a wal-db log before acknowledgment and is replayed on open (no acknowledged write lost across a crash)
- Bloom-filtered reads — under the bloom feature, a per-run filter lets a point read skip any run that can't contain the key (negative lookups read no data blocks)
- Block cache — a shared cache of decoded run blocks; a repeat point read over a hot working set does no I/O, checksum, or parse
- Shared, thread-safe handle — one engine, many threads, behind an Arc
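To make the read path in the list above concrete, here is a minimal in-memory sketch of how a memtable, multiple sorted runs, tombstones, and compaction interact. It is an illustration only, not lsm-db's implementation or API: ToyLsm, Entry, and every method name are hypothetical, a BTreeMap stands in for each on-disk run, and compaction is a synchronous full merge rather than a background thread.

```rust
use std::collections::BTreeMap;

/// A stored entry: a live value, or a tombstone left behind by a delete.
#[derive(Clone, Debug, PartialEq)]
enum Entry {
    Value(Vec<u8>),
    Tombstone,
}

/// One sorted collection of entries. Used here for both the memtable and
/// the flushed runs (an assumption: real runs would live on disk).
type Sorted = BTreeMap<Vec<u8>, Entry>;

/// Hypothetical toy engine: a mutable memtable plus immutable runs, oldest first.
#[derive(Default)]
struct ToyLsm {
    memtable: Sorted,
    runs: Vec<Sorted>,
}

impl ToyLsm {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.memtable.insert(key.to_vec(), Entry::Value(value.to_vec()));
    }

    /// A delete writes a tombstone; it does not touch older runs.
    fn delete(&mut self, key: &[u8]) {
        self.memtable.insert(key.to_vec(), Entry::Tombstone);
    }

    /// Freeze the memtable and append it as the newest run.
    fn flush(&mut self) {
        if !self.memtable.is_empty() {
            let run = std::mem::take(&mut self.memtable);
            self.runs.push(run);
        }
    }

    /// All sources, newest first: the memtable, then runs in reverse flush order.
    fn sources(&self) -> impl Iterator<Item = &Sorted> + '_ {
        std::iter::once(&self.memtable).chain(self.runs.iter().rev())
    }

    /// Point read: the first source that knows the key decides the answer.
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        for src in self.sources() {
            match src.get(key) {
                Some(Entry::Value(v)) => return Some(v.clone()),
                Some(Entry::Tombstone) => return None, // masks older values
                None => continue,
            }
        }
        None
    }

    /// Range scan over everything: merge all sources into one sorted stream,
    /// keeping the newest entry per key and hiding tombstoned keys.
    fn scan(&self) -> Vec<(Vec<u8>, Vec<u8>)> {
        let mut merged: BTreeMap<&[u8], &Entry> = BTreeMap::new();
        for src in self.sources() {
            for (k, e) in src {
                // Sources are visited newest first, so the first insert wins.
                merged.entry(k.as_slice()).or_insert(e);
            }
        }
        merged
            .into_iter()
            .filter_map(|(k, e)| match e {
                Entry::Value(v) => Some((k.to_vec(), v.clone())),
                Entry::Tombstone => None,
            })
            .collect()
    }

    /// Full compaction: merge every run into one. Because no older run
    /// remains afterwards, tombstones can be dropped ("resolve away").
    fn compact(&mut self) {
        let mut merged = Sorted::new();
        for run in self.runs.drain(..).rev() {
            for (k, e) in run {
                merged.entry(k).or_insert(e);
            }
        }
        merged.retain(|_, e| *e != Entry::Tombstone);
        if !merged.is_empty() {
            self.runs.push(merged);
        }
    }
}
```

In this sketch a point read stops at the first source that mentions the key, which is why bounding the number of runs through compaction bounds read amplification. The tombstone drop is only safe here because the merge covers every run; a partial compaction would have to keep tombstones that still mask data in older, unmerged runs.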
Installation
[]
= "1.0"
# Crash-safe writes (write-ahead log) and/or bloom-filtered point reads:
= { = "1.0", = ["durability", "bloom"] }
Quick Start
use Lsm;
Tuning lives behind LsmConfig; grouped writes behind Batch. See docs/API.md for the full reference and the examples/ directory for runnable programs.
Status
This is v1.0.0 — the first stable release. The public API is frozen until 2.0 and the on-disk format is frozen for the 1.x series. The engine is feature-complete, hardened against hostile input, and soak-tested single- and multi-threaded: multiple on-disk runs, background compaction, crash recovery, crash-safe writes (durability), bloom-filtered point reads (bloom), and a block cache, behind the Tier-1 API (open/put/get/delete/scan). See docs/API.md and docs/PERFORMANCE.md.
Where It Fits
lsm-db is a storage engine. It builds on:
- wal-db — memtable durability and crash recovery
- bloom-lib — SSTable point-read filtering
- Hive DB — a candidate storage engine behind the StorageEngine trait
It stays foreign-compatible: usable standalone as an embedded key-value store.
Cross-Platform Support
Tier 1 Support:
- Linux (x86_64, aarch64)
- macOS (x86_64, Apple Silicon)
- Windows (x86_64)
Behavior is verified on each target by the CI matrix.
Contributing
Before opening a PR, cargo fmt --all, cargo clippy --all-targets --all-features -- -D warnings, and cargo test --all-features must be clean. Hot-path changes require a criterion benchmark; correctness-critical paths require property and/or loom tests.