lsm-db 1.0.0

Log-structured merge-tree storage engine for Rust. Memtable, leveled SSTables, background compaction, and bloom-filtered point reads over a durable wal-db log. A composable storage engine for embedded databases and Hive DB.
  • Memtable — in-memory sorted write buffer; flushes to an immutable sorted run when full
  • Multiple sorted runs — each flush appends a run; reads merge across all of them, newest first
  • Background compaction — a dedicated thread merges runs to bound read amplification, concurrent with reads and writes
  • Frozen on-disk format — block-structured runs with per-block CRC32C integrity; specified in docs/SSTABLE_FORMAT.md
  • Crash recovery — a manifest records the live runs; a crash mid-flush or mid-compaction recovers to a consistent state
  • Tombstone deletes — deletes mask older values and resolve away during compaction
  • Range scans — merge the buffer and every run into one sorted stream
  • Grouped writes — apply a batch atomically with respect to concurrent readers
  • Crash-safe writes — under the durability feature, every write hits a wal-db log before acknowledgment and is replayed on open (no acknowledged write lost across a crash)
  • Bloom-filtered reads — under the bloom feature, a per-run filter lets a point read skip any run that can't contain the key (negative lookups read no data blocks)
  • Block cache — a shared cache of decoded run blocks; a repeat point read over a hot working set does no I/O, checksum, or parse
  • Shared, thread-safe handle — one engine, many threads, behind an Arc
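
The read path described in the list above (newest-first merging, tombstones masking older values, scans producing one sorted stream) can be illustrated with a small in-memory model. This is a sketch only, not lsm-db's implementation: `Entry`, `Layer`, `get`, and `scan` are hypothetical names, each layer is a plain `BTreeMap` standing in for the memtable or an on-disk run, and the scan materializes its result instead of streaming it.

```rust
use std::collections::BTreeMap;

/// A stored entry: either a live value or a tombstone left by a delete.
#[derive(Clone, Debug, PartialEq)]
enum Entry {
    Value(Vec<u8>),
    Tombstone,
}

/// One sorted layer: the memtable or a single immutable run (hypothetical type).
type Layer = BTreeMap<Vec<u8>, Entry>;

/// Point read: consult layers newest first; the first hit decides the answer.
fn get(layers_newest_first: &[Layer], key: &[u8]) -> Option<Vec<u8>> {
    for layer in layers_newest_first {
        match layer.get(key) {
            Some(Entry::Value(v)) => return Some(v.clone()),
            // A tombstone masks every older value for this key.
            Some(Entry::Tombstone) => return None,
            None => continue,
        }
    }
    None
}

/// Scan: merge every layer into one sorted result, keeping the newest entry
/// per key and dropping keys whose newest entry is a tombstone.
fn scan(layers_newest_first: &[Layer]) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut merged: BTreeMap<Vec<u8>, Entry> = BTreeMap::new();
    // Walk oldest to newest so newer entries overwrite older ones.
    for layer in layers_newest_first.iter().rev() {
        for (k, e) in layer {
            merged.insert(k.clone(), e.clone());
        }
    }
    merged
        .into_iter()
        .filter_map(|(k, e)| match e {
            Entry::Value(v) => Some((k, v)),
            Entry::Tombstone => None,
        })
        .collect()
}
```

In this model, compaction would amount to running the same merge over a set of runs and writing the result back as a single run; a tombstone can only be dropped once no older run could still hold the key.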

Installation

[dependencies]
lsm-db = "1.0"

# Or, in place of the line above, enable crash-safe writes (write-ahead log) and/or bloom-filtered point reads:
lsm-db = { version = "1.0", features = ["durability", "bloom"] }

Quick Start

use lsm_db::Lsm;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open (or create) a database backed by a directory.
    let db = Lsm::open("my-db")?;

    // Keys and values are arbitrary bytes.
    db.put(b"user:1", b"alice")?;
    db.put(b"user:2", b"bob")?;

    // Point reads return owned values.
    assert_eq!(db.get(b"user:1")?, Some(b"alice".to_vec()));

    // Deletes mask the key.
    db.delete(b"user:1")?;
    assert_eq!(db.get(b"user:1")?, None);

    // Range scans walk keys in sorted order. `;` is the byte after `:`, so this
    // half-open range covers every key that starts with `user:`.
    db.put(b"user:1", b"alice")?;
    for (key, value) in db.scan(b"user:".to_vec()..b"user;".to_vec())? {
        println!("{} = {}", String::from_utf8_lossy(&key), String::from_utf8_lossy(&value));
    }

    // Force the buffer to disk; it will be there on the next open.
    db.flush()?;
    Ok(())
}
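
The scan in the quick start builds its upper bound by hand: `;` (0x3B) is the byte after `:` (0x3A). A small helper can compute that bound for any prefix. The sketch below is not part of lsm-db; `prefix_range` is a hypothetical name, and it assumes only what the quick start shows, namely that `scan` accepts a `Range<Vec<u8>>`.

```rust
use std::ops::Range;

/// Half-open byte range covering every key that starts with `prefix`.
/// Returns None when no finite upper bound exists (empty or all-0xFF prefix).
fn prefix_range(prefix: &[u8]) -> Option<Range<Vec<u8>>> {
    let mut end = prefix.to_vec();
    // Trailing 0xFF bytes cannot be incremented, so drop them first.
    while let Some(&last) = end.last() {
        if last == 0xFF {
            end.pop();
        } else {
            break;
        }
    }
    // Incrementing the last remaining byte gives the smallest key that
    // sorts after every key carrying the prefix.
    let last = end.last_mut()?;
    *last += 1;
    Some(prefix.to_vec()..end)
}
```

With this helper, `prefix_range(b"user:")` yields the same `b"user:"..b"user;"` range that the quick start spells out. The `None` case means the prefix has no finite successor, so the caller would need an unbounded scan instead.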

Tuning lives behind LsmConfig; grouped writes behind Batch. See docs/API.md for the full reference and the examples/ directory for runnable programs.

Status

This is v1.0.0, the first stable release. The public API is frozen until 2.0 and the on-disk format is frozen for the 1.x series. The engine is feature-complete, hardened against hostile input, and soak-tested in both single- and multi-threaded use. It ships multiple on-disk runs, background compaction, crash recovery, crash-safe writes (the durability feature), bloom-filtered point reads (the bloom feature), and a block cache, all behind the Tier-1 API (open/put/get/delete/scan). See docs/API.md and docs/PERFORMANCE.md.

Where It Fits

lsm-db is a storage engine. It builds on wal-db and bloom-lib, and can serve Hive DB as a backend:

  • wal-db — memtable durability and crash recovery
  • bloom-lib — SSTable point-read filtering
  • Hive DB — a candidate storage engine behind the StorageEngine trait

It also works outside that ecosystem: it is usable standalone as an embedded key-value store.
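
To make the "behind the StorageEngine trait" relationship concrete, here is a minimal sketch of how a caller can depend on a trait and not on a specific engine. Everything in it is an assumption for illustration: the trait below is a hypothetical stand-in, not Hive DB's actual `StorageEngine` definition, and `MemEngine` is an in-memory placeholder where an lsm-db-backed implementation would go.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Hypothetical stand-in for Hive DB's StorageEngine trait; the real trait
/// may differ in method names, signatures, and error type.
trait StorageEngine: Send + Sync {
    fn put(&self, key: &[u8], value: &[u8]) -> Result<(), String>;
    fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, String>;
    fn delete(&self, key: &[u8]) -> Result<(), String>;
}

/// In-memory placeholder backend; an lsm-db adapter would implement the
/// same trait by forwarding to its own put/get/delete.
#[derive(Default)]
struct MemEngine {
    map: Mutex<BTreeMap<Vec<u8>, Vec<u8>>>,
}

impl StorageEngine for MemEngine {
    fn put(&self, key: &[u8], value: &[u8]) -> Result<(), String> {
        let mut map = self.map.lock().map_err(|e| e.to_string())?;
        map.insert(key.to_vec(), value.to_vec());
        Ok(())
    }

    fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, String> {
        let map = self.map.lock().map_err(|e| e.to_string())?;
        Ok(map.get(key).cloned())
    }

    fn delete(&self, key: &[u8]) -> Result<(), String> {
        let mut map = self.map.lock().map_err(|e| e.to_string())?;
        map.remove(key);
        Ok(())
    }
}

/// Caller code sees only the trait, so backends are interchangeable.
fn record_user(engine: &dyn StorageEngine, id: u32, name: &str) -> Result<(), String> {
    engine.put(format!("user:{id}").as_bytes(), name.as_bytes())
}
```

The design point is that `record_user` compiles against the trait alone; swapping the placeholder for a real engine would not change the caller.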

Cross-Platform Support

Tier 1 Support:

  • Linux (x86_64, aarch64)
  • macOS (x86_64, Apple Silicon)
  • Windows (x86_64)

Behavior is verified on each target by the CI matrix.

Contributing

Before opening a PR, cargo fmt --all, cargo clippy --all-targets --all-features -- -D warnings, and cargo test --all-features must be clean. Hot-path changes require a criterion benchmark; correctness-critical paths require property and/or loom tests.