lsm-db 0.2.0

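Log-structured merge-tree storage engine for Rust. The full design pairs a memtable with leveled SSTables, background compaction, and bloom-filtered point reads over a durable wal-db log; version 0.2 ships the memtable and a single on-disk run, with the rest on the roadmap below. It is a composable storage engine for embedded databases and Hive DB.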

Available now (0.2):

  • Memtable — in-memory sorted write buffer; flushes to an immutable sorted run when full
  • Durable flush — runs are written atomically and fsynced; flushed data survives reopening
  • Tombstone deletes — deletes mask older values and resolve away on flush
  • Range scans — merge the buffer and the on-disk run into one sorted stream
  • Grouped writes — apply a batch atomically with respect to concurrent readers
  • Shared, thread-safe handle — one engine, many threads, behind an Arc
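
The tombstone and range-scan behavior listed above can be illustrated with a small standard-library model. This is only a sketch, not lsm-db's implementation: the memtable and the on-disk run are both modeled as BTreeMaps, and the names Entry and merged_scan are hypothetical. It shows a two-way merge in which the newer buffer wins on equal keys and a tombstone hides the key from the output.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// Hypothetical memtable entry: `Some(value)` is a live write, `None` is a tombstone.
type Entry = Option<Vec<u8>>;

/// Merge the in-memory buffer with one sorted run into a single sorted result.
/// The buffer is newer, so it shadows the run on equal keys; tombstones emit nothing.
fn merged_scan(
    memtable: &BTreeMap<Vec<u8>, Entry>,
    run: &BTreeMap<Vec<u8>, Vec<u8>>,
) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut mem = memtable.iter().peekable();
    let mut disk = run.iter().peekable();
    let mut out = Vec::new();
    loop {
        // Work out which side holds the smallest next key.
        let order = match (mem.peek(), disk.peek()) {
            (None, None) => break,
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (Some(&(mk, _)), Some(&(dk, _))) => mk.cmp(dk),
        };
        match order {
            Ordering::Greater => {
                // Only the run has this key: emit it unchanged.
                let (k, v) = disk.next().unwrap();
                out.push((k.clone(), v.clone()));
            }
            Ordering::Less | Ordering::Equal => {
                if order == Ordering::Equal {
                    disk.next(); // the older on-disk value is shadowed
                }
                let (k, entry) = mem.next().unwrap();
                if let Some(v) = entry {
                    out.push((k.clone(), v.clone()));
                }
                // A tombstone (`None`) is skipped, so the key disappears.
            }
        }
    }
    out
}

fn main() {
    let mut run = BTreeMap::new();
    run.insert(b"user:1".to_vec(), b"alice".to_vec());
    run.insert(b"user:2".to_vec(), b"bob".to_vec());
    run.insert(b"user:3".to_vec(), b"carol".to_vec());

    let mut memtable: BTreeMap<Vec<u8>, Entry> = BTreeMap::new();
    memtable.insert(b"user:1".to_vec(), None); // delete
    memtable.insert(b"user:2".to_vec(), Some(b"bobby".to_vec())); // overwrite

    for (k, v) in merged_scan(&memtable, &run) {
        println!("{} = {}", String::from_utf8_lossy(&k), String::from_utf8_lossy(&v));
    }
}
```

In this model, a flush would write the merged result out as the new run, which is why tombstones no longer need to be kept once a single run holds everything.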

On the roadmap:

  • Leveled SSTables + background compaction — multiple runs, merged in the background (0.3)
  • Write-ahead logging — crash-safe un-flushed writes via wal-db, under durability (0.4)
  • Bloom filters — skip runs that can't contain a key, under bloom (0.5)
  • Pluggable comparator — custom key ordering (0.5)
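
To make the Bloom filter item concrete, here is a minimal sketch of the general technique, assuming nothing about bloom-lib or the eventual lsm-db integration. The Bloom type and its methods are hypothetical, and the hashing uses the standard library's DefaultHasher with a per-probe seed. A filter like this answers "definitely absent" or "maybe present", which is what lets a point read skip a run without touching disk.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical toy Bloom filter: `hashes` probe positions over a fixed bit array.
struct Bloom {
    bits: Vec<bool>,
    hashes: u64,
}

impl Bloom {
    fn new(num_bits: usize, hashes: u64) -> Self {
        assert!(num_bits > 0 && hashes > 0);
        Bloom { bits: vec![false; num_bits], hashes }
    }

    /// Derive one probe position by hashing the seed together with the key.
    fn position(&self, key: &[u8], seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        key.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    /// Set every probe position for the key.
    fn insert(&mut self, key: &[u8]) {
        for seed in 0..self.hashes {
            let i = self.position(key, seed);
            self.bits[i] = true;
        }
    }

    /// `false` means the key was never inserted; `true` means it may have been.
    fn may_contain(&self, key: &[u8]) -> bool {
        (0..self.hashes).all(|seed| self.bits[self.position(key, seed)])
    }
}

fn main() {
    let mut filter = Bloom::new(1024, 3);
    filter.insert(b"user:1");
    filter.insert(b"user:2");

    // Inserted keys always report "maybe present"; there are no false negatives.
    println!("user:1 maybe present: {}", filter.may_contain(b"user:1"));
    // Absent keys usually report false, but a false positive is possible.
    println!("user:9 maybe present: {}", filter.may_contain(b"user:9"));
}
```

A reader would consult the filter before opening a run and fall through to the real lookup only on a "maybe".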

Installation

[dependencies]
lsm-db = "0.2"

Quick Start

use lsm_db::Lsm;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open (or create) a database backed by a directory.
    let db = Lsm::open("my-db")?;

    // Keys and values are arbitrary bytes.
    db.put(b"user:1", b"alice")?;
    db.put(b"user:2", b"bob")?;

    // Point reads return owned values.
    assert_eq!(db.get(b"user:1")?, Some(b"alice".to_vec()));

    // Deletes mask the key.
    db.delete(b"user:1")?;
    assert_eq!(db.get(b"user:1")?, None);

    // Range scans walk keys in sorted order. `;` is the byte after `:`,
    // so this half-open range covers every key with the `user:` prefix.
    db.put(b"user:1", b"alice")?;
    for (key, value) in db.scan(b"user:".to_vec()..b"user;".to_vec())? {
        println!("{} = {}", String::from_utf8_lossy(&key), String::from_utf8_lossy(&value));
    }

    // Force the buffer to disk; it will be there on the next open.
    db.flush()?;
    Ok(())
}
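
The scan in the Quick Start ends at user; because ; is the next byte after :, so the half-open range covers exactly the keys that start with user:. For an arbitrary prefix, the same bound can be computed with a small helper. The function below is a hypothetical sketch, not part of the lsm-db API, and it is demonstrated against a plain BTreeMap rather than the engine.

```rust
use std::collections::BTreeMap;

/// Smallest byte string that sorts after every key starting with `prefix`.
/// Returns `None` when no such bound exists (empty prefix, or all bytes are 0xFF).
fn prefix_upper_bound(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut end = prefix.to_vec();
    while let Some(last) = end.pop() {
        if last < u8::MAX {
            end.push(last + 1);
            return Some(end);
        }
        // A trailing 0xFF cannot be incremented; drop it and bump the previous byte.
    }
    None
}

fn main() {
    let mut keys = BTreeMap::new();
    for k in ["team:9", "user:1", "user:2", "uses"] {
        keys.insert(k.as_bytes().to_vec(), ());
    }

    let start = b"user:".to_vec();
    let end = prefix_upper_bound(&start).expect("prefix has an upper bound");

    // Visits only the keys that begin with `user:`.
    for (key, _) in keys.range(start..end) {
        println!("{}", String::from_utf8_lossy(key));
    }
}
```

When the helper returns None, the scan has no finite upper bound and would need an unbounded range end instead.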

Tuning lives behind LsmConfig; grouped writes behind Batch. See docs/API.md for the full reference and the examples/ directory for runnable programs.

Status

This is the v0.2.0 foundation release: the Tier-1 API (open/put/get/delete/scan) is implemented and tested over a single on-disk run. Multi-level compaction, durability, and bloom filters land across the rest of the 0.x series per the project roadmap and docs/API.md. The on-disk format is not yet frozen.

Where It Fits

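lsm-db is a storage engine. It builds on, or plugs into, the following projects (the wal-db and bloom filter integrations are roadmap items, not part of 0.2):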

  • wal-db — memtable durability and crash recovery
  • bloom-lib — SSTable point-read filtering
  • pack-io — on-disk record framing
  • Hive DB — a candidate storage engine behind the StorageEngine trait

It stays foreign-compatible, meaning it is usable standalone as an embedded key-value store outside Hive DB.

Cross-Platform Support

Tier 1 Support:

  • Linux (x86_64, aarch64)
  • macOS (x86_64, Apple Silicon)
  • Windows (x86_64)

Behavior is verified on each target by the CI matrix.

Contributing

Before opening a PR, make sure all of the following run clean:

  • cargo fmt --all
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo test --all-features

Hot-path changes require a criterion benchmark; correctness-critical paths require property and/or loom tests.