Crate lsm_tree

source ·
Expand description

A K.I.S.S. implementation of log-structured merge trees (LSM-trees/LSMTs).

§NOTE

This crate only provides a primitive LSM-tree, not a full storage engine. You probably want to use https://crates.io/crates/fjall instead. For example, it does not ship with a write-ahead log, so writes are not persisted until manually flushing the memtable.

§About

This crate exports a Tree that supports a subset of the BTreeMap API.

LSM-trees are an alternative to B-trees to persist a sorted list of items (e.g. a database table) on disk and perform fast lookup queries. Instead of updating a disk-based data structure in-place, deltas (inserts and deletes) are added into an in-memory write buffer (MemTable). Data is then flushed to disk segments, as the write buffer reaches some threshold.

Amassing many segments on disk will degrade read performance and waste disk space usage, so segments can be periodically merged into larger segments in a process called Compaction. Different compaction strategies have different advantages and drawbacks, and should be chosen based on the workload characteristics.

Because maintaining an efficient structure is deferred to the compaction process, writing to an LSMT is very fast (O(1) complexity).

Keys are limited to 65536 bytes, values are limited to 2^32 bytes. As is normal with any kind of storage engine, larger keys and values have a bigger performance impact.

§Example usage

use lsm_tree::{Tree, Config};

// A tree is a single physical keyspace/index/...
// and supports a BTreeMap-like API
let tree = Config::new(folder).open()?;

// Note compared to the BTreeMap API, operations return a Result<T>
// So you can handle I/O errors if they occur
tree.insert("my_key", "my_value", /* sequence number */ 0);

let item = tree.get("my_key")?;
assert_eq!(Some("my_value".as_bytes().into()), item);

// Search by prefix
for item in &tree.prefix("prefix") {
  // ...
}

// Search by range
for item in &tree.range("a"..="z") {
  // ...
}

// Iterators implement DoubleEndedIterator, so you can search backwards, too!
for item in tree.prefix("prefix").into_iter().rev() {
  // ...
}

// Flush to secondary storage, clearing the memtable
// and persisting all in-memory data.
// Note, this flushes synchronously, which may not be desired
tree.flush_active_memtable()?;
assert_eq!(Some("my_value".as_bytes().into()), item);

// When some disk segments have amassed, use compaction
// to reduce the amount of disk segments

// Choose compaction strategy based on workload
use lsm_tree::compaction::Levelled;

let strategy = Levelled::default();
tree.compact(Arc::new(strategy))?;

assert_eq!(Some("my_value".as_bytes().into()), item);

Modules§

Structs§

  • Block cache, in which blocks are cached in-memory after being retrieved from disk
  • Tree configuration builder
  • The memtable serves as an intermediary storage for new items
  • Disk segment (a.k.a. SSTable, sorted string table) that is located on disk
  • Thread-safe sequence number generator
  • A snapshot captures a read-only point-in-time view of the tree at the time the snapshot was created
  • A log-structured merge tree (LSM-tree/LSMT)
  • Represents a value in the LSM-tree

Enums§

Type Aliases§

  • Tree result
  • Sequence number, a monotonically increasing counter
  • User defined key
  • User defined data (blob of bytes)