Crate lsm_tree

Source
Expand description

A K.I.S.S. implementation of log-structured merge trees (LSM-trees/LSMTs).

§NOTE

This crate only provides a primitive LSM-tree, not a full storage engine. You probably want to use https://crates.io/crates/fjall instead. For example, it does not ship with a write-ahead log, so writes are not persisted until manually flushing the memtable.

§About

This crate exports a Tree that supports a subset of the BTreeMap API.

LSM-trees are an alternative to B-trees to persist a sorted list of items (e.g. a database table) on disk and perform fast lookup queries. Instead of updating a disk-based data structure in-place, deltas (inserts and deletes) are added into an in-memory write buffer (Memtable). Data is then flushed to disk segments, as the write buffer reaches some threshold.

Amassing many segments on disk will degrade read performance and waste disk space usage, so segments can be periodically merged into larger segments in a process called Compaction. Different compaction strategies have different advantages and drawbacks, and should be chosen based on the workload characteristics.

Because maintaining an efficient structure is deferred to the compaction process, writing to an LSMT is very fast (O(1) complexity).

Keys are limited to 65536 bytes, values are limited to 2^32 bytes. As is normal with any kind of storage engine, larger keys and values have a bigger performance impact.

§Example usage

use lsm_tree::{AbstractTree, Config, Tree};

// A tree is a single physical keyspace/index/...
// and supports a BTreeMap-like API
let tree = Config::new(folder).open()?;

// Note compared to the BTreeMap API, operations return a Result<T>
// So you can handle I/O errors if they occur
tree.insert("my_key", "my_value", /* sequence number */ 0);

let item = tree.get("my_key", None)?;
assert_eq!(Some("my_value".as_bytes().into()), item);

// Search by prefix
for item in tree.prefix("prefix", None, None) {
  // ...
}

// Search by range
for item in tree.range("a"..="z", None, None) {
  // ...
}

// Iterators implement DoubleEndedIterator, so you can search backwards, too!
for item in tree.prefix("prefix", None, None).rev() {
  // ...
}

// Flush to secondary storage, clearing the memtable
// and persisting all in-memory data.
// Note, this flushes synchronously, which may not be desired
tree.flush_active_memtable(0)?;
assert_eq!(Some("my_value".as_bytes().into()), item);

// When some disk segments have amassed, use compaction
// to reduce the amount of disk segments

// Choose compaction strategy based on workload
use lsm_tree::compaction::Leveled;

let strategy = Leveled::default();

let version_gc_threshold = 0;
tree.compact(Arc::new(strategy), version_gc_threshold)?;

assert_eq!(Some("my_value".as_bytes().into()), item);

Modules§

compaction
Contains compaction strategies
gc
Blob garbage collection utilities

Structs§

BlobTree
A key-value-separated log-structured merge tree
Cache
Cache, in which blocks or blobs are cached in-memory after being retrieved from disk
Config
Tree configuration builder
Memtable
The memtable serves as an intermediary, ephemeral, sorted storage for new items
Segment
Disk segment (a.k.a. SSTable, SST, sorted string table) that is located on disk
SequenceNumberCounter
Thread-safe sequence number generator
Slice
An immutable byte slice that can be cloned without additional heap allocation
Snapshot
A snapshot captures a read-only point-in-time view of the tree at the time the snapshot was created
Tree
A log-structured merge tree (LSM-tree/LSMT)

Enums§

AnyTree
May be a standard Tree or a BlobTree
CompressionType
Compression algorithm to use.
DecodeError
Error during deserialization
EncodeError
Error during serialization
Error
Represents errors that can occur in the LSM-tree
TreeType
LSM-tree type
ValueType
Value type (regular value or tombstone)
Version
Disk format version

Traits§

AbstractTree
Generic Tree API
BlobCache
Blob cache, in which blobs are cached in-memory after being retrieved from disk

Type Aliases§

KvPair
KV-tuple, typically returned by an iterator
Result
Tree result
SeqNo
Sequence number - a monotonically increasing counter
UserKey
User defined key
UserValue
User defined data (blob of bytes)