rsdb 0.7.3

a flash-sympathetic persistent lock-free B+ tree, pagecache, and log
Documentation

RSDB

a flash-sympathetic persistent lock-free B+ tree, pagecache, and log

documentation

rsdb = "0.7"

extern crate rsdb;

let tree = rsdb::Config::default()
  .path(Some("/var/lib/mydb/storage_file".to_owned()))
  .tree();

let k1 = b"yo!".to_vec();
let v1 = b"v1".to_vec();

// set and get
tree.set(k1.clone(), v1.clone());
assert_eq!(tree.get(&k1), Some(v1.clone()));

let v2 = b"v2".to_vec();

// compare and swap
assert_eq!(
  //    key         old       new
  tree.cas(k1.clone(), Some(v1), Some(v2.clone())),
  Ok(()),
);

// scans
let mut iter = tree.scan(b"a non-present key before yo!");
assert_eq!(iter.next(), Some((k1, v2)));
assert_eq!(iter.next(), None);

// deletion
tree.del(b"yo!");

progress

  • lock-free log with reservable slots
  • lock-free pagecache with cache-friendly partial updates
  • lock-free b-link tree
  • recovery
  • zstd compression
  • LRU cache
  • pagetable snapshotting for faster recovery
  • epoch-based gc
  • C API
  • formal verification of lock-free algorithms via symbolic execution
  • log cleaning
  • higher-level interface with multi-key transaction and snapshot support
  • merge operator support

Goals

  1. don't use so much electricity. our data structures should play to modern hardware's strengths.
  2. don't surprise users with performance traps.
  3. bring reliability techniques from academia into real-world practice.

Architecture

Lock-free trees on a lock-free pagecache on a lock-free log. The pagecache scatters partial page fragments across the log, rather than rewriting entire pages at a time as B+ trees for spinning disks historically have. On page reads, we concurrently scatter-gather reads across the log to materialize the page from its fragments.

The system is largely inspired by the Deuteronomy architecture, and aims to implement the best features from RocksDB as well.

References