Crate skl

Source
Expand description

SKL

github LoC Build codecov

docs.rs crates.io crates.io

wakatime license
  1. A lock-free thread-safe concurrent SkipMap implementation based on ARENA skiplist which helps develop MVCC memtable for LSM-Tree.
  2. A lock-free thread-safe concurrent memory map based on-disk SkipMap.

§Installation

  • Only use heap backend (suppport no_std)

    [dependencies]
    skl = "0.22"
  • Enable memory map backend

    [dependencies]
    skl = { version = "0.22", features = ["memmap"] }

§Features

  • MVCC and 3D access: Builtin MVCC (multiple versioning concurrency control) and key-value-version access support.
  • Lock-free and Concurrent-Safe: SkipMap provide lock-free operations, ensuring efficient concurrent access without the need for explicit locking mechanisms.
  • Extensible for Key-Value Database Developers: Designed as a low-level crate, SkipMap offer a flexible foundation for key-value database developers. You can easily build your own memtable or durable storage using these structures.
  • Memory Efficiency: These data structures are optimized for minimal memory overhead. They operate around references, avoiding unnecessary allocations and deep copies, which can be crucial for efficient memory usage.
    • Segment tracker: Builtin segment recollection support, a lock-free freelist helps reuse free segments.
    • Prefix compression:
      • Same key will only be stored once.
      • Keys with common prefix will be stored once (longest one must be inserted first).
      • (experimental) Keys are sub-slice of negeibours will be stored once (require CompressionPolicy::High).
    • Discard tracker: Builtin discard tracker, which will track discarded bytes to help end-users decide when to compact or rewrite.
  • Efficient Iteration: Enjoy fast forward and backward iteration through the elements in your SkipMap. Additionally, bounded iterators are supported, allowing you to traverse only a specified range of elements efficiently.
  • Snapshot Support: Create snapshots of your SkipMap, offering a read-only view of the contents at a specific moment in time. Snapshots provide a consistent view of the data, enabling implementations of transactional semantics and other use cases where data consistency is crucial.
  • Memory Management Options:
    • Heap Allocation: Memory allocation is handled by Rust’s allocator, ensuring all data resides in RAM.
    • Mmap: Data can be mapped to a disk file by the operating system, making it suitable for durable storage.
    • Mmap Anonymous: Mapped to anonymous memory (virtual memory) by the OS, ideal for large in-memory memtables, optimizing memory utilization.

§Examples

Please see examples folder for more details.

§Q & A

  • Does the on-disk version SkipMap ensure crash safety or power failure resilience?

    No, If you really need a crash safe, power failure resilience, concurrent-safe and durable ordered write-ahead log implementation, see orderwal project.

    On-disk version SkipMap does not ensure crash safety or power failure resilience. Hence, it is not recommended to directly use the SkipMap as a durable database. It is recommended to use the on-disk version SkipMap as a final frozen file for quick lookup.

  • aol: Yet another generic purpose, append-only write-ahead log implementation.
  • orderwal: A generic-purpose, atomic, ordered, zero-copy, concurrent-safe, pre-allocate style (memory map) write-ahead-log for developing databases.

§Tests

  • test:

    cargo test --all-features
  • miri (Stack Borrows)

    MIRIFLAGS="-Zmiri-strict-provenance -Zmiri-disable-isolation -Zmiri-symbolic-alignment-check" \
    RUSTFLAGS = "--cfg all_skl_tests" \
    cargo miri test --all-features
  • miri (Tree Borrows)

    MIRIFLAGS="-Zmiri-strict-provenance -Zmiri-disable-isolation -Zmiri-symbolic-alignment-check -Zmiri-tree-borrows" \
    RUSTFLAGS = "--cfg all_skl_tests" \
    cargo miri test --all-features

§Support Platforms

See cross section in GitHub CI file.

§Pedigree

This code is inspired and modified based on Cockroachdb’s pebble arenaskl and Dgraph’s badger skl code:

https://github.com/cockroachdb/pebble/tree/master/internal/arenaskl

https://github.com/dgraph-io/badger/tree/master/skl

The pebble’s arenaskl code is based on Andy Kimball’s arenaskl code:

https://github.com/andy-kimball/arenaskl

The arenaskl code is based on the skiplist found in Badger, a Go-based KV store:

https://github.com/dgraph-io/badger/tree/master/skl

The skiplist in Badger is itself based on a C++ skiplist built for Facebook’s RocksDB:

https://github.com/facebook/rocksdb/tree/master/memtable

§License

skl is under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE, LICENSE-MIT for details.

Copyright (c) 2022 Al Liu.

Re-exports§

pub use options::Options;
pub use among;
pub use either;

Modules§

dynamic
The dynamic key-value type SkipMaps.
error
Error types for the SkipMaps.
generic
The generic key-value type SkipMaps.
options
Options for the SkipMaps.

Structs§

Active
A state for the entry that is active.
GenericAllocator
Generic ARENA allocator
Header
The information of the SkipMap, which can be used to reconstruct the SkipMap.
Height
Height which is used to configure the maximum tower height of a skiplist, it is a 5-bit unsigned integer.
Inserter
A helper struct for caching splice information
KeyBuilder
A key builder for the SkipMaps, which requires the key size for accurate allocation and a closure to build the key.
KeySize
KeySize which is used to represent a length of a key stored in the skiplist, it is a 27-bit unsigned integer.
MaybeTombstone
A state for the entry that may be a tombstone.
VacantBuffer
A vacant buffer in the WAL.
ValueBuilder
A value builder for the SkipMaps, which requires the value size for accurate allocation and a closure to build the value.

Traits§

Allocator
A trait for easily interacting with the sync and unsync allocator allocators.
Arena
The wrapper trait over a underlying Allocator.
BufWriter
Writing self to the VacantBuffer in bytes format.
BufWriterOnce
Like BufWriter but only write once.
State
The state for the entry.
Transfer
Transfer trait for converting between different states.

Functions§

random_height
Utility function to generate a random height for a new node.

Type Aliases§

Version
Version, used for MVCC purpose, it is a 56-bit unsigned integer.