Expand description
SKL
- A lock-free thread-safe concurrent
SkipMap
implementation based on ARENA skiplist which helps develop MVCC memtable for LSM-Tree. - A lock-free thread-safe concurrent memory map based on-disk
SkipMap
.
§Installation
-
Only use heap backend (suppport
no_std
)[dependencies] skl = "0.22"
-
Enable memory map backend
[dependencies] skl = { version = "0.22", features = ["memmap"] }
§Features
- MVCC and 3D access: Builtin MVCC (multiple versioning concurrency control) and key-value-version access support.
- Lock-free and Concurrent-Safe:
SkipMap
provide lock-free operations, ensuring efficient concurrent access without the need for explicit locking mechanisms. - Extensible for Key-Value Database Developers: Designed as a low-level crate,
SkipMap
offer a flexible foundation for key-value database developers. You can easily build your own memtable or durable storage using these structures. - Memory Efficiency: These data structures are optimized for minimal memory overhead. They operate around references, avoiding unnecessary allocations and deep copies, which can be crucial for efficient memory usage.
- Segment tracker: Builtin segment recollection support, a lock-free freelist helps reuse free segments.
- Prefix compression:
- Same key will only be stored once.
- Keys with common prefix will be stored once (longest one must be inserted first).
- (experimental) Keys are sub-slice of negeibours will be stored once (require
CompressionPolicy::High
).
- Discard tracker: Builtin discard tracker, which will track discarded bytes to help end-users decide when to compact or rewrite.
- Efficient Iteration: Enjoy fast forward and backward iteration through the elements in your
SkipMap
. Additionally, bounded iterators are supported, allowing you to traverse only a specified range of elements efficiently. - Snapshot Support: Create snapshots of your
SkipMap
, offering a read-only view of the contents at a specific moment in time. Snapshots provide a consistent view of the data, enabling implementations of transactional semantics and other use cases where data consistency is crucial. - Memory Management Options:
- Heap Allocation: Memory allocation is handled by Rust’s allocator, ensuring all data resides in RAM.
- Mmap: Data can be mapped to a disk file by the operating system, making it suitable for durable storage.
- Mmap Anonymous: Mapped to anonymous memory (virtual memory) by the OS, ideal for large in-memory memtables, optimizing memory utilization.
§Examples
Please see examples folder for more details.
§Q & A
-
Does the on-disk version
SkipMap
ensure crash safety or power failure resilience?No, If you really need a crash safe, power failure resilience, concurrent-safe and durable ordered write-ahead log implementation, see
orderwal
project.On-disk version
SkipMap
does not ensure crash safety or power failure resilience. Hence, it is not recommended to directly use theSkipMap
as a durable database. It is recommended to use the on-disk versionSkipMap
as a final frozen file for quick lookup.
§Related projects
aol
: Yet another generic purpose, append-only write-ahead log implementation.orderwal
: A generic-purpose, atomic, ordered, zero-copy, concurrent-safe, pre-allocate style (memory map) write-ahead-log for developing databases.
§Tests
-
test
:cargo test --all-features
-
miri
(Stack Borrows)MIRIFLAGS="-Zmiri-strict-provenance -Zmiri-disable-isolation -Zmiri-symbolic-alignment-check" \ RUSTFLAGS = "--cfg all_skl_tests" \ cargo miri test --all-features
-
miri
(Tree Borrows)MIRIFLAGS="-Zmiri-strict-provenance -Zmiri-disable-isolation -Zmiri-symbolic-alignment-check -Zmiri-tree-borrows" \ RUSTFLAGS = "--cfg all_skl_tests" \ cargo miri test --all-features
§Support Platforms
See cross
section in GitHub CI file.
§Pedigree
This code is inspired and modified based on Cockroachdb’s pebble arenaskl and Dgraph’s badger skl code:
https://github.com/cockroachdb/pebble/tree/master/internal/arenaskl
https://github.com/dgraph-io/badger/tree/master/skl
The pebble’s arenaskl code is based on Andy Kimball’s arenaskl code:
https://github.com/andy-kimball/arenaskl
The arenaskl code is based on the skiplist found in Badger, a Go-based KV store:
https://github.com/dgraph-io/badger/tree/master/skl
The skiplist in Badger is itself based on a C++ skiplist built for Facebook’s RocksDB:
https://github.com/facebook/rocksdb/tree/master/memtable
§License
skl
is under the terms of both the MIT license and the
Apache License (Version 2.0).
See LICENSE-APACHE, LICENSE-MIT for details.
Copyright (c) 2022 Al Liu.
Re-exports§
Modules§
- dynamic
- The dynamic key-value type
SkipMap
s. - error
- Error types for the
SkipMap
s. - generic
- The generic key-value type
SkipMap
s. - options
- Options for the
SkipMap
s.
Structs§
- Active
- A state for the entry that is active.
- Generic
Allocator - Generic ARENA allocator
- Header
- The information of the
SkipMap
, which can be used to reconstruct theSkipMap
. - Height
- Height which is used to configure the maximum tower height of a skiplist, it is a 5-bit unsigned integer.
- Inserter
- A helper struct for caching splice information
- KeyBuilder
- A key builder for the
SkipMap
s, which requires the key size for accurate allocation and a closure to build the key. - KeySize
- KeySize which is used to represent a length of a key stored in the skiplist, it is a 27-bit unsigned integer.
- Maybe
Tombstone - A state for the entry that may be a tombstone.
- Vacant
Buffer - A vacant buffer in the WAL.
- Value
Builder - A value builder for the
SkipMap
s, which requires the value size for accurate allocation and a closure to build the value.
Traits§
- Allocator
- A trait for easily interacting with the sync and unsync allocator allocators.
- Arena
- The wrapper trait over a underlying
Allocator
. - BufWriter
- Writing self to the
VacantBuffer
in bytes format. - BufWriter
Once - Like
BufWriter
but only write once. - State
- The state for the entry.
- Transfer
- Transfer trait for converting between different states.
Functions§
- random_
height - Utility function to generate a random height for a new node.
Type Aliases§
- Version
- Version, used for MVCC purpose, it is a 56-bit unsigned integer.