# MenhirKV
[MenhirKV](https://gitlab.com/liberecofr/menhirkv) is yet another local KV store based on [RocksDB](https://rocksdb.org)
and implemented in [Rust](https://www.rust-lang.org/).
In short, this library lets you store key-value pairs
locally, provided the data is serializable.
It also guarantees entries will expire, at some point, so
that disk space usage remains under control.
Store your data, never worry about space.
Only uninteresting data you never access will automatically disappear.
Most low-level key-value stores offer a `&Vec<u8>, &Vec<u8>` or similar interface, which is the
right thing to do: it lets the user of the library figure out the (de)serialization details.
MenhirKV figures out those details and makes a few opinionated choices, namely:
- use [Serde](https://serde.rs/), which is the de-facto standard in Rust:
at the end of the day, pretty much everybody storing high-level objects
in Rust ends up using it. However, MenhirKV also chooses to depend
on [bincode](https://github.com/bincode-org/bincode) and *that* is possibly an
opinionated choice.
- expire the keys using a [Bloom Filter](https://en.wikipedia.org/wiki/Bloom_filter)
which provides semantics similar to a
[LRU](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU))
while being much more resource efficient. Internally it uses a
[custom made filter](https://crates.io/crates/ofilter).
- rely on [RocksDB](https://crates.io/crates/rocksdb) for storage. It has a strong
drawback which is: it's not pure Rust so you'll need to have [clang + llvm](https://clang.llvm.org/)
installed, which is a consequence of having a C++ dependency. There are many
alternatives in the Rust landscape, including, but not limited to
[sled](https://crates.io/crates/sled),
[Persy](https://crates.io/crates/persy),
[redb](https://crates.io/crates/redb),
[sanakirja](https://crates.io/crates/sanakirja),
[lmdb](https://crates.io/crates/lmdb),
[sqlite](https://crates.io/crates/sqlite)...
All of them are great alternatives, but RocksDB has the unique advantage of
being (very) widely used and tested and, as a personal note, I find the API
very easy to reason about. But the key factor is that RocksDB supports
[custom compaction filters](https://docs.rs/rocksdb/latest/rocksdb/struct.Options.html#method.set_compaction_filter).
With this feature, entry expiration happens during the natural
[compaction](https://github.com/facebook/rocksdb/wiki/Compaction) process.
This is a very specific and quite advanced feature,
and it proved quite useful in the current implementation.
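
To illustrate the (de)serialization layer mentioned above, here is a minimal sketch of a typed API sitting on top of a byte-oriented store. It only uses the standard library and is not MenhirKV's code: `ByteStore` is a hypothetical stand-in for a low-level engine such as RocksDB, and the hand-written `Codec` trait is a hypothetical stand-in for what Serde and bincode provide automatically.

```rust
use std::collections::BTreeMap;
use std::marker::PhantomData;

/// Hypothetical stand-in for a low-level engine: it only knows about bytes.
#[derive(Default)]
struct ByteStore {
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl ByteStore {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.data.insert(key, value);
    }

    fn get(&self, key: &[u8]) -> Option<&Vec<u8>> {
        self.data.get(key)
    }
}

/// Hand-written replacement for what Serde + bincode would derive.
trait Codec: Sized {
    fn encode(&self) -> Vec<u8>;
    fn decode(bytes: &[u8]) -> Option<Self>;
}

impl Codec for u64 {
    fn encode(&self) -> Vec<u8> {
        self.to_le_bytes().to_vec()
    }

    fn decode(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != 8 {
            return None;
        }
        let mut buf = [0u8; 8];
        buf.copy_from_slice(bytes);
        Some(u64::from_le_bytes(buf))
    }
}

impl Codec for String {
    fn encode(&self) -> Vec<u8> {
        self.as_bytes().to_vec()
    }

    fn decode(bytes: &[u8]) -> Option<Self> {
        String::from_utf8(bytes.to_vec()).ok()
    }
}

/// Typed facade: callers never see the byte representation.
struct TypedStore<K, V> {
    inner: ByteStore,
    _types: PhantomData<(K, V)>,
}

impl<K: Codec, V: Codec> TypedStore<K, V> {
    fn new() -> Self {
        TypedStore { inner: ByteStore::default(), _types: PhantomData }
    }

    fn put(&mut self, key: &K, value: &V) {
        self.inner.put(key.encode(), value.encode());
    }

    fn get(&self, key: &K) -> Option<V> {
        self.inner.get(&key.encode()).and_then(|bytes| V::decode(bytes))
    }
}

/// Usage: store a typed pair and read it back.
fn demo() -> Option<u64> {
    let mut store: TypedStore<String, u64> = TypedStore::new();
    store.put(&"answer".to_string(), &42);
    store.get(&"answer".to_string())
}
```

The point of the design is that the byte-level engine stays generic while all type knowledge lives in one thin layer, which is roughly the role MenhirKV plays between your types and RocksDB.
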
In practice, MenhirKV offers:
- transparent (de)serialization, just `#[derive(Serialize, Deserialize)]`
your types and it works out-of-the-box.
- error handling, use the `?` operator at will! Thanks to this great
[blog post about error handling](https://sled.rs/errors.html) by
the Sled maintainers.
- disk space control, just throw data into it, old unused data will
ultimately disappear, don't worry about it.
- friendly API,
[put](https://docs.rs/menhirkv/latest/menhirkv/struct.Store.html#method.put),
[get](https://docs.rs/menhirkv/latest/menhirkv/struct.Store.html#method.get),
[delete](https://docs.rs/menhirkv/latest/menhirkv/struct.Store.html#method.delete),
[iterator](https://docs.rs/menhirkv/latest/menhirkv/struct.Iter.html) and
other syntactic sugar.
- thread safety, bomb it with parallel and concurrent requests,
RocksDB handles the magic for you.
But really, nothing new under the sun: MenhirKV is just
Rust + RocksDB + Bincode + Bloom filter mashed together.
It aims at speed and simplicity.

# Status
While this is, to my knowledge, not used in "real" production, it is
just a thin layer over well-tested, widely used packages.
So it *should be OK* to use it. Again, *DISCLAIMER*, use at your own risk.
# Usage
```rust
use menhirkv::Store;
// Example with a usize store, both keys and values.
// Feel free to use your own types, they just need to have Serde support.
let store: Store<usize, usize> = Store::open_temporary(100).unwrap();
store.put(&123, &456).unwrap();
assert_eq!(Some(456), store.get(&123).unwrap());
```
# Benchmarks
Taken from a random CI job:
```
running 6 tests
test tests::bench_extern_crate_kv_blob ... bench: 124,820 ns/iter (+/- 41,822)
test tests::bench_extern_crate_kv_bool ... bench: 6,693 ns/iter (+/- 970)
test tests::bench_menhirkv_blob_1k ... bench: 76,003 ns/iter (+/- 99,393)
test tests::bench_menhirkv_blob_max ... bench: 82,169 ns/iter (+/- 114,962)
test tests::bench_menhirkv_bool_1k ... bench: 9,214 ns/iter (+/- 879)
test tests::bench_menhirkv_bool_max ... bench: 9,108 ns/iter (+/- 707)
test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 0 filtered out; finished in 37.70s
```
This is not the result of extensive, thorough benchmarking, just a random
snapshot at some point in development history.
TL;DR -> serialization has a cost, [RocksDB](https://crates.io/crates/rocksdb) is fast.
The above test was run on commodity virtual hardware available
on [GitLab CI](https://gitlab.com/liberecofr/menhirkv/-/pipelines),
so on real production hardware, it is likely to be faster. Or not.
There is also a point of comparison with [kv](https://crates.io/crates/kv)
which is a similar package, though relying on [sled](https://crates.io/crates/sled).
To run the benchmarks:
```shell
cd bench
rustup default nightly
cargo bench
```
# About capacity
This is possibly the most unusual and controversial choice made by
MenhirKV so let's dive a bit deeper into it.
You set a `capacity` which is a *LOW* limit: a floor on the number of entries kept, not a ceiling.
It is a mandatory parameter. Typically, opening a basic store
requires two parameters:
[path + capacity](https://docs.rs/menhirkv/latest/menhirkv/struct.Store.html#method.open_with_path).
To give an example, if you set a `capacity` of 10k (ten thousand) then
you have the guarantee `(*)` that the store keeps at least those 10k entries
without expiring them. You may end up with up to 50k or maybe even 100k
entries stored on disk. But at some point, depending on how RocksDB
runs its internal compactions, and how the Bloom filter behaves,
both of which are unpredictable, the data will be filtered, compacted,
and the "old" keys removed.
Here "old" refers to the last time the key was accessed. An
entry may have been the first one ever written to the database:
as long as it keeps being accessed, whether read or written, it remains on top of
the list of keys to preserve.
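
The "recently accessed" idea can be sketched with two Bloom filters that rotate. This is a toy model under stated assumptions, not the actual [ofilter](https://crates.io/crates/ofilter) implementation: the names `Bloom` and `RecentKeys`, the sizing (64 bits per entry, 8 hash functions) and the rotation rule are all made up for illustration. Every access is recorded in the current filter. Once `capacity` accesses have been recorded, the current filter becomes the previous one and a fresh filter takes over. A key counts as recent if either filter knows it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Plain Bloom filter: false positives are possible, false negatives are not.
struct Bloom {
    bits: Vec<bool>,
    nb_hashes: u64,
}

impl Bloom {
    fn new(nb_bits: usize, nb_hashes: u64) -> Self {
        Bloom { bits: vec![false; nb_bits], nb_hashes }
    }

    /// Bit index for `item` under hash function number `seed`.
    fn index<T: Hash>(&self, item: &T, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        item.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    fn set<T: Hash>(&mut self, item: &T) {
        for seed in 0..self.nb_hashes {
            let i = self.index(item, seed);
            self.bits[i] = true;
        }
    }

    fn check<T: Hash>(&self, item: &T) -> bool {
        (0..self.nb_hashes).all(|seed| self.bits[self.index(item, seed)])
    }

    fn clear(&mut self) {
        for bit in self.bits.iter_mut() {
            *bit = false;
        }
    }
}

/// Two generations of filters: at least the last `capacity` accesses are remembered.
struct RecentKeys {
    current: Bloom,
    previous: Bloom,
    touched: usize,
    capacity: usize,
}

impl RecentKeys {
    fn new(capacity: usize) -> Self {
        // Generous, arbitrary sizing so false positives stay rare in this sketch.
        let nb_bits = capacity.max(1) * 64;
        RecentKeys {
            current: Bloom::new(nb_bits, 8),
            previous: Bloom::new(nb_bits, 8),
            touched: 0,
            capacity,
        }
    }

    /// Call on every read or write of `key`.
    fn touch<T: Hash>(&mut self, key: &T) {
        if self.touched >= self.capacity {
            // Rotate: the full filter becomes `previous`, the oldest one is recycled.
            std::mem::swap(&mut self.current, &mut self.previous);
            self.current.clear();
            self.touched = 0;
        }
        self.current.set(key);
        self.touched += 1;
    }

    fn is_recent<T: Hash>(&self, key: &T) -> bool {
        self.current.check(key) || self.previous.check(key)
    }
}

/// Usage: "first" is written first but touched again later, so it outlives "a".
fn demo() -> (bool, bool, bool) {
    let mut recent = RecentKeys::new(3);
    for key in ["first", "a", "b", "first", "c", "d", "e"].iter() {
        recent.touch(key);
    }
    (recent.is_recent(&"first"), recent.is_recent(&"a"), recent.is_recent(&"e"))
}
```

In this model the number of remembered accesses floats between `capacity` and twice that, which mirrors the fuzzy, "at least this many" flavor of the `capacity` setting.
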
LRU caches do this in a very predictable manner, but they are
costly to maintain, especially when it comes to persistent stores.
I made a toy project `(**)` around this, and I can tell it does not perform well.
Most of the time the fuzzy strategy described above is good
enough. It ensures 2 things:
- hot data is always available
- disk space remains under control
The implementation trick that makes it efficient is that,
by hooking in a custom [compaction filter](https://docs.rs/rocksdb/latest/rocksdb/compaction_filter/index.html),
the cost of expiring the unused entries is close to zero. Those
bits of data would have been processed by RocksDB anyway.
What MenhirKV does is only give a hint to RocksDB, at the
very moment it tries to figure out how to compact the data
and reorganize it on disk -> "oh well, you know what, we
don't need this, just drop it on the floor".
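
The mechanism described above can be modeled in a few lines. This is a simplified sketch, not RocksDB's real compaction nor its Rust API: `compact`, `run` and the local `Decision` enum are hypothetical names, and a `HashSet` stands in for the Bloom filter of recently accessed keys. The toy compaction merges an older and a newer sorted run. Since it has to visit every entry anyway, asking a filter whether to keep each one adds almost no work.

```rust
use std::collections::{BTreeMap, HashSet};

/// What the filter answers for each entry visited during compaction.
enum Decision {
    Keep,
    Remove,
}

/// Builds a sorted run from string pairs (helper for this example only).
fn run(pairs: &[(&str, &str)]) -> BTreeMap<String, String> {
    pairs.iter().map(|&(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Toy compaction: merge two sorted runs and consult `filter` for each entry.
fn compact<F>(
    older: &BTreeMap<String, String>,
    newer: &BTreeMap<String, String>,
    filter: F,
) -> BTreeMap<String, String>
where
    F: Fn(&str, &str) -> Decision,
{
    // Merge the two runs; on duplicate keys the newer value wins.
    let mut merged: BTreeMap<&String, &String> = older.iter().collect();
    merged.extend(newer.iter());

    // Single pass over the merged data: the filter piggybacks on it.
    let mut output = BTreeMap::new();
    for (key, value) in merged {
        if let Decision::Keep = filter(key, value) {
            output.insert(key.clone(), value.clone());
        }
    }
    output
}

/// Usage: "cold" was never accessed recently, so compaction drops it.
fn demo() -> BTreeMap<String, String> {
    let older = run(&[("cold", "1"), ("hot", "1")]);
    let newer = run(&[("hot", "2"), ("new", "1")]);
    // Stand-in for the Bloom filter of recently accessed keys.
    let recent: HashSet<&str> = ["hot", "new"].iter().cloned().collect();
    compact(&older, &newer, |key, _value| {
        if recent.contains(key) {
            Decision::Keep
        } else {
            Decision::Remove
        }
    })
}
```

Here `demo()` returns only the `hot` and `new` entries, with `hot` carrying its newer value. No separate expiration pass ever runs: entries vanish as a side effect of the merge.
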
`(*)` Well, almost: in edge cases, the number of kept entries
may go a bit below the planned `capacity`.
This is because of Bloom filter implementation and usage details,
but statistically, the store keeps more entries than the
requested `capacity`. Think of this `capacity` setting
as a fuzzy limit. If you really need a precise number,
MenhirKV is not for you.
`(**)` [DiskLRU](https://gitlab.com/liberecofr/disklru), a toy project experimenting with persistent LRU.
Working on it helped me a lot while making decisions for MenhirKV.
# Links
* [crate](https://crates.io/crates/menhirkv) on crates.io
* [doc](https://docs.rs/menhirkv/) on docs.rs
* [source](https://gitlab.com/liberecofr/menhirkv/tree/main) on gitlab.com
* [RocksDB](https://rocksdb.org/), the database powering this KV store
* [OFilter](https://gitlab.com/liberecofr/ofilter) (the Bloom filter used within the compaction filter)
# License
MenhirKV is licensed under the [MIT](https://gitlab.com/liberecofr/menhirkv/blob/main/LICENSE) license.