Struct tari_storage::lmdb_store::LMDBStore

pub struct LMDBStore { /* fields omitted */ }

A struct for holding the state of an LMDB database. LMDB is memory-mapped, so you can treat the database as an (essentially) infinitely large memory-backed hashmap. A single environment is stored in one file; the individual databases are key-value tables stored within that file.

LMDB databases are thread-safe.

To create an instance of LMDBStore, use LMDBBuilder.
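Construction looks roughly like the sketch below. The exact builder method names (set_path, set_env_config, set_max_number_of_databases, add_database) are assumptions that may differ between versions; check LMDBBuilder's documentation for the real API.

use tari_storage::lmdb_store::{LMDBBuilder, LMDBConfig, LMDBError, LMDBStore};

// A minimal sketch; builder method names are assumptions.
fn open_store() -> Result<LMDBStore, LMDBError> {
    let store = LMDBBuilder::new()
        .set_path("/tmp/mydb/")                         // directory for the environment file
        .set_env_config(LMDBConfig::default())          // initial size and resize behaviour
        .set_max_number_of_databases(1)
        .add_database("things", lmdb_zero::db::CREATE)  // create the table if it is missing
        .build()?;
    Ok(store)
}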

Memory efficiency

LMDB really only understands raw byte arrays. Complex structures need to be presented as (what looks like) a single contiguous blob of memory. This forces some trade-offs when inserting data into, and getting data out of, LMDB.

Writing

For simple types, like PublicKey([u8; 32]), it's most efficient to pass a pointer to the memory location, and LMDB will do (at most) a single copy into its own memory structures. The lmdb-zero crate assumes this by only requiring the AsLmdbBytes trait when inserting data; i.e. insert does not take ownership of the key or value, it just needs to be able to read the [u8].
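For example, a newtype over a fixed-size byte array can hand LMDB its bytes directly (a sketch; AsLmdbBytes lives in lmdb-zero's traits module):

use lmdb_zero::traits::AsLmdbBytes;

struct PublicKey([u8; 32]);

// No intermediate buffer: LMDB reads straight out of the struct's own memory.
impl AsLmdbBytes for PublicKey {
    fn as_lmdb_bytes(&self) -> &[u8] {
        &self.0
    }
}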

This poses something of a problem for complex structures. Structs typically don't have a contiguous block of memory backing the instance, so you either need to impose one (which isn't a great idea, since now you have to write some sort of memory management layer), or you eat the cost of an intermediate copy into a buffer every time you commit a structure to LMDB.

However, this cost is mitigated whenever some processing is needed to convert T to [u8] (e.g. if an IP address is stored as a string for some reason, you might want to represent it as [u8; 4]), which probably happens more often than we think, and this approach offers maximum flexibility.
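To illustrate that kind of processing, the standard library already does the string-to-octets conversion for IPv4 addresses:

use std::net::Ipv4Addr;

// Store an IP address as four octets rather than as its textual form.
fn ip_to_bytes(s: &str) -> Option<[u8; 4]> {
    s.parse::<Ipv4Addr>().ok().map(|ip| ip.octets())
}

// ip_to_bytes("192.168.0.1") == Some([192, 168, 0, 1])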

Furthermore, the "simple" types are typically quite small, so an additional copy does not usually incur much overhead.

So this library makes the trade-off of carrying out two copies per write whilst gaining a significant amount of flexibility in the process.

Reading

When LMDB returns data from a get request, it returns a &[u8]; you cannot take ownership of this data. A copy is therefore unavoidable in order to pull the data into the final struct instance. So a From<&[u8]> for T trait implementation will work for reading, and this works fine for both simple and complex data structures.

FromLmdbBytes is not quite what we want because the trait function returns a reference to an object, rather than the object itself.
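So the read path for a simple type looks something like the sketch below; the unavoidable copy out of LMDB's memory map happens inside from:

struct PublicKey([u8; 32]);

impl From<&[u8]> for PublicKey {
    fn from(bytes: &[u8]) -> Self {
        // This copy is unavoidable: `bytes` borrows LMDB's memory map.
        let mut buf = [0u8; 32];
        buf.copy_from_slice(&bytes[..32]); // assumes the stored value is at least 32 bytes
        PublicKey(buf)
    }
}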

An additional consideration is: how was the data serialised? If the write was a straight memory dump, we don't always have enough information to reconstruct the data object (how long was a string? How many elements were in the array? Was the integer byte order big- or little-endian?).
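A tiny example of the problem: two different values can produce identical naive dumps.

fn naive_dump(pair: &(String, String)) -> Vec<u8> {
    // A straight memory dump of the string contents, with no length prefixes.
    [pair.0.as_bytes(), pair.1.as_bytes()].concat()
}

// naive_dump(&("ab".into(), "c".into())) and naive_dump(&("a".into(), "bc".into()))
// both yield b"abc": the field boundary is lost unless lengths are stored too.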

If we need this metadata when reading byte strings back in, it means the metadata had to be written out as well. This is a further roadblock to the "zero-copy" ideal for writing. And since we're now effectively serialising and deserialising, we may as well use a well-known, highly efficient binary format to do so.

Serialisation

The ideal serialisation format is the one that does the least "bit-twiddling" between memory and the byte array; as well as being as compact as possible.

Candidates include Bincode, MsgPack, and Protobuf / Cap'n Proto. Without spending ages on a comparison, I just took the benchmark results from this project:

test clone                             ... bench:       1,179 ns/iter (+/- 115) = 444 MB/s

test capnp_deserialize                 ... bench:         277 ns/iter (+/- 27) = 1617 MB/s  **
test flatbuffers_deserialize           ... bench:           0 ns/iter (+/- 0) = 472000 MB/s ***
test rust_bincode_deserialize          ... bench:       1,533 ns/iter (+/- 228) = 260 MB/s
test rmp_serde_deserialize             ... bench:       1,859 ns/iter (+/- 186) = 154 MB/s
test rust_protobuf_deserialize         ... bench:         558 ns/iter (+/- 29) = 512 MB/s   *
test serde_json_deserialize            ... bench:       2,244 ns/iter (+/- 249) = 269 MB/s

test capnp_serialize                   ... bench:          28 ns/iter (+/- 5) = 16000 MB/s  **
test flatbuffers_serialize             ... bench:           0 ns/iter (+/- 0) = 472000 MB/s ***
test rmp_serde_serialize               ... bench:         278 ns/iter (+/- 27) = 1032 MB/s
test rust_bincode_serialize            ... bench:         190 ns/iter (+/- 43) = 2105 MB/s  *
test rust_protobuf_serialize           ... bench:         468 ns/iter (+/- 18) = 611 MB/s
test serde_json_serialize              ... bench:       1,012 ns/iter (+/- 55) = 597 MB/s

Based on these benchmarks, Flatbuffers and Cap'n Proto are far and away the quickest. However, looking at the benchmarks more closely, we see that these aren't strictly apples-to-apples comparisons. The flatbuffers and capnproto tests don't actually serialise to and from a general Rust struct (an HTTP request-style template), but from specially generated structs based on the schema.

Strictly speaking, if we're going to serialise arbitrary key-value types, these benchmarks should include the time it takes to populate a flatbuffer / capnproto structure.

A quick modification of the benchmarks to take this into account reveals:

test rust_bincode_deserialize          ... bench:       1,505 ns/iter (+/- 361) = 265 MB/s *
test capnp_deserialize                 ... bench:         282 ns/iter (+/- 37) = 1588 MB/s ***
test rmp_serde_deserialize             ... bench:       1,800 ns/iter (+/- 144) = 159 MB/s *

test capnp_serialize                   ... bench:         941 ns/iter (+/- 40) = 476 MB/s  *
test rmp_serde_serialize               ... bench:         269 ns/iter (+/- 19) = 1066 MB/s **
test rust_bincode_serialize            ... bench:         191 ns/iter (+/- 41) = 1114 MB/s ***

Now bincode emerges as a reasonable contender. A further point in bincode's favour is that one doesn't have to write and maintain a schema for the data types being serialised, nor is a separate compilation step required.

So, after all this, we'll use bincode for the time being to handle serialisation to and from LMDB.
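In practice, the write and read paths then reduce to a bincode round trip, sketched below using the bincode 1.x serde API (the Peer type is made up for illustration):

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, PartialEq, Debug)]
struct Peer {
    public_key: [u8; 32],
    address: String,
}

fn roundtrip(peer: &Peer) -> Result<Peer, bincode::Error> {
    // Copy 1: struct -> intermediate buffer; this is the &[u8] handed to LMDB.
    let bytes: Vec<u8> = bincode::serialize(peer)?;
    // On a get, LMDB returns a &[u8]; deserialising copies it into a fresh struct.
    bincode::deserialize(&bytes)
}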

Implementations

impl LMDBStore

pub fn flush(&self) -> Result<(), Error>

Close all databases and close the environment. You cannot be guaranteed that the databases will be closed after calling this function, because there may still be threads accessing or writing to a database, which will block this call. However, in that case flush returns an error.

pub fn log_info(&self)

pub fn get_handle(&self, db_name: &str) -> Option<LMDBDatabase>

Returns a handle to the database named db_name, if it exists; otherwise returns None.
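A hypothetical usage sketch (the insert and get methods shown on LMDBDatabase are assumptions, included only to illustrate the handle's role):

use tari_storage::lmdb_store::{LMDBError, LMDBStore};

fn store_and_fetch(store: &LMDBStore) -> Result<(), LMDBError> {
    if let Some(db) = store.get_handle("things") {
        db.insert(b"alice", &"some value".to_string())?; // assumed method: value serialised on the way in
        let v: Option<String> = db.get(b"alice")?;        // assumed method: deserialised on the way out
        assert_eq!(v.as_deref(), Some("some value"));
    }
    Ok(())
}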

pub fn env_config(&self) -> LMDBConfig

pub fn env(&self) -> Arc<Environment>

pub fn resize_if_required(
    env: &Environment,
    config: &LMDBConfig
) -> Result<(), LMDBError>

Resize the LMDB environment if the resize threshold is breached.
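For example, using the accessors above (env() returns an Arc<Environment>, which borrows as &Environment for the call):

fn check_resize(store: &LMDBStore) -> Result<(), LMDBError> {
    // Grow the memory map if usage has crossed the configured threshold.
    LMDBStore::resize_if_required(&store.env(), &store.env_config())?;
    Ok(())
}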

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T where
    T: 'static + ?Sized

impl<T> Borrow<T> for T where
    T: ?Sized

impl<T> BorrowMut<T> for T where
    T: ?Sized

impl<T> From<T> for T

impl<T, U> Into<U> for T where
    U: From<T>, 

impl<T> SafeBorrow<T> for T where
    T: ?Sized

impl<T, U> TryFrom<U> for T where
    U: Into<T>, 

type Error = Infallible

The type returned in the event of a conversion error.

impl<T, U> TryInto<U> for T where
    U: TryFrom<T>, 

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

impl<V, T> VZip<V> for T where
    V: MultiLane<T>,