infinitree 0.7.0

Embedded, encrypted database with tiered cache
Documentation

Infinitree is a versioned, embedded database that uses uniform, encrypted blobs to store data.

Infinitree is based on a set of lockless and locking data structures that you can use in your application as a regular map or list.

Data structures:

  • [fields::VersionedMap]: A lockless HashMap that tracks incremental changes
  • [fields::Map]: A lockless HashMap
  • [fields::LinkedList]: Linked list that tracks incremental changes
  • [fields::List]: A simple RwLock<Vec<_>> alias
  • [fields::Serialized]: Any type that implements [serde::Serialize]

Tight control over resources allows you to use it in situations where memory is scarce, and fall back to querying from slower storage.

Additionally, Infinitree is useful for securely storing and sharing any serde serializable application state, or dumping and loading application state changes through commits. This is similar to Git.

In case you're looking to store large amounts of binary blobs, you can open a [BufferedSink][object::BufferedSink], which supports std::io::Write, and store arbitrary byte streams in the tree.

Features

  • Encrypt all on-disk data, and only decrypt on use
  • Transparently handle hot/warm/cold storage tiers; currently S3-compatible backends are supported
  • Versioned data structures allow you to save/load/fork application state safely
  • Thread-safe by default
  • Iterate over random segments of data without loading to memory in full
  • Focus on performance and fine-grained control of memory use
  • Extensible for custom data types, storage backends, and serialization

Example use

use infinitree::{
Infinitree,
Index,
Key,
anyhow,
backends::Directory,
fields::{VersionedMap},
};
use serde::{Serialize, Deserialize};

fn main() -> anyhow::Result<()> {
let mut tree = Infinitree::<VersionedMap<String, usize>>::empty(
Directory::new("test_data")?,
Key::from_credentials("username", "password")?
).unwrap();

tree.index().insert("sample_size".into(), 1234);

tree.commit("first measurement! yay!");
Ok(())
}

Core concepts

[Infinitree] provides is the first entry point to the library. It creates, saves, and queries various versions of your [Index].

There are 2 types of interactions with an infinitree: one that's happening through an [Index], and one that's directly accessing the [object] structure.

Any data stored in infinitree objects will receive a ChunkPointer, which must be stored somewhere to retrieve the data. Hence the need for an index.

An index can be any struct that implements the [Index] trait. There's also a helpful derive macro that helps you do this. An index will consist of various fields, which act like regular old Rust types, but need to implement a few traits to help serialization.

Index

You can think about your Index as a schema. Or just application state on steroids.

In a more abstract sense, the [Index] trait and corresponding derive macro represent a view into a single version of your database. Using an [Infinitree] you can swap between, and mix-and-match data from, various versions of an Index state.

Fields

An Index contains serializable fields. These are thread-safe data structures with internal mutation, which support some kind of serialization Strategy.

You can use any type that implements [serde::Serialize] as a field through the fields::Serialized wrapper type, but there are [incremental hash map][fields::VersionedMap] and [list-like][fields::LinkedList] types available for you to use to track and only save changes between versions of your data.

Persisting and loading fields is done using an Intent. If you use the [Index][derive@Index] macro, it will automatically create accessor functions for each field in an index, and return an Intent wrapped strategy.

Intents elide the specific types of the field and allow doing batch operations, e.g. when calling [Infinitree::commit] using a different strategy for each field in an index.

Strategy

To tell Infinitree how to serialize a field, you can use different strategies. A Strategy has full control over how a data structure is serialized in the object system.

Every strategy receives an Index transaction, and an [object::Reader] or [object::Writer]. It is the responsibility of the strategy to store references so you can load back the data once persisted.

There are 2 strategies in the base library:

  • LocalField: Serialize all data in a single stream.
  • SparseField: Serialize keys and values of a Map in separate streams. Useful for quickly iterating over key indexes when querying. Currently only supports values smaller than 4MB.

Deciding which strategy is best for your use case may mean you have to run some experiments and benchmarks.

See the documentation for the [Index][derive@Index] macro to see how to use strategies.

Cryptographic design

To read more about how the object system keeps your data safe, please look at DESIGN.md file in the main repository.