//! Infinitree is a versioned, embedded database that uses uniform,
//! encrypted blobs to store data.
//!
//! Infinitree is built on a set of lockless and locking data
//! structures that you can use in your application like a regular map
//! or list.
//!
//! Data structures:
//!
//!  * [`fields::VersionedMap`]: A lockless HashMap that tracks incremental changes
//!  * [`fields::Map`]: A lockless HashMap
//!  * [`fields::LinkedList`]: Linked list that tracks incremental changes
//!  * [`fields::List`]: A simple `RwLock<Vec<_>>` alias
//!  * [`fields::Serialized`]: Any type that implements [`serde::Serialize`]
//!
//! Tight control over resources allows you to use it in situations
//! where memory is scarce, and fall back to querying from slower
//! storage.
//!
//! Additionally, Infinitree is useful for securely storing and sharing
//! any [`serde`](https://docs.rs/serde) serializable application
//! state, or dumping and loading application state changes through
//! commits. This is similar to [Git](https://git-scm.com).
//!
//! If you need to store large binary blobs, you can open a
//! [`BufferedSink`][object::BufferedSink], which implements
//! `std::io::Write`, and store arbitrary byte streams in the tree.
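//!
//! Because the sink is an ordinary `std::io::Write`, generic I/O code
//! should work with it unchanged. The sketch below is illustrative
//! only: it assumes nothing about how a `BufferedSink` is constructed,
//! uses a `Vec<u8>` as a stand-in sink, and `store_stream` is a
//! hypothetical helper that is not part of Infinitree.
//!
//! ```
//! use std::io::{self, Read, Write};
//!
//! /// Hypothetical helper: stream everything from `source` into `sink`
//! /// and return the number of bytes written.
//! fn store_stream<R: Read, W: Write>(mut source: R, sink: &mut W) -> io::Result<u64> {
//!     let written = io::copy(&mut source, sink)?;
//!     sink.flush()?;
//!     Ok(written)
//! }
//!
//! fn main() -> io::Result<()> {
//!     // A `Vec<u8>` stands in for the real sink here.
//!     let mut sink = Vec::new();
//!     let written = store_stream(&b"arbitrary bytes"[..], &mut sink)?;
//!     assert_eq!(written, 15);
//!     assert_eq!(sink, b"arbitrary bytes");
//!     Ok(())
//! }
//! ```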
//!
//! ## Features
//!
//!  * Encrypt all on-disk data, and only decrypt on use
//!  * Transparently handle hot/warm/cold storage tiers; currently S3-compatible backends are supported
//!  * Versioned data structures allow you to save/load/fork application state safely
//!  * Thread-safe by default
//!  * Iterate over random segments of data without loading to memory in full
//!  * Focus on performance and fine-grained control of memory use
//!  * Extensible for custom data types, storage backends, and serialization
//!
//! ## Example use
//!
//! ```
//! use infinitree::{
//!     *,
//!     crypto::UsernamePassword,
//!     backends::Directory,
//!     fields::VersionedMap,
//! };
//!
//! fn main() -> anyhow::Result<()> {
//!     // Create an empty tree backed by a local directory, secured
//!     // with a username and password.
//!     let mut tree = Infinitree::<VersionedMap<String, usize>>::empty(
//!         Directory::new("../test_data")?,
//!         UsernamePassword::with_credentials("username".to_string(),
//!                                            "password".to_string())?
//!     ).unwrap();
//!
//!     // The index behaves like a regular map.
//!     tree.index().insert("sample_size".into(), 1234);
//!
//!     // Save the current state of the index as a new version.
//!     tree.commit("first measurement! yay!");
//!     Ok(())
//! }
//! ```
//!
//! ## Core concepts
//!
//! [`Infinitree`] is the first entry point to the
//! library. It creates, saves, and queries various versions of your
//! [`Index`].
//!
//! There are two ways to interact with an Infinitree: through an
//! [`Index`], or by directly accessing the [`object`] structure.
//!
//! Any data stored in infinitree objects will receive a `ChunkPointer`,
//! which _must_ be stored somewhere to retrieve the data. Hence the
//! need for an index.
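//!
//! A toy model may help picture this relationship. Everything in the
//! sketch below is hypothetical and far simpler than the real object
//! system: `ToyStore` and `ToyPointer` are made-up names that only
//! illustrate why a pointer has to be kept somewhere (here, in a
//! `HashMap` acting as the index).
//!
//! ```
//! use std::collections::HashMap;
//!
//! /// Hypothetical stand-in for a chunk pointer: where the data lives.
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! struct ToyPointer { offset: usize, len: usize }
//!
//! /// Hypothetical append-only store; not Infinitree's object format.
//! #[derive(Default)]
//! struct ToyStore { bytes: Vec<u8> }
//!
//! impl ToyStore {
//!     /// Append `data` and return a pointer to it.
//!     fn write(&mut self, data: &[u8]) -> ToyPointer {
//!         let offset = self.bytes.len();
//!         self.bytes.extend_from_slice(data);
//!         ToyPointer { offset, len: data.len() }
//!     }
//!
//!     /// Without the pointer there is no way to find the data again.
//!     fn read(&self, ptr: ToyPointer) -> &[u8] {
//!         &self.bytes[ptr.offset..ptr.offset + ptr.len]
//!     }
//! }
//!
//! fn main() {
//!     let mut store = ToyStore::default();
//!     // The "index" remembers the pointer for each key.
//!     let mut index: HashMap<String, ToyPointer> = HashMap::new();
//!     index.insert("a".into(), store.write(b"hello"));
//!     index.insert("b".into(), store.write(b"world"));
//!     assert_eq!(store.read(index["b"]), b"world");
//! }
//! ```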
//!
//! An index can be any struct that implements the [`Index`]
//! trait; the [derive macro](derive@Index) can implement it for
//! you. An index consists of fields, which behave like regular Rust
//! types but need to implement a few traits to support
//! serialization.
//!
//! ### Index
//!
//! You can think about your `Index` as a schema. Or just application
//! state on steroids.
//!
//! In a more abstract sense, the [`Index`] trait and corresponding
//! [derive macro](derive@Index) represent a view into a single
//! version of your database. Using an [`Infinitree`] you can swap
//! between, and mix-and-match data from, various versions of an
//! `Index` state.
//!
//! ### Fields
//!
//! An `Index` contains serializable fields. These are thread-safe
//! data structures with internal mutation, which support some kind of
//! serialization [`Strategy`].
//!
//! You can use any type that implements [`serde::Serialize`] as a
//! field through the `fields::Serialized` wrapper type, but there are
//! [incremental hash map][fields::VersionedMap] and
//! [list-like][fields::LinkedList] types available for you to use to
//! track and only save changes between versions of your data.
//!
//! Persisting and loading fields is done using an [`Intent`]. If you
//! use the [`Index`][derive@Index] macro, it automatically creates
//! accessor functions for each field in an index, each returning an
//! `Intent`-wrapped strategy.
//!
//! Intents elide the specific type of each field, which allows batch
//! operations such as [`Infinitree::commit`], even when each field in
//! an index uses a different strategy.
//!
//! ### Strategy
//!
//! To tell Infinitree how to serialize a field, you can use different
//! strategies. A [`Strategy`] has full control over how a data structure
//! is serialized in the object system.
//!
//! Every strategy receives an `Index` transaction, and an
//! [`object::Reader`] or [`object::Writer`]. It is the responsibility
//! of the strategy to store [references](ChunkPointer) so you can
//! load back the data once persisted.
//!
//! There are 2 strategies in the base library:
//!
//!  * [`LocalField`]: Serialize all data in a single stream.
//!  * [`SparseField`]: Serialize keys and values of a Map in separate
//!    streams. Useful for quickly iterating over key indexes when
//!    querying. Currently only supports values smaller than 4MB.
//!
//! Deciding which strategy is best for your use case may mean you
//! have to run some experiments and benchmarks.
//!
//! See the documentation for the [`Index`][derive@Index] macro to learn
//! how to use strategies.
//!
//! [`Intent`]: fields::Intent
//! [`Strategy`]: fields::Strategy
//! [`Load`]: fields::Load
//! [`Store`]: fields::Store
//! [`LocalField`]: fields::LocalField
//! [`SparseField`]: fields::SparseField
//!
//! ## Cryptographic design
//!
//! To read more about how the object system keeps your data safe,
//! please see the
//! [DESIGN.md](https://github.com/symmetree-labs/infinitree/blob/main/DESIGN.md)
//! file in the main repository.

#![deny(
    arithmetic_overflow,
    future_incompatible,
    nonstandard_style,
    rust_2018_idioms,
    trivial_casts,
    unused_crate_dependencies,
    unused_lifetimes,
    unused_qualifications,
    rustdoc::bare_urls,
    rustdoc::broken_intra_doc_links,
    rustdoc::invalid_codeblock_attributes,
    rustdoc::invalid_rust_codeblocks,
    rustdoc::private_intra_doc_links
)]
#![deny(clippy::all)]
#![allow(clippy::ptr_arg)]

#[cfg(any(test, doctest, bench))]
use criterion as _;

#[macro_use]
extern crate serde_derive;

pub mod backends;
mod chunks;
mod compress;
pub mod crypto;
pub mod fields;
mod id;
pub mod index;
pub mod object;
pub mod tree;

pub use anyhow;
pub use chunks::ChunkPointer;
pub use crypto::{Digest, Hasher, Key};
pub use index::Index;
pub use infinitree_macros::Index;
pub use object::ObjectId;
pub use tree::Infinitree;

pub(crate) use backends::Backend;
pub(crate) use id::Id;

use rmp_serde::decode::from_slice as deserialize_from_slice;
use rmp_serde::encode::write as serialize_to_writer;
use rmp_serde::to_vec as serialize_to_vec;
use rmp_serde::Deserializer;

/// Size of a storage object unit.
pub const BLOCK_SIZE: usize = 4 * 1024 * 1024;

#[cfg(test)]
const TEST_DATA_DIR: &str = "../test_data";