//! Infinitree is a versioned, embedded database that uses uniform,
//! encrypted blobs to store data.
//!
//! It works best for use cases with independent writer processes, as
//! multiple writer processes on a single tree are not supported.
//!
//! In fact, calling Infinitree a database may be generous, as all
//! persistence-related operations are explicit. Under the hood, it
//! uses `serde` for flexibility and out-of-the-box interoperability
//! with most libraries.
//!
//! ## Features
//!
//!  * Thread-safe by default
//!  * Transparently handle hot/warm/cold storage tiers; currently S3-compatible backends are supported
//!  * Versioned data structures that can be queried using the `Iterator` trait without loading in full
//!  * Encrypt all on-disk data, and only decrypt it on use
//!  * Focus on performance and flexible choice of performance/memory use tradeoffs
//!  * Extensible for custom data types and storage strategies
//!  * Easy to integrate with cloud workers & KMS for access control
//!
//! ## Example use
//!
//! ```no_run
//! use infinitree::{
//!     Infinitree,
//!     Index,
//!     Key,
//!     anyhow,
//!     backends::Directory,
//!     fields::{Serialized, VersionedMap, LocalField},
//! };
//! use serde::{Serialize, Deserialize};
//!
//! #[derive(Serialize, Deserialize)]
//! pub struct PlantHealth {
//!     id: usize,
//!     air_humidity: usize,
//!     soil_humidity: usize,
//!     temperature: f32
//! }
//!
//! #[derive(Index, Default, Clone)]
//! pub struct Measurements {
//!     // rename the field when serializing
//!     #[infinitree(name = "last_time")]
//!     _old_last_time: Serialized<String>,
//!
//!     #[infinitree(name = "last_time2")]
//!     last_time: Serialized<usize>,
//!
//!     // only store the keys in the index, not the values
//!     #[infinitree(strategy = "infinitree::fields::SparseField")]
//!     measurements: VersionedMap<usize, PlantHealth>,
//!
//!     // skip the next field when loading & serializing
//!     #[infinitree(skip)]
//!     current_time: usize,
//! }
//!
//! fn main() -> anyhow::Result<()> {
//!     let mut tree = Infinitree::<Measurements>::empty(
//!         Directory::new("/storage")?,
//!         Key::from_credentials("username", "password")?
//!     );
//!
//!     tree.index().measurements.insert(1, PlantHealth {
//!         id: 0,
//!         air_humidity: 50,
//!         soil_humidity: 60,
//!         temperature: 23.3,
//!     });
//!
//!     *tree.index().last_time.write() = 1;
//!     tree.commit("first measurement! yay!");
//!     Ok(())
//! }
//! ```
//!
//! ## Core concepts
//! ### Infinitree
//!
//! [`Infinitree`] provides high-level versioning, querying, and key
//! and memory management operations for working with the different
//! [`fields`] in the [`Index`].
//!
//! An Infinitree instance mainly acts as a context for all
//! operations on the tree, and will be your first entry point when
//! working with trees and persisting them.
//!
//! Persisting any [field](#fields-1) in an [Index](#index) will require
//! an [`Intent`] to ensure the right
//! [`Strategy`] is being used for
//! persistence.
//!
//! ### Index
//!
//! In the simplest case, you can think about your Index as a
//! schema for a tree.
//!
//! In a more complicated setup, the [`Index`] trait and
//! corresponding [derive macro](derive@Index) represent a view into
//! a single version of your data. Using an [`Infinitree`] you can
//! swap between the various versions and mix-and-match data from
//! various versions into a single Index instance.
//!
//! Interaction with Index member fields is straightforward. However,
//! the [derive macro](derive@Index) will generate functions that
//! produce an [`Intent`] for any operation that touches the
//! persistence layer, such as [`Store`] and [`Load`].
//!
//! ### Fields
//!
//! An Index consists of fields. These are thread-safe data structures
//! with internal mutation, which support some kind of serialization
//! [`Strategy`].
//!
//! You can use any type that implements [`serde::Serialize`] as a
//! field, through the `fields::Serialized` wrapper type.
//!
//! Persisting and loading fields is done using an [`Intent`]
//! wrapper. If you use the [`Index`][derive@Index] macro, it will
//! automatically create accessor functions for each field in an
//! index, which return an `Intent`-wrapped strategy.
//!
//! This is to elide the specific types and allow doing batch
//! operations, e.g. when calling [`Infinitree::commit`] using a
//! different strategy for each field in an Index.
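//!
//! As a rough illustration of that type elision, here is a toy sketch
//! that uses only the standard library. It is not Infinitree's API:
//! `ToyStore`, `ToyIntent`, `Counter`, `Label`, and `commit_all` are
//! hypothetical names invented for this example. Boxing each strategy
//! behind a trait object lets a single loop process fields of
//! different concrete types.
//!
//! ```rust
//! /// Hypothetical stand-in for a persistence strategy (not the real trait).
//! trait ToyStore {
//!     /// Produce a serialized record for the wrapped field.
//!     fn store(&self) -> String;
//! }
//!
//! /// Two unrelated field types, each with its own way of being stored.
//! struct Counter(u64);
//! struct Label(String);
//!
//! impl ToyStore for Counter {
//!     fn store(&self) -> String {
//!         format!("counter={}", self.0)
//!     }
//! }
//!
//! impl ToyStore for Label {
//!     fn store(&self) -> String {
//!         format!("label={}", self.0)
//!     }
//! }
//!
//! /// Hypothetical wrapper pairing a field name with a type-erased strategy.
//! struct ToyIntent {
//!     name: &'static str,
//!     strategy: Box<dyn ToyStore>,
//! }
//!
//! /// Batch operation: the caller never needs to know the concrete types.
//! fn commit_all(intents: &[ToyIntent]) -> Vec<String> {
//!     intents
//!         .iter()
//!         .map(|intent| format!("{}: {}", intent.name, intent.strategy.store()))
//!         .collect()
//! }
//! ```
//!
//! Calling `commit_all` on a list that holds a boxed `Counter` and a
//! boxed `Label` yields one record per field, whatever the concrete
//! type behind each box.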
//!
//! ### Strategy
//!
//! To tell Infinitree how to serialize a field, you can use different
//! strategies. A strategy has full control over the field and the
//! serializer/loader transactions for it, which means you can
//! control the performance and placement of pieces of data.
//!
//! Every strategy receives an Index transaction, and an Object
//! reader/writer. It is the responsibility of the strategy to store
//! references so you can load back the data once persisted.
//!
//! There are 2 strategies in the base library:
//!
//!  * [`LocalField`]: Store all of the data in the index. This is the
//!  default.
//!  * [`SparseField`]: Store values in a Map outside of the
//!  index. Best suited for large structs as values.
//!
//! Deciding which strategy is best for your use case may mean you
//! have to run some experiments. A `SparseField` is generally useful
//! for indexing large structs that you want to query rather than load
//! at once.
//!
//! See the documentation for the [`Index`][derive@Index] macro to see how to
//! use strategies.
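//!
//! To make the difference between the two layouts concrete, here is a
//! toy model that uses only the standard library. It does not use or
//! mirror Infinitree's real types: `ToyObjectStore`, `store_local`,
//! and `store_sparse` are hypothetical names, and a `Vec` position
//! stands in for whatever reference a real strategy would keep.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical stand-in for storage that lives outside the index.
//! #[derive(Default)]
//! struct ToyObjectStore {
//!     blobs: Vec<String>,
//! }
//!
//! impl ToyObjectStore {
//!     /// Append a value and return its position, which acts as a pointer.
//!     fn put(&mut self, value: String) -> usize {
//!         self.blobs.push(value);
//!         self.blobs.len() - 1
//!     }
//!
//!     /// Resolve a pointer back into the stored value, if it exists.
//!     fn get(&self, pointer: usize) -> Option<&String> {
//!         self.blobs.get(pointer)
//!     }
//! }
//!
//! /// "Local" layout: keys and values both live in the index.
//! fn store_local(data: &[(u32, String)]) -> HashMap<u32, String> {
//!     data.iter().cloned().collect()
//! }
//!
//! /// "Sparse" layout: the index keeps keys and pointers only, while
//! /// the values themselves go to the external store.
//! fn store_sparse(
//!     data: &[(u32, String)],
//!     store: &mut ToyObjectStore,
//! ) -> HashMap<u32, usize> {
//!     data.iter()
//!         .map(|(key, value)| (*key, store.put(value.clone())))
//!         .collect()
//! }
//! ```
//!
//! In the sparse layout the index stays small, and a value is fetched
//! with `store.get(pointer)` only when a query actually needs it.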
//!
//! [`Intent`]: fields::Intent
//! [`Strategy`]: fields::Strategy
//! [`Load`]: fields::Load
//! [`Store`]: fields::Store
//! [`LocalField`]: fields::LocalField
//! [`SparseField`]: fields::SparseField

#![deny(
    future_incompatible,
    nonstandard_style,
    rust_2018_idioms,
    trivial_casts,
    unused_crate_dependencies,
    unused_lifetimes,
    unused_qualifications,
    rustdoc::bare_urls,
    rustdoc::broken_intra_doc_links,
    rustdoc::invalid_codeblock_attributes,
    rustdoc::invalid_rust_codeblocks,
    rustdoc::private_intra_doc_links
)]
#![deny(clippy::all)]
#![allow(clippy::ptr_arg)]
#![deny()]

#[macro_use]
extern crate serde_derive;

pub mod backends;
mod chunks;
mod compress;
mod crypto;
pub mod fields;
pub mod index;
pub mod object;
mod tree;

pub use backends::Backend;
pub use chunks::ChunkPointer;
pub use crypto::{secure_hash, Digest, Key};
pub use index::Index;
pub use object::ObjectId;
pub use tree::Infinitree;

pub use anyhow;

pub use infinitree_macros::Index;

use rmp_serde::decode::from_read_ref as deserialize_from_slice;
use rmp_serde::to_vec as serialize_to_vec;
use rmp_serde::Deserializer;

// Use block size of 4MiB for now
const BLOCK_SIZE: usize = 4 * 1024 * 1024;