banyan/
lib.rs

1//! # Banyan trees
2//!
3//! ![](https://i.imgur.com/6icLbdz.jpg)
4//!
5//! Banyan trees in nature are the trees with the widest canopy. They also sprout new roots, which after some time can become indistinguishable from the original root.
6//!
7//! The banyan trees in this library are persistent trees with a high branching factor (wide canopy) that can be used to efficiently store and query large sequences of key value pairs.
8//!
9//! Banyan trees are optimized for *appending* new entries and filtered, streaming reading of values. They naturally support addressing elements by index.
10//! They **do not** support arbitrary insertion or deletion of elements, so they are **not** a general purpose map data structure.
11//!
12//! They are not balanced trees, but they share some properties with [B-Trees].
13//!
14//! ## Persistence
15//!
16//! Banyan trees are persistent, using a content-addressed storage system such as [ipfs] or a key value store.
17//! Data is [CBOR] encoded and [zstd] compressed for space efficient persistent storage and replication. It is also encrypted using the [chacha20] stream cipher.
18//!
19//! # Indexing
20//!
21//! Each banyan tree entry consists of a key part and a value part.
22//! The value part is an opaque value that can be serialized as [cbor]. This can not be used for filtering in queries.
23//! The key part can be used for efficient querying by defining how to compute summaries from keys.
24//! These summaries will be stored higher up in the tree to allow for efficient querying and filtered streaming.
25//!
26//! In addition to querying by key content, elements can always be efficiently accessed by offset.
27//!
28//! # Sealing and packing
29//!
30//! ## Sealing
31//!
32//! A leaf node of the tree is considered *sealed* as soon as the compressed data exceeds a certain size.
33//! This depends on configuration and compression rate.
34//! A branch node is considered *sealed* as soon as it has a certain, configurable number of child nodes.
35//!
36//! ## Packing
37//!
38//! Banyan trees are not balanced, since they do not have to support arbitrary insertion and deletion.
39//! There is a concept somewhat related to balancing called packing. A tree is considered packed
40//! if all but the last leaf node is packed, and branch nodes are packed to the left as much as possible.
41//! This is usually the most space efficient representation of the sequence of key value pairs, and also
42//! the most efficient one for subsequent appending.
43//!
44//! ## Unpacked trees
45//!
46//! Packing a tree after each append of a single entry or a small number of entries would be inefficient, since
47//! it would involve recreating a new branch node for each level and a leaf node.
48//!
49//! To allow for quick append, it is possible to add elements without packing. This creates a linked list of trees,
50//! and in the case of adding individual elements, a degenerate tree resembling a linked list.
51//!
52//! Unpacked trees can be packed at any time without changing the content.
53//!
54//! [CBOR]: https://en.wikipedia.org/wiki/CBOR
55//! [zstd]: https://en.wikipedia.org/wiki/Zstandard
56//! [chacha20]: https://en.wikipedia.org/wiki/Salsa20#ChaCha_variant
57//! [ipfs]: https://ipfs.io/
58//! [B-Trees]: https://en.wikipedia.org/wiki/B-tree
59mod forest;
60pub mod index;
61pub mod query;
62pub mod store;
63mod stream_builder;
64mod tree;
65mod util;
66use stream_builder::{CipherOffset, StreamBuilderState};
67
68#[cfg(feature = "metrics")]
69use prometheus::Registry;
70
71pub use chacha20;
72pub use forest::{Config, FilteredChunk, Forest, Secrets, Transaction, TreeTypes};
73pub use stream_builder::{StreamBuilder, StreamTransaction};
74pub use tree::Tree;
75
76#[cfg(test)]
77extern crate quickcheck;
78#[cfg(test)]
79#[macro_use(quickcheck)]
80extern crate quickcheck_macros;
81
82/// Register all prometheus metrics
83#[cfg(feature = "metrics")]
84pub fn register(registry: &Registry) -> anyhow::Result<()> {
85    forest::register(registry)?;
86    Ok(())
87}