jupiter_rs/infograph/
mod.rs

//! Provides a compact and highly efficient representation of in-memory databases.
//!
//! `infograph` can generate or load structured data, which can then be queried in various
//! ways. This could be anything from simple code lists (name-value pairs) to complex graphs of
//! master data like nested lookup tables translated into several languages.
//!
//! The workhorse is the [Doc](docs::Doc) struct which contains the root [Node](node::Node) and
//! the [SymbolTable](symbols::SymbolTable). Nodes are the internal representation of the supported
//! data types (strings, ints, bools, lists and maps). The external interface used to access them
//! is [Element](docs::Element) which carries the symbol table along.
//!
//! Symbols are simply numbers (`i32`) which are managed by a [SymbolTable](symbols::SymbolTable).
//! The reason is that many documents contain lots of inner objects (maps) which all share the same
//! keys. To optimize memory consumption and also to improve access times, these keys are only
//! put in the symbol table once and then referenced as [Symbol](symbols::Symbol).
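//!
//! The interning idea can be sketched as follows. This is a minimal illustration only:
//! `SketchSymbolTable`, `find_or_create` and `lookup` are hypothetical names, and the choice to
//! start numbering at 1 is an assumption. It is not the actual API of `symbols::SymbolTable`.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical stand-in for `symbols::Symbol`.
//! type Symbol = i32;
//!
//! /// Hypothetical, simplified symbol table: maps names to symbols and back.
//! #[derive(Default)]
//! struct SketchSymbolTable {
//!     table: HashMap<String, Symbol>,
//!     names: Vec<String>,
//! }
//!
//! impl SketchSymbolTable {
//!     /// Returns the symbol for `name`, assigning the next free number on first use.
//!     fn find_or_create(&mut self, name: &str) -> Symbol {
//!         if let Some(&symbol) = self.table.get(name) {
//!             return symbol;
//!         }
//!         self.names.push(name.to_owned());
//!         // Assumption: symbols start at 1, so the new symbol equals the new length.
//!         let symbol = self.names.len() as Symbol;
//!         self.table.insert(name.to_owned(), symbol);
//!         symbol
//!     }
//!
//!     /// Resolves a symbol back into its name, if it is known.
//!     fn lookup(&self, symbol: Symbol) -> Option<&str> {
//!         if symbol < 1 {
//!             return None;
//!         }
//!         self.names.get((symbol - 1) as usize).map(String::as_str)
//!     }
//! }
//!
//! fn main() {
//!     let mut symbols = SketchSymbolTable::default();
//!     let foo = symbols.find_or_create("Foo");
//!     let bar = symbols.find_or_create("Bar");
//!
//!     // Asking again yields the same symbol instead of storing the key twice.
//!     assert_eq!(symbols.find_or_create("Foo"), foo);
//!     assert_ne!(foo, bar);
//!     assert_eq!(symbols.lookup(bar), Some("Bar"));
//! }
//! ```
//!
//! Each distinct key is stored once, no matter how many objects use it. The objects themselves
//! only carry the 4 byte symbol.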
//!
//! If a `Symbol` or a [Query](docs::Query) (which is a path of symbols to access an inner object)
//! is used several times, it can be compiled and then re-used, which further improves performance.
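//!
//! Why compiling pays off can be illustrated with a small sketch. All names here (`SketchNode`,
//! `SketchQuery`, `compile`, `execute`) and the dotted path syntax are assumptions made for this
//! illustration and do not mirror the real [Query](docs::Query) API.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! type Symbol = i32;
//!
//! /// Hypothetical, heavily simplified node: either a number or an object.
//! enum SketchNode {
//!     Int(i64),
//!     Object(Vec<(Symbol, SketchNode)>),
//! }
//!
//! /// A "compiled" query: the path already resolved into symbols.
//! struct SketchQuery {
//!     path: Vec<Symbol>,
//! }
//!
//! impl SketchQuery {
//!     /// Resolves every path segment once. Yields `None` for unknown keys,
//!     /// as such a query could never match.
//!     fn compile(symbols: &HashMap<&str, Symbol>, path: &str) -> Option<Self> {
//!         let path = path
//!             .split('.')
//!             .map(|key| symbols.get(key).copied())
//!             .collect::<Option<Vec<_>>>()?;
//!         Some(SketchQuery { path })
//!     }
//!
//!     /// Walks the path using only integer comparisons, no string handling.
//!     fn execute<'a>(&self, root: &'a SketchNode) -> Option<&'a SketchNode> {
//!         let mut current = root;
//!         for symbol in &self.path {
//!             match current {
//!                 SketchNode::Object(entries) => {
//!                     current = &entries.iter().find(|(key, _)| key == symbol)?.1;
//!                 }
//!                 SketchNode::Int(_) => return None,
//!             }
//!         }
//!         Some(current)
//!     }
//! }
//!
//! fn main() {
//!     let symbols = HashMap::from([("address", 1), ("zip", 2)]);
//!     // Represents { address: { zip: 1234 } } with keys replaced by symbols.
//!     let doc = SketchNode::Object(vec![(
//!         1,
//!         SketchNode::Object(vec![(2, SketchNode::Int(1234))]),
//!     )]);
//!
//!     // Compile once, then execute as often as needed.
//!     let query = SketchQuery::compile(&symbols, "address.zip").expect("known keys");
//!     for _ in 0..3 {
//!         assert!(matches!(query.execute(&doc), Some(SketchNode::Int(1234))));
//!     }
//! }
//! ```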
//!
//! For lists of objects a [Table](table::Table) can be used to further speed up querying objects
//! and their data. A table allows adding indices for common queries and also keeps a cache
//! for both pre-compiled queries and frequent lookups.
//!
//! A `Doc` can be created programmatically using a [DocBuilder](builder::DocBuilder).
//! Alternatively, [hash_to_doc](yaml::hash_to_doc) or [list_to_doc](yaml::list_to_doc)
//! can be used to load data from `Yaml`.
//!
//! # Performance
//! `infograph` is built to serve queries against large in-memory data structures
//! which are read frequently and changed rarely.
//!
//! To achieve high performance and low memory consumption, three main techniques
//! are applied:
//!
//! 1) **Small string optimizations**: Strings which are rather short are stored in place instead
//! of using a classic `String`. Note that if the value doesn't fit in place, we still only store
//! a pointer to a `[u8]` instead of a whole `Vec<u8>`, which saves 8 bytes for the skipped
//! capacity field (which isn't required as the strings are immutable anyway).
//!
//! 2) **Compression via SymbolTable**: As the keys in objects are most probably repeated very often,
//! we store the strings in a symbol table and only carry an `i32` along. Think of a list of 1000
//! objects which all look like `{ Foo: 42, Bar: 24 }`. Instead of storing the strings repeatedly,
//! we have a symbol table: `{'Foo': 1, 'Bar': 2}` accompanied by a lookup table
//! `[ 'Foo', 'Bar' ]`. The object itself is stored as `{1: 42, 2: 24}`.
//!
//! 3) **Optimized object representation**: Objects are maps which map Symbols (`i32`) to
//! [nodes](node::Node). As many objects only contain a few entries, using a `HashMap` would be
//! overkill (and slow). Therefore we use a `Vec<(Symbol, Node)>` with a small initial
//! capacity. This provides a compact memory layout which is very fast due to its efficient
//! usage of the L1 cache. We also sort entries by their `Symbol` so that we can use a binary search
//! when looking up an entry.
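//!
//! The first and third technique can be illustrated with a short sketch. `SketchObject`, `put`
//! and `get` are hypothetical names, values are plain `i64` instead of [nodes](node::Node), and
//! the initial capacity of 4 is an arbitrary assumption. The real implementation may differ.
//!
//! ```rust
//! use std::mem::size_of;
//!
//! type Symbol = i32;
//!
//! /// Hypothetical object: entries are always kept sorted by symbol.
//! struct SketchObject {
//!     entries: Vec<(Symbol, i64)>,
//! }
//!
//! impl SketchObject {
//!     fn new() -> Self {
//!         SketchObject { entries: Vec::with_capacity(4) }
//!     }
//!
//!     /// Inserts or overwrites a value while keeping `entries` sorted by symbol.
//!     fn put(&mut self, key: Symbol, value: i64) {
//!         match self.entries.binary_search_by_key(&key, |&(symbol, _)| symbol) {
//!             Ok(pos) => self.entries[pos].1 = value,
//!             Err(pos) => self.entries.insert(pos, (key, value)),
//!         }
//!     }
//!
//!     /// Looks up a value via binary search over the sorted entries.
//!     fn get(&self, key: Symbol) -> Option<i64> {
//!         self.entries
//!             .binary_search_by_key(&key, |&(symbol, _)| symbol)
//!             .ok()
//!             .map(|pos| self.entries[pos].1)
//!     }
//! }
//!
//! fn main() {
//!     // Technique 1: a boxed slice omits the capacity field of a `Vec`,
//!     // which is one machine word (8 bytes on 64-bit targets).
//!     assert_eq!(size_of::<Vec<u8>>() - size_of::<Box<[u8]>>(), size_of::<usize>());
//!
//!     // Technique 3: insertion order does not matter, lookups use binary search.
//!     let mut object = SketchObject::new();
//!     object.put(2, 24);
//!     object.put(1, 42);
//!     assert_eq!(object.get(1), Some(42));
//!     assert_eq!(object.get(3), None);
//! }
//! ```
//!
//! Inserting into a sorted `Vec` is linear in the number of entries. This fits the intended
//! workload, where data is read frequently and objects stay small.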

mod node;
mod strings;

pub mod builder;
pub mod docs;
pub mod symbols;
pub mod table;
pub mod yaml;