1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
//! Provides a compact and highly efficient representation of in-memory databases.
//!
//! `infograph` permits to generate or load structured data, which can then be queried in various
//! ways. This could be anything from simple code lists (name value pair) to complex graphs of
//! master data like nested lookup tables translated into several languages.
//!
//! The work horse is the [Doc](docs::Doc) struct which contains the root **Node** and
//! the [SymbolTable](symbols::SymbolTable). Nodes as the internal representation of the supported
//! data types (strings, ints, bools, lists and maps). The external interface used to access them
//! is [Element](docs::Element) which carries the symbol table along.
//!
//! Symbols are simply numbers (`i32`) which are managed by a [SymbolTable](symbols::SymbolTable).
//! The reason is that many documents contain lots of inner objects (maps) which all share the same
//! keys. To optimize memory consumption and also to improve access times, these keys are only
//! put in them symbol table once and then referenced as [Symbol](symbols::Symbol).
//!
//! If a `Symbol` or a [Query](docs::Query) (which is a path of symbols to access an inner object)
//! is used several times, it can be compiled and then re-used, which further improves performance.
//!
//! For lists of objects a [Table](crate::idb::table::Table) can be used to further speed up querying objects
//! and data from it. A table permits to add indices for common queries and also keeps a cache
//! for both, pre-compiled queries and frequent lookups around.
//!
//! To create a **Doc** programmatically, a [DocBuilder](builder::DocBuilder) can be used.
//!
//! Also there are helpers to read common data formats:
//! * **YAML**: [hash_to_doc](yaml::hash_to_doc) or [list_to_doc](yaml::list_to_doc)
//! * **JSON**: [object_to_doc](json::object_to_doc) or [list_to_doc](json::list_to_doc)
//! * **XML**: [PullReader](xml::PullReader)
//!
//! # Performance
//! As stated above, `infograph` is built to serve queries against large in-memory data structures
//! which are read frequently and changed seldom.
//!
//! To guarantee optimal performance and to also optimize memory consumption, three main techniques
//! are applied:
//!
//! 1) **Small string optimizations**: Strings which are rather short, are stored in place instead
//! of using a classic `String`. Note that if the value doesn't fit in place, we still only store
//! a pointer to a `[u8]` instead of a whole `Vec<u8>` which saves 8 bytes for the skipped
//! capacity field (which isn't required as the strings are immutable anyway).
//!
//! 2) **Compression via SymbolTable**: As the keys in objects are most probably repeated very often
//! we store the strings in a symbol table and only carry an `i32` along.Think if a list of 1000
//! object which all look like `{ Foo: 42, Bar: 24 }`. Instead of storing the strings repeatedly,
//! we have a symbol table: `{'Foo': 1, 'Bar': 2}` accompanied with a lookup table
//! `[ 'Foo', 'Bar' ]`. The object itself is stored as `{1: 42, 2: 24}`.
//!
//! 3) **Optimized object representation**: Objects are maps which map Symbols (`i32`) to
//! **nodes**. As many objects only contain a few entries, using a `HashMap` would be
//! quite an overkill (and slow). Therefore we use a `Vec<(Symbol, Node)>` with a small initial
//! capacity. This provides quite a compact memory layout which is very fast due to its efficient
//! usage of L1 cache. We also sort entries by their `Symbol` so that we can use a binary search
//! when looking up an entry.
//!
//! # Data Layout
//!
//! Using **loaders** (defined in the [Repository](Repository)) one can control for which columns
//! and index is created. These lookup indices drastically improve the performance of lookups
//! and searches. We support exact indices (given as **indices**) and search indices (specified
//! as **fulltextIndices**). The former only store full field values, where the latter also store
//! tokens (e.g. "hello", "world", for "hello world").
//!
//! Note that if a n index is used for a list or a map, all its child values are also mapped. Note
//! that also, we create a sub-index - e.g. when this is indexed:
//! ```yaml
//! mappings:
//!     acme: "test"
//! ```
//!
//! Using an index for "mappings", we'd index "text" and also create an index for mappings.acme with
//! the same value.
mod node;
mod strings;

pub mod builder;
pub mod csv;
pub mod docs;
pub mod json;
pub mod symbols;
pub mod xml;
pub mod yaml;