[−][src]Module jupiter::infograph
Provides a compact and highly efficient representation of in-memory databases.
infograph
permits to generate or load structured data, which can then be queried in various
ways. This could be anything from simple code lists (name value pair) to complex graphs of
master data like nested lookup tables translated into several languages.
The work horse is the Doc struct which contains the root Node and the SymbolTable. Nodes as the internal representation of the supported data types (strings, ints, bools, lists and maps). The external interface used to access them is Element which carries the symbol table along.
Symbols are simply numbers (i32
) which are managed by a SymbolTable.
The reason is that many documents contain lots of inner objects (maps) which all share the same
keys. To optimize memory consumption and also to improve access times, these keys are only
put in them symbol table once and then referenced as Symbol.
If a Symbol
or a Query (which is a path of symbols to access an inner object)
is used several times, it can be compiled and then re-used, which further improves performance.
For lists of objects a Table can be used to further speed up querying objects and data from it. A table permits to add indices for common queries and also keeps a cache for both, pre-compiled queries and frequent lookups around.
To create a Doc
a DocBuilder can be used to create a Doc
programmatically. Also, hash_to_doc or list_to_doc
can be used to load data from Yaml
.
Performance
As stated above, infograph
is built to serve queries against large in-memory data structures
which are read frequently and changed seldomly.
To guarantee optimal performance and to also optimize memory consumption, three main techniques are applied:
-
Small string optimizations: Strings which are rather short, are stored in place instead of using a classic
String
. Note that if the value doesn't fit in place, we still only store a pointer to a[u8]
instead of a wholeVec<u8>
which saves 8 bytes for the skipped capacity field (which isn't required as the strings are immutable anyway). -
Compression via SymbolTable: As the keys in objects are most probably repeated very often we store the strings in a symbol table and only carry an
i32
along.Think if a list of 1000 object which all look like{ Foo: 42, Bar: 24 }
. Instead of storing the strings repeatedly, we have a symbol table:{'Foo': 1, 'Bar': 2}
accompanied with a lookup table[ 'Foo', 'Bar' ]
. The object itself is stored as{1: 42, 2: 24}
. -
Optimized object representation: Objects are maps which map Symbols (
i32
) to nodes. As many objects only contain a few entries, using aHashMap
would be quite an overkill (and slow). Therefore we use aVec<(Symbol, Node)>
with a small initial capacity. This provides quite a compact memory layout which is very fast due to its efficient usage of L1 cache. We also sort entries by theirSymbol
so that we can use a binary search when looking up an entry.
Modules
builder | Provides |
docs | Contains the core elements to read / query an information graph. |
symbols | Provides a |
table | |
yaml |