Module jupiter::idb

Provides an in-memory database which can be used for ultra fast lookups on static master data.

Many business applications need large amounts of master data to operate correctly. Most of this data is rather static: it is too large to be “just loaded into the app itself” but small enough to be kept in memory on a dedicated server.

Examples are code lists and mappings along with their translations, such as a list of all packaging units with mappings for different standards and file formats, or a list of customs declaration codes.

Essentially, these datasets are lists of documents as known from MongoDB or other “NoSQL” databases, but IDB provides some distinct features for lookups, reverse lookups, searching and multi-language handling.

Next to these datasets / code lists (which are internally referred to as “tables”), InfoGraphDB also supports “sets” of strings. These sets can be used to efficiently represent lists like “which codes are enabled for standard X” or “which table entries can be used in scenario Y”.

Managing data

The data stored in IDB is considered “static”. It is therefore loaded once from a source and cannot be modified afterwards. A common source for data is the Repository, which can use its Loaders to transform an input file into one or more tables in IDB. Of course, if the underlying file changes, it will be re-read and the table will be replaced; IDB just doesn’t provide any direct way of manipulating the data.

The main idea is to use a common data source, such as a git repository or a bucket in an object store, which contains the master data as YAML, JSON, XML or other formats. Using the Repository commands, this data is then loaded into the database. This has the great benefit that all systems (development, staging, production, customer instances …) are updated automatically once a file is changed.

By default, sets are loaded via the idb-yaml-sets loader, which expects a YAML hash which maps one or more keys to lists of set entries like this:

my_set: [ "A", "B", "C" ]
some.other.set: [ "foo", "bar" ]

Of course, there is also a programmatic API to create or drop tables and sets from other sources.

Commands

IDB.LOOKUP: IDB.LOOKUP table search_path filter_value path1 path2 path3 Performs a lookup for the given filter value in the given search path (inner fields separated by “.”) within the given table. If a result is found, the values for path1..pathN are extracted and returned. If no path is given, the number of matches is returned. If multiple documents match, only the first one is returned. Note that if a path matches an inner object (which is especially true for “.”), the result will be wrapped as JSON. Note that IDB.LOOKUP is case-sensitive by default. However, if a fulltext index is placed on the field being queried, a case-insensitive lookup can be performed, provided that the given filter_value is already lowercase. This might be used e.g. for reverse lookups to find a code for a given text in a certain (or any) language.
IDB.ILOOKUP: IDB.ILOOKUP table primary_lang fallback_lang search_path filter_value path1 Behaves just like IDB.LOOKUP. However, if one of the given extraction paths points to an inner map, we expect this to be a map of translations: we first try to find the value for the primary language and, if none is found, the value for the fallback language. If both languages fail to yield a value, we attempt to resolve a final fallback using xx as language code. If all these attempts fail, we output an empty string. It is therefore not possible to use ILOOKUP to return an inner map which is used for anything other than translations. Extracting single values using a proper path still works, however. See IDB.LOOKUP for details on when this is case-sensitive and when it isn’t.
IDB.QUERY: IDB.QUERY table num_skip max_results search_path filter_value path1 Behaves just like IDB.LOOKUP, but instead of returning only the first result, it skips over the first num_skip results and then outputs up to max_results rows. Note that this is again limited to at most 1000. See IDB.LOOKUP for details on when this is case-sensitive and when it isn’t.
IDB.IQUERY: IDB.IQUERY table primary_lang fallback_lang num_skip max_results search_path filter_value path1 Provides essentially the same i18n lookups for IDB.QUERY as IDB.ILOOKUP does for IDB.LOOKUP. See IDB.LOOKUP for details on when this is case-sensitive and when it isn’t.
IDB.SEARCH: IDB.SEARCH table num_skip max_results search_paths filter_value path1 Performs a search in all fields given as search_paths. This can either be comma separated like “path1,path2,path3” or a “*” to select all fields. Note that for a given search value, this will match case-insensitive and also for prefixes of a detected word within the document (the selected fields). Everything else behaves just like IDB.QUERY. Also note that a fulltext index has to be present for each field being queried.
IDB.ISEARCH: IDB.ISEARCH table primary_lang fallback_lang num_skip max_results search_paths filter_value path1 Adds i18n lookups for the generated results just like IDB.IQUERY or IDB.ILOOKUP.
IDB.SCAN: IDB.SCAN table num_skip max_results path1 path2 path3 Outputs all results by skipping over the first num_skip entries in the table and then outputting up to max_results rows.
IDB.ISCAN: IDB.ISCAN table primary_lang fallback_lang num_skip max_results path1 path2 path3 Again, behaves just like IDB.SCAN but provides i18n lookup for the given languages.
IDB.LEN: IDB.LEN reports the size of the given table.
IDB.SHOW_TABLES: IDB.SHOW_TABLES reports all tables and their usage statistics.
IDB.SHOW_SETS: IDB.SHOW_SETS reports all sets and their usage statistics.
IDB.CONTAINS: IDB.CONTAINS set key1 key2 key3 reports if the given keys are contained in the given set. For each key a 1 (contained) or a 0 (not contained) will be reported.
IDB.INDEX_OF: IDB.INDEX_OF set key1 key2 key3 reports the insertion index for each of the given keys using one-based indices.
IDB.CARDINALITY: IDB.CARDINALITY set reports the size of the given set.
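
The semantics of the three set commands can be illustrated with a small, self-contained sketch. This is not the actual jupiter::idb implementation: the type StringSet and its methods are hypothetical names chosen for illustration. The text above does not say how IDB.INDEX_OF reports a key that is missing from the set, so the sketch returns None in that case as an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an IDB set (not the real jupiter type).
struct StringSet {
    // Maps each key to its one-based insertion index.
    positions: HashMap<String, usize>,
}

impl StringSet {
    /// Builds a set from keys given in insertion order.
    fn from_keys(keys: &[&str]) -> Self {
        let mut positions = HashMap::new();
        for key in keys {
            let next = positions.len() + 1;
            // A duplicate key keeps the index of its first insertion.
            positions.entry(key.to_string()).or_insert(next);
        }
        StringSet { positions }
    }

    /// Mirrors IDB.CONTAINS: 1 if the key is contained, 0 otherwise.
    fn contains(&self, key: &str) -> i64 {
        if self.positions.contains_key(key) { 1 } else { 0 }
    }

    /// Mirrors IDB.INDEX_OF: the one-based insertion index.
    /// Assumption: a missing key is reported as None.
    fn index_of(&self, key: &str) -> Option<usize> {
        self.positions.get(key).copied()
    }

    /// Mirrors IDB.CARDINALITY: the number of distinct keys.
    fn cardinality(&self) -> usize {
        self.positions.len()
    }
}

fn main() {
    let set = StringSet::from_keys(&["A", "B", "C"]);
    let flags: Vec<i64> = ["A", "X", "C"].iter().map(|k| set.contains(*k)).collect();
    println!("contains: {:?}", flags);
    println!("index of B: {:?}", set.index_of("B"));
    println!("cardinality: {}", set.cardinality());
}
```

Keeping the insertion index next to each key answers both membership and position queries with a single hash lookup per key.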

Example

Imagine we have the following super simplified dataset representing some countries:

code: "D"
iso:
   two: "de"
   three: "deu"
name:
    de: "Deutschland"
    en: "Germany"
---
code: "A"
iso:
   two: "at"
   three: "aut"
name:
    de: "Österreich"
    en: "Austria"

Executing IDB.LOOKUP countries code D iso.two would yield “de”. We could also use IDB.ILOOKUP countries de en code D iso.two name to retrieve “de”, “Deutschland” or IDB.LOOKUP countries name.de Deutschland code for a reverse lookup yielding “D” again. Note that even IDB.LOOKUP countries name Deutschland code would work here, as we index all translations for a field.
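
How a dotted path such as iso.two selects a value inside a nested document can be sketched as follows. The Value enum and the resolve function are hypothetical simplifications and do not reflect the internal document representation used by IDB; they only show the path-walking idea, including “.” selecting the whole element.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified document value: a string or a nested map.
#[derive(Debug, PartialEq)]
enum Value {
    Text(String),
    Map(BTreeMap<String, Value>),
}

/// Walks `path` (inner fields separated by ".") through nested maps.
/// The path "." selects the whole element.
fn resolve<'a>(doc: &'a Value, path: &str) -> Option<&'a Value> {
    if path == "." {
        return Some(doc);
    }
    let mut current = doc;
    for segment in path.split('.') {
        current = match current {
            Value::Map(fields) => fields.get(segment)?,
            // A plain string has no inner fields to descend into.
            Value::Text(_) => return None,
        };
    }
    Some(current)
}

fn text(s: &str) -> Value {
    Value::Text(s.to_string())
}

fn map(entries: Vec<(&str, Value)>) -> Value {
    Value::Map(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// The first entry of the example dataset.
fn germany() -> Value {
    map(vec![
        ("code", text("D")),
        ("iso", map(vec![("two", text("de")), ("three", text("deu"))])),
        ("name", map(vec![("de", text("Deutschland")), ("en", text("Germany"))])),
    ])
}

fn main() {
    let doc = germany();
    println!("{:?}", resolve(&doc, "iso.two"));
    println!("{:?}", resolve(&doc, "code"));
    println!("{:?}", resolve(&doc, "iso.four"));
}
```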

Note that if both languages given in an IDB.ILOOKUP (or IDB.ISEARCH, IDB.ISCAN) don’t yield a value, we check if a final fallback value for the code xx is present. This might be usable if there is a default value and only one or a few languages differ.
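
The fallback order described here (primary language, then fallback language, then xx, then an empty string) can be written down as a short sketch. The function translate is a hypothetical helper for illustration only and is not part of the IDB API.

```rust
use std::collections::HashMap;

/// Picks a translation in the order: primary language, fallback language,
/// the special code "xx", and finally an empty string.
fn translate(translations: &HashMap<&str, &str>, primary: &str, fallback: &str) -> String {
    [primary, fallback, "xx"]
        .iter()
        .find_map(|lang| translations.get(*lang))
        .map(|value| value.to_string())
        .unwrap_or_default()
}

fn main() {
    let name = HashMap::from([("de", "Deutschland"), ("en", "Germany")]);
    println!("{}", translate(&name, "de", "en"));
    println!("{}", translate(&name, "fr", "en"));

    // Only a default value is present, so "xx" is used for any language.
    let with_default = HashMap::from([("xx", "Default")]);
    println!("{}", translate(&with_default, "fr", "it"));
}
```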

Also note that the whole element being matched can be requested when using “.” as field path, e.g. IDB.LOOKUP countries code D . As this matches an inner object, this will return the whole element wrapped as a JSON string.

We could invoke IDB.ISEARCH countries en de 0 5 * deutsch code name to retrieve “D”, “Germany” e.g. to provide autocomplete values for a country field.
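
The matching behaviour behind such a search (case-insensitive, matching prefixes of words, with num_skip and max_results applied to the hits) can be approximated by the sketch below. It scans linearly and is purely illustrative: IDB uses a fulltext index instead, and the names matches_prefix and search are hypothetical. The cap of 1000 rows is taken from the IDB.QUERY description; the way words are split is an assumption.

```rust
/// Upper bound on returned rows, as stated for IDB.QUERY.
const MAX_RESULTS: usize = 1000;

/// True if any word in `text` starts with `query`, ignoring case.
/// Assumption: words are separated by non-alphanumeric characters.
fn matches_prefix(text: &str, query: &str) -> bool {
    let query = query.to_lowercase();
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .any(|word| !word.is_empty() && word.starts_with(&query))
}

/// Returns the codes of all matching rows, skipping the first `num_skip`
/// hits and returning at most `max_results` (capped at MAX_RESULTS).
fn search<'a>(
    rows: &[(&'a str, &'a str)],
    query: &str,
    num_skip: usize,
    max_results: usize,
) -> Vec<&'a str> {
    rows.iter()
        .filter(|(_, name)| matches_prefix(name, query))
        .skip(num_skip)
        .take(max_results.min(MAX_RESULTS))
        .map(|(code, _)| *code)
        .collect()
}

fn main() {
    // (code, German name) pairs taken from the example dataset.
    let rows = [("D", "Deutschland"), ("A", "Österreich")];
    println!("{:?}", search(&rows, "deutsch", 0, 5));
    println!("{:?}", search(&rows, "land", 0, 5));
}
```

Because only word prefixes match, “deutsch” finds “Deutschland” while “land” does not.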

Modules

idb_csv_loader
Imports CSV files into InfoGraphDB.
idb_json_loader
Imports JSON files into InfoGraphDB.
idb_yaml_loader
Imports YAML files into InfoGraphDB.
idb_yaml_set_loader
Imports YAML files as sets into InfoGraphDB.
set
Represents a Set as used by InfoGraphDB.
table
A table wraps a crate::ig::docs::Doc together with a Trie as index to support InfoGraphDB queries.
trie
Provides a lookup table for arbitrary data structures using a TRIE.

Structs

Database
Describes the public API of the database.

Enums

DatabaseCommand
Describes the administrative commands which can be submitted via Database::perform.

Functions

install
Installs an actor which handles the commands as described above.