Kermit

Kermit is a library containing data structures, iterators and algorithms related to relational algebra, primarily for the purpose of research and benchmarking. It is currently in the early stages of development, so all builds and releases should be considered unstable.

It is being written primarily to provide a platform for my Master's thesis, whose scope (preliminarily) covers benchmarking the Leapfrog Triejoin algorithm over a variety of data structures. I intend to design Kermit to be easily extensible, so that other algorithms and data structures can be benchmarked in the future.

Rust was chosen as the project language for two main reasons:

The Knowledge-Based Systems group at TU Dresden is developing a new Rust-based rule engine, Nemo, and I'm hoping the knowledge and implementations developed during this Master's will prove useful for it. I strongly recommend checking Nemo out. Not only is it a very promising project, it is also one of the most beautiful, pedantically managed repositories I've come across.
I wanted an excuse to write Rust with actual purpose.

My objective is to write entirely safe, stable, and hopefully idiomatic Rust the whole way through. I am very interested in how well one can maintain readability (and sanity) while striving to achieve this.

Usage

Given a relation stored as a CSV file (edge.csv):

src,dst
1,2
2,3
3,4
1,3

And a Datalog query file (query.dl):

path(X, Y, Z) :- edge(X, Y), edge(Y, Z).

Run a join with the kermit CLI:

kermit join \
  --relations edge.csv \
  --query query.dl \
  --algorithm leapfrog-triejoin \
  --indexstructure tree-trie

Output (CSV to stdout):

1,2,3
1,3,4
2,3,4
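
To make the result above easier to check by hand, here is a minimal, self-contained sketch of what the query computes. It is not Kermit's implementation or API: the `Trie` alias and the functions `build_trie`, `intersect`, and `two_hop_paths` are hypothetical names invented for illustration. The sketch stores the edge relation as a two-level map, then binds X, Y, Z in that order, using a seek-based intersection of sorted lists in the spirit of leapfrogging to find the valid Y values.

```rust
use std::collections::BTreeMap;

/// Hypothetical two-level trie over binary tuples:
/// first column -> sorted, deduplicated second-column values.
type Trie = BTreeMap<u32, Vec<u32>>;

/// Builds the trie from (src, dst) pairs.
fn build_trie(tuples: &[(u32, u32)]) -> Trie {
    let mut trie = Trie::new();
    for &(src, dst) in tuples {
        trie.entry(src).or_default().push(dst);
    }
    for children in trie.values_mut() {
        children.sort_unstable();
        children.dedup();
    }
    trie
}

/// Intersects two sorted slices by repeatedly seeking the lagging cursor
/// forward to the first value that is >= the other cursor's value.
fn intersect(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            out.push(a[i]);
            i += 1;
            j += 1;
        } else if a[i] < b[j] {
            // Binary-search seek within the remaining part of `a`.
            i += a[i..].partition_point(|&v| v < b[j]);
        } else {
            j += b[j..].partition_point(|&v| v < a[i]);
        }
    }
    out
}

/// Evaluates path(X, Y, Z) :- edge(X, Y), edge(Y, Z) in variable order X, Y, Z.
fn two_hop_paths(edges: &[(u32, u32)]) -> Vec<(u32, u32, u32)> {
    let trie = build_trie(edges);
    let keys: Vec<u32> = trie.keys().copied().collect();
    let mut out = Vec::new();
    for (&x, ys) in &trie {
        // Y must be a successor of x and must itself have successors.
        for y in intersect(ys, &keys) {
            for &z in &trie[&y] {
                out.push((x, y, z));
            }
        }
    }
    out
}

fn main() {
    // The contents of edge.csv from the example above.
    let edges: [(u32, u32); 4] = [(1, 2), (2, 3), (3, 4), (1, 3)];
    for (x, y, z) in two_hop_paths(&edges) {
        println!("{x},{y},{z}");
    }
}
```

Because both levels of the map are kept sorted, the tuples come out in lexicographic order, matching the three rows shown above.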

Use --output results.csv to write to a file instead. Multiple relation files can be provided by repeating the --relations flag. Both tree-trie and column-trie index structures are supported.

Add --bench (or -b) to print timing statistics to stderr:

kermit join \
  --relations edge.csv \
  --query query.dl \
  --algorithm leapfrog-triejoin \
  --indexstructure tree-trie \
  --bench

--- join statistics ---
  data structure:  TreeTrie
  algorithm:       LeapfrogTriejoin
  relations:       1
  output tuples:   3
  load time:       0.000412s
  join time:       0.000076s
  write time:      0.000003s
  total time:      0.000521s
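
As a rough illustration of how per-phase timings like these could be gathered, the sketch below wraps each phase in a small helper built on `std::time::Instant` and prints to stderr. It is an assumption about the general approach, not Kermit's actual code: `timed` and `format_secs` are hypothetical helpers, and the load and join phases are placeholders.

```rust
use std::time::{Duration, Instant};

/// Runs `phase` and returns its result together with the elapsed wall-clock time.
fn timed<T>(phase: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = phase();
    (value, start.elapsed())
}

/// Formats a duration as seconds with microsecond precision, e.g. "0.000412s".
fn format_secs(d: Duration) -> String {
    format!("{:.6}s", d.as_secs_f64())
}

fn main() {
    let total_start = Instant::now();

    // Placeholder phases standing in for loading a relation and running a join.
    let (relation, load) = timed(|| vec![(1u32, 2u32), (2, 3), (3, 4), (1, 3)]);
    let (output_tuples, join) = timed(|| relation.len());

    let total = total_start.elapsed();

    // Statistics go to stderr so that stdout stays reserved for the result rows.
    eprintln!("--- join statistics ---");
    eprintln!("  output tuples:   {output_tuples}");
    eprintln!("  load time:       {}", format_secs(load));
    eprintln!("  join time:       {}", format_secs(join));
    eprintln!("  total time:      {}", format_secs(total));
}
```

In this sketch the total is measured with its own clock rather than summed from the phases, so it also includes any work done between them.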

Benchmarking

Criterion micro-benchmarks

Run Criterion benchmarks for insertion, iteration, and space (heap size) across synthetic data sets:

cargo bench --package kermit-ds                       # all benchmarks
cargo bench --package kermit-ds --bench relation_benchmarks  # time only
cargo bench --package kermit-ds --bench space_benchmarks     # space only

CLI data structure benchmarks

Benchmark a specific data structure against a real data file:

kermit bench ds \
  --relation data.csv \
  --indexstructure tree-trie \
  --metrics insertion iteration space

Supported index structures: tree-trie, column-trie. Supported metrics: insertion, iteration, space.

Contributing

Thanks for taking an interest! Perhaps after I've finished my thesis.

License

This repository, as is customary with Rust projects, is dual-licensed under the MIT and Apache 2.0 licenses.