Crate coppice

Source
Expand description

Coppice is a dynamic programming library for acyclic analytical queries expressed as nested map/reduce computations. These map/reduce computations are automatically cached and parallelised when executed with the map_reduce() higher-order function.

Of course, since we know we’re only interested in full blown map_reduce() results, we actually memoise fully aggregated results for each “tablet” (small set of rows) and opaque params.

In addition to these standard inputs (and cache keys), Coppice also passes “join keys” to the “map” functions. This third type of inputs (in addition to rows and opaque params) enables Coppice to offer asymptotically superior performance compared to pure memoisation: map_reduce() essentially executes “map” functions “in reverse,” for join keys.

The map_reduce() function thus ends up evaluating the “map” function for each row and params, but extracts all join keys that, combined with the row and params, yields a non-trivial Aggregate. We thus extract, for each row and params, a branching function from join keys to Aggregate, and reduce (merge::Merge) these branching functions together for all rows in a tablet.

Two key insights underlie Coppice.

The first is that backtracking search over join keys represented as bounded arrays of bits gives us enough finite domain powers to weakly emulate logic programming. That’s not a lot, but enough to automatically build a bit-trie index from an opaque function.

The second is that many analytical queries use only hierarchical joins (a well known fact), and that writing these queries as regular code implicitly gives us the tree necessary for Yannakakis’s algorithm (maybe less well known).

In short, the Coppice is just an API trick to coerce Rust covers into writing plain code that can be executed with a version of Yannakakis’s algorithm simplified for the hierarchical subset of acyclic queries.

Traits§

Aggregate
Coppice caches results from aggregate queries where the query results implement the Aggregate trait.
BaseJoinKey
Individual join keys (that are run “in reverse”) must be convertible to byte arrays (bit vectors really, but byte arrays are convenient).
JoinKeys
In practice, join keys are usually passed around as tuples or arrays, and must be convertible to a. The JoinKeys trait captures the containers of BaseJoinKeys we know how to convert to function inversion input.

Functions§

clear_all_caches
Clears all query caches for map_reduce calls in the current process.
map_reduce
The map_reduce generic function takes a list of tablets (fractional data sets), converts each to a parallel iterator of rows with the row_fn, and merges the result of executing worker with the params and join_keys on each row in all the tablets.