Expand description
Coppice is a dynamic programming library for acyclic analytical
queries expressed as nested map/reduce computations. These
map/reduce computations are automatically cached and parallelised
when executed with the map_reduce() higher-order function.
Of course, since we know we’re only interested in full blown
map_reduce() results, we actually memoise fully aggregated
results for each “tablet” (small set of rows) and opaque params.
In addition to these standard inputs (and cache keys), Coppice
also passes “join keys” to the “map” functions. This third type
of inputs (in addition to rows and opaque params) enables
Coppice to offer asymptotically superior performance compared to
pure memoisation: map_reduce() essentially executes “map”
functions “in reverse,” for join keys.
The map_reduce() function thus ends up evaluating the “map”
function for each row and params, but extracts all join keys
that, combined with the row and params, yields a non-trivial
Aggregate. We thus extract, for each row and params,
a branching function from join keys to Aggregate, and reduce
(merge::Merge) these branching functions together for
all rows in a tablet.
Two key insights underlie Coppice.
The first is that backtracking search over join keys represented as bounded arrays of bits gives us enough finite domain powers to weakly emulate logic programming. That’s not a lot, but enough to automatically build a bit-trie index from an opaque function.
The second is that many analytical queries use only hierarchical joins (a well known fact), and that writing these queries as regular code implicitly gives us the tree necessary for Yannakakis’s algorithm (maybe less well known).
In short, the Coppice is just an API trick to coerce Rust covers into writing plain code that can be executed with a version of Yannakakis’s algorithm simplified for the hierarchical subset of acyclic queries.
Traits§
- Aggregate
- Coppice caches results from aggregate queries where the query results
implement the
Aggregatetrait. - Base
Join Key - Individual join keys (that are run “in reverse”) must be convertible to byte arrays (bit vectors really, but byte arrays are convenient).
- Join
Keys - In practice, join keys are usually passed around as tuples or arrays, and
must be convertible to a. The
JoinKeystrait captures the containers ofBaseJoinKeys we know how to convert to function inversion input.
Functions§
- clear_
all_ caches - Clears all query caches for
map_reducecalls in the current process. - map_
reduce - The
map_reducegeneric function takes a list of tablets (fractional data sets), converts each to a parallel iterator of rows with therow_fn, and merges the result of executingworkerwith theparamsandjoin_keyson eachrowin all thetablets.