eqlog 0.1.5

Datalog with equality
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
# Eqlog

An extension of datalog with function symbols and equality.
Combines datalog with congruence closure.
Compiles to specialized rust code that easily integrates into rust projects.

## Example: Semilattices

Semilattices are partial orders (i.e. sets equipped with reflexive, transitive and antisymmetric relations) with binary meets.
They can be formalized in the following eqlog theory:
```rust
// semilattice.eqlog

// The carrier set.
Sort El;
// The less-equal relation.
Pred Le: El * El;

// Reflexivity.
Axiom x: El => Le(x, x);
// Transitivity.
Axiom Le(x, y) & Le(y, z) => Le(x, z);
// Antisymmetry.
Axiom Le(x, y) & Le(y, x) => x = y;

// A function assigning binary meets, i.e. binary greatest lower bound.
Func Meet: El * El -> El;

// The meet function is total, i.e. defined for all elements x, y.
Axiom x: El & y: El => Meet(x, y)!;
// The meet is a lower bound.
Axiom m = Meet(x, y) => Le(m, x) & Le(m, y);
// All lower bounds are smaller or equal to the meet.
Axiom Le(z, x) & Le(z, y) & m = Meet(x, y) => Le(z, m);
```

Eqlog translates this eqlog file to a rust module that allows computations on models of the semilattice theory, i.e. semilattices.
For example, we can verify that the meet function in a semilattice is associative:
```rust
// main.rs

use eqlog_runtime::eqlog_mod;
eqlog_mod!(semilattice);
use crate::semilattice::*;

fn main() {
    // Create an empty semilattice structure and add three elements to it.
    let mut sl = Semilattice::new();
    let x = sl.new_el();
    let y = sl.new_el();
    let z = sl.new_el();

    // Close the semilattice structure by matching premises of axioms and
    // adding conclusions until we reach a fixed point.
    sl.close();
    // sl satisfies all semilattice axioms now.

    // Evaluate the left-associated meet xy_z = (x /\ y) /\ z.
    let xy = sl.meet(x, y).unwrap();
    let xy_z = sl.meet(xy, z).unwrap();

    // Evaluate the right-associated meet x_yz = x /\ (y /\ z).
    let yz = sl.meet(y, z).unwrap();
    let x_yz = sl.meet(x, yz).unwrap();

    // Check that the two elements are equal.
    if sl.are_equal_el(xy_z, x_yz) {
        println!("Meet is associative.");
    } else {
        println!("Meet is not associative.");
    }
}
```

The [eqlog-test/src](https://github.com/eqlog/eqlog/tree/master/eqlog-test/src) contains more examples:
- type inference [[eqlog](https://github.com/eqlog/eqlog/tree/master/eqlog-test/src/inference.eqlog)] [[rust](https://github.com/eqlog/eqlog/tree/master/eqlog-test/src/inference_test.rs)]
- proving facts about categories [[eqlog](https://github.com/eqlog/eqlog/tree/master/eqlog-test/src/lex_category.eqlog)] [[rust](https://github.com/eqlog/eqlog/tree/master/eqlog-test/src/lex_category_test.rs)]

## Integration into rust projects

A full sample project can be found in [examples/semilattice](https://github.com/eqlog/eqlog/tree/master/examples/semilattice).
Eqlog consists of the compiler crate `eqlog`, which is only needed during build, and the runtime crate `eqlog-runtime`.
We can specify them as dependencies in `Cargo.toml` by adding the following lines:
```toml
[dependencies]
eqlog-runtime = "0"

[build-dependencies]
eqlog = "0"
```

In order for our rust code to integrate with the code generated by eqlog, we need to run eqlog before building the crate itself.
This can be accomplished by adding the following `build.rs` file next to `Cargo.toml` in the package root directory:
```rust
fn main() {
    eqlog::process_root();
}
```
Cargo automatically executes the [`build.rs` file](https://doc.rust-lang.org/cargo/reference/build-scripts.html) before building.
`eqlog::process_root()` searches for files under the `src` directory and generates a rust module for each `.eqlog` file.
To declare the rust module generated from an eqlog file, we use the `eqlog_mod!` macro:
```rust
use eqlog_runtime::eqlog_mod;
eqlog_mod!(<filename without extension>);
```
Note that a special invocation that specifies the full path is needed for eqlog files in nested subdirectories of `src`.

## Language

Each eqlog file consists of a sequence of sort, predicate, function and axiom declarations.
Mathematically, eqlog is a way to specify [essentially algebraic theories](https://ncatlab.org/nlab/show/essentially+algebraic+theory#definition).

### Sorts
Sorts represent the different carrier sets of models of our theory.
They are declared as follows:
```rust
Sort <SortName>;
```
The name of a sort must be `UpperCamelCase`.

### Predicates
Predicates are declared as follows:
```rust
Pred <PredName> : <Sort_1> * ... * <Sort_n>;
```
Each predicate has an arity, which is the list of sorts separated by an asterisk appearing after the colon.
All sorts appearing in the arity must be declared prior to the predicate.
Predicate names must be `UpperCamelCase`.

### Functions
Functions are declared as follows:
```rust
Func <FuncName> : <ArgSort_1> * ... * <ArgSort_n> -> <ResultSort>;
```
Each function has a domain, which is the list of sorts appearing before the arrow, and a codomain sort after the arrow.
All sorts appearing in the domain and the codomain must be declared prior to the function.
Function names must be `UpperCamelCase`.

A function with empty domain is a constant.
Constants are declared as follows:
```rust
Func <ConstantName> : <Sort>;
```

To eqlog, functions are synonymous to *partial* functions; they need not be total.

### Axioms
The simplest but most general type of axiom is the *implication*.
Implications are of the form
```rust
Axiom <Premise> => <Conclusion>;
```
where `<Premise>` and `<Conclusion>` are conjunctions of *atoms*.

Most atoms contain *terms*.
A term is either a variable or an application of a function symbol to terms.
Variables names must be `lower_snake_case`.
Variables that are used only once should be replaced with a wildcard `_`.

Eqlog supports the following atoms:
* Atoms of the form `<PredName>(<arg_1>, ..., <arg_n>)`.
  Such atoms assert that `<arg_1>, ..., <arg_n>` must satisfy the `<PredName>` predicate.
* Atoms of the form `<term>!`.
  Note the exclamation mark.
  Such atoms assert that `<term>` is defined, i.e. that the involved functions are defined on their arguments.
* Atoms of the form `<tm_1> = <tm_2>`.
  Such atoms assert that the terms `<tm_1>` and `<tm_2>` are both defined and equal.
* Atoms of the form `<var_name> : <SortName>`.
  Such atoms assert that `<var_name>` is a variable of type `<SortName>`.
  They can only appear in premises.

Every variable occuring in an implication must be used at least once in the premise.
Thus no additional variables may be introduced in the conclusion.
Furthermore, unless the exclamation mark operator `!` is used, implications must be *surjective*:
Every term appearing in the conclusion of an implication must be equal to a term appearing in the premise or earlier in the conclusion.
The only way to circumvent this restriction is to add an atom of the form `<tm>!` in the conclusion.
Later atoms can then freely use `<tm>`.

For example, consider the following invalid and valid axioms for the semilattice theory above:
```rust
// Invalid: Cannot infer sort of x.
Axiom x = x => x = x;
// Valid (but nonsensical):
Axiom x: El => x = x;

// Invalid: x and y are not used in the (empty) premise.
Axiom Meet(x, y)!;
// Valid:
Axiom x: El & y: El => Meet(x, y)!;

// Invalid: Meet(x, y) is not equal to a term occuring earlier.
Axiom Le(z, x) & Le(z, y) => Le(z, Meet(x, y));
// Valid: Assert that Meet(x, y) exists as part of conclusion.
Axiom Le(z, x) & Le(z, y) => Meet(x, y)! & Le(z, Meet(x, y));
// Valid: Suppose that Meet(x, y) exists in premise.
Axiom Le(z, x) & Le(z, y) & Meet(x, y)! => Le(z, Meet(x, y));

// Invalid: Both Meet(x, y) and Meet(y, x) do not occur earlier.
Axiom x: El & y: El => Meet(x, y) = Meet(y, x);
// Valid: the term on the left-hand side of the equation is introduced
// in the premise.
Axiom Meet(x, y)! => Meet(x, y) = Meet(y, x);
// Valid: the term on the right-hand side of the equation is introduced
// earlier in the conclusion.
Axiom x: El & y : El => Meet(x, y)! & Meet(y, x) = Meet(x, y);

// Invalid: Meet(x, y) is not equal to a term that occurs earlier.
Axiom u = Meet(x, Meet(y, z))! => Meet(Meet(x, y), z) = u;
// Valid: All of u, Meet(x, y) and z are introduced in the premise.
Axiom u = Meet(x, Meet(y, z))! & Meet(x, y)! => u = Meet(Meet(x, y), z);
```

#### Reductions
Reductions are syntactic sugar for implication axioms.
A reduction has the form
```rust
Axiom <from> ~> <to>;
```
where `<from>` and `<to>` are terms of the same sort and `<from>` must not be a variable.
A reduction axiom has the following meaning:
*If all subterms of `<from>` are defined and `<to>` is defined, then also `<from>` is defined and equals `<to>`.*
Accordingly, if `<from> = <Func>(<arg_1>, ..., <arg_n>)`, then the reduction desugars to the implication
```rust
Axiom <arg_1>! & ... & <arg_n>! & <to>! => <from> = <to>;
```
The order of the `from` and `to` terms can be confusing at first, but consider that algorithms involving reduction usually work top-down, whereas eqlog evaluates bottom-up.

Eqlog also supports the following symmetric form
```rust
Axiom <lhs> <~> <rhs>;
```
which desugars to the two reductions
```rust
Axiom <lhs> ~> <rhs>;
Axiom <rhs> ~> <lhs>;
```

Both reductions and symmetric reductions can be made conditional on a premise:
```rust
Axiom <atom_1> & ... & <atom_n> => <lhs> ~> <rhs>;
Axiom <atom_1> & ... & <atom_n> => <lhs> <~> <rhs>;
```

## Generated rust interfaces
Eqlog translates each `.eqlog` to an `.rs` file.
The rust file must be declared inside a module of the `src` directory (e.g. `lib.rs` or `main.rs`) using the `eqlog_runtime::eqlog_mod!` macro.

Eqlog generates documented rust code.
To build and view this documentation, run
```sh
cargo doc --document-private-items --open
```

The public API of the generated rust module consists of the following symbols:
* For each sort, a type `<SortName>` of element ids for this sort.
* The model structure.
  Its name is derived by converting the file name to upper camel case.
  For example, for `semi_group.eqlog`, this would be `SemiGroup`.

The model structure has the following member functions:
* `fn new() -> Self`  
  Creates an empty model structure.
* `fn close(&mut self)`  
  Close the model under all axioms.
* `pub fn close_until(&mut self, condition: impl Fn(&Self) -> bool) -> bool`  
  Close the model under all axioms until a condition is satisfied.
  Returns false if the model could be closed under all axioms but the condition still does not hold.
* For each sort:
  - `fn new_<sort_name>(&mut self) -> <SortName>`  
    Adjoins a new element to the model structure.
  - `fn equate_<sort_name>(&mut self, lhs: <SortName>, rhs: <SortName>)`  
    Enforces the equality `lhs = rhs` in the model structure.
  - `fn are_equal_<sort_name>(&self, lhs: <SortName>, rhs: <SortName>) -> bool`  
    Returns true if `lhs` and `rhs` represent the same element.
  - `fn root_<sort_name>(&self, el: <SortName>) -> <SortName>`    
    Returns the canonical/root element of the equivalence class of an element.
* For each predicate:
  - `fn <pred_name>(&self, arg_1: <Sort_1>, ..., arg_n: <Sort_n>)`  
    Checks whether the predicate holds on the given arguments.
  - `fn iter_<pred_name>(&self) -> impl '_ + Iterator<Item = <arity>>`  
    Returns an iterator over tuples satisfying the predicate.
    The item type yielded by the iterator is a tuple type determined by the arity.
  - `fn insert_<pred_name>(&mut self, arg_1: <Sort_1>, ..., arg_n: <Sort_n>)`  
    Enforces that the predicate holds for the given arguments.
* For each function:
  - `fn <func_name>(&self, arg_1: <Sort_1>, ..., arg_n: <Sort_n>) -> Option<<ResultSort>>`  
    Evaluates the function on the given arguments.
  - `fn iter_<func_name>(&self) -> impl '_ + Iterator<Item = <arity>>`  
    Returns an iterator over tuples in the graph of the function.
    The item type yielded by the iterator is a tuple type determined by the arity.
  - `fn define_<func_name>(&mut self, arg_1: <Sort_1>, ..., arg_n: <Sort_n>) -> <ResultSort>`  
    Enforces that the function is defined on the given arguments, adjoining a new element if necessary.
  - `fn insert_<func_name>(&mut self, arg_1: <Sort_1>, ..., arg_n: <Sort_n>, result: <ResultSort>)`  
    Insert a tuple into the graph of the function.

For example, these are the public declarations of the rust module generated from the semilattice theory:
```rust
struct El;
struct Semilattice;
impl Semilattice {
  fn new() -> Self;
  fn close(&mut self);
  fn close_until(&mut self, condition: impl Fn(&Self) -> bool) -> bool;

  fn new_el(&mut self) -> El;
  fn equate_el(&mut self, lhs: El, rhs: El);
  fn are_equal_el(&self, lhs: El, rhs: El) -> bool;
  fn root_el(&self, el: El) -> El;

  fn le(&self, arg0: El, arg1: El) -> bool;
  fn iter_le(&self) -> impl '_ + Iterator<Item = (El, El)>;
  fn insert_le(&mut self, arg0: El, arg1: El);

  fn meet(&self, arg0: El, arg1: El) -> Option<El>;
  fn iter_meet(&self) -> impl '_ + Iterator<Item = (El, El, El)>;
  fn define_meet(&mut self, arg0: El, arg1: El) -> El;
  fn insert_meet(&mut self, arg0: El, arg1: El, result: El);
}
```

## Data model and algorithms

### Standard datalog features
The theory model structures generated by eqlog can be thought of as in-memory SQL databases, with schema given by sort, predicate and function declarations.
Elements of a given sort are simple integer ids, and the model structure maintains a list of valid ids for each sort.
Every predicate `P` is represented as a table whose rows are the tuples of elements for which the predicate holds.
Functions are represented as [graphs](https://en.wikipedia.org/wiki/Graph_of_a_function).
For example, if F is a function in one argument, then the internal table for F will consist of rows of the form `(x, F(x))`.

The `close` function repeatedly enumerates all instances of premises of axioms and adjoins their conclusion to the model.
Eventually, a [fixed point](https://en.wikipedia.org/wiki/Fixed_point_(mathematics)) is reached (unless the theory contains non-surjective axioms, see below) and the algorithm stops.
For example, for the transitivity axiom
```rust
Axiom Le(x, y) & Le(y, z) => Le(x, z);
```
the the close function enumerates all rows `(x, y_0)` and `(y_1, z)` in the `Le` table such that `y_0 = y_1`, and then adds the row `(x, z)` to `Le`.
Eventually, the `Le` table will already contain all the new rows `(x, z)` we find, which means that we have reached the fixed point and can stop:
The `Le` predicate is transitive now.

The enumeration of instances of premises is known as [(inner) join](https://en.wikipedia.org/wiki/Join_(SQL)#Inner_join) in SQL terminology.
SQL databases speed up inner joins using indices, and eqlog automatically selects and maintains indices to speed up the joins required to enumerate the axioms listed in the eqlog file.
In each iteration, it is not necessary to enumerate premises that were already enumerated in the previous iteration.
This optimization is known as *semi-naive evaluation*, and is again something that eqlog uses to speed up the `close` function.

### Equalities
In addition to the standard datalog features discussed so far, eqlog supports equalities in conclusions.
One example is the antisymmetry axiom of partial orders:
```rust
Axiom Le(x, y) & Le(y, x) => x = y;
```
Another source of equality in conclusions are the implicit functionality axioms for functions:
For functions `F`, if we find both `(x_1, ..., x_n, y)` and `(x1, , ..., x_n, z)` in the graph of `F`, then we must derive `y = z`.
If we denote the graph of `F` by `G_F`, then the implicit functionality axiom can be stated as follows:
```rust
Axiom G_F(x_1, ..., x_n, y) & G_F(x_1, ..., x_n, z) => y = z
```
Note, however, that graphs of functions cannot be referred to directly in eqlog files.

To account for derived equalities, eqlog model structures maintain [union-find data structures](https://en.wikipedia.org/wiki/Disjoint-set_data_structure) for each sort.
When an equality `x = y` is derived during a call to `close`, eqlog merges the equivalence classes of `x` and `y`.

To speed up computation of the joins required when matching axiom premises, eqlog maintains indices for each table.
However, these indices can only be used if element ids can be compared for equality directly instead of consulting a union find data structure.
Eqlog thus maintains the invariant that all predicate and function graph tables contain canonical sort elements only, i.e. only elements whose ids are root nodes with respect to the corresponding union find data structure.

This invariant is temporarily violated when eqlog merges the equivalence class of some element `x` into that of an element `y`.
To restore the invariant, eqlog removes all rows from tables that contain `x`, replaces `x` by the new root id `y`, and inserts the rows again.
To speed up this process, eqlog maintains a list for each root id that contains all rows in which the root id appears.

### Non-surjective axioms and non-termination
Recall that eqlog axioms need to be surjective unless the exclamation mark operator `!` is used:
Every element in the conclusion must be equal to some element in the premise.
Closing model structures under surjective axioms does not increase the number of elements in the model, which guarantees that the `close` function eventually terminates.

If there are non-surjective axioms, then this is not guaranteed.
Consider the following eqlog theory that formalizes natural numbers:
```
Sort N;
Zero: N;
Succ: N -> N;

Axiom Zero()!;
Axiom n: N => Succ(n)!;
```
Closing the empty model structure will first add an element `Zero` to the model, then the element `Succ(N)`, then `Succ(Succ(N))` and so forth.
However, the presence of non-surjective axioms does not necessarily mean that the close function must run indefinitely.
For example, the semilattice theory contains the non-surjective axiom
```rust
Axiom x: El & y : El => Meet(x, y)!;
```
but `close` nevertheless terminates.

If a theory contains non-surjective axioms, the generated `close` function will consist of nested close loops:
The outer loop is responsible for non-surjective axioms, and the inner loop is responsible for surjective axioms.
In pseudo-code, the `close` function looks a bit like this:
```rust
// Match axiom premise and change the model such that the conclusion holds.
// Returns true if the model was changed, i.e. if not all conclusions were
// already true.
fn adjoin_conclusions(ax: Axiom) -> bool {
  ...
}

fn close() {
  loop {
    loop {
      let model_changed = surjective_axioms.iter().any(adjoin_conclusion);
      if !model_changed {
        break;
      }
    }

    let model_changed = non_surjective_axioms.iter().any(adjoin_conclusion);
    if !model_changed {
      break;
    }
  }
}
```

## Background

For a more thorough explanation of how Eqlog works, have a look at these papers:

- *An Evaluation Algorithm for Datalog with Equality* [[Web](https://www.mbid.me/eqlog-algorithm)] [[PDF](https://arxiv.org/pdf/2302.05792.pdf)]
- *Algebraic Semantics of Datalog with Equality* [[Web](https://www.mbid.me/eqlog-semantics)] [[PDF](https://arxiv.org/pdf/2302.03167.pdf)]

Also have a look at the [egglog project](https://github.com/egraphs-good/egglog) and the corresponding [paper](https://arxiv.org/pdf/2304.04332.pdf).
Even though its surface language looks a bit different, Egglog is based on very similar ideas as those underlying Eqlog, and can be used for many of the same applications.