risinglight 0.2.0

An OLAP database system for educational purpose
Documentation
# Running TPC-H Queries

Currently, RisingLight supports importing TPC-H data and run a subset of TPC-H queries. You may import the TPC-H data
and run some simple queries in RisingLight.

## Generate TPC-H Data

### Use Make Recipe

You may use the make recipe to download and generate TPC-H data (about 1GB in tbl format).

```
make tpch
```

If you want to run a larger query:

```
make tpch-10gb
```

The generated data will be placed under `tpch-dbgen/tbl` folder.

### Manual Generation

First, you should use git to clone the tpch-dbgen repo:

```
git clone https://github.com/electrum/tpch-dbgen.git
```

This repo contains the program for generating TPC-H data. 

Then, enter the tpch-dbgen directory and type `make all`, and it will generate some executable binaries such as `dbgen` and `qgen`. We will show you how to generate TPC-H data by using one line of command in the following sections. Meanwhile, you can read this [README](https://github.com/electrum/tpch-dbgen/blob/master/README) for more details.

Finally, type the following command and wait for several seconds:

```
./dbgen
```

This command will generate the data we want, which contains a table called `LINEITEM` with a size of 700MB.

## Create Table and Import Data

You will need to build RisingLight in release mode, so as to import data faster.

```shell
cargo build --release
```

Then, use our test scripts to create tables.

```shell
cargo run --release -- -f tests/sql/tpch/create.sql
```

We can use `\dt` to ensure that all tables have been imported.

```
cargo run --release
# Inside SQL shell
\dt
+---+----------+---+----------+---+----------+
| 0 | postgres | 0 | postgres | 3 | supplier |
| 0 | postgres | 0 | postgres | 1 | region   |
| 0 | postgres | 0 | postgres | 0 | nation   |
| 0 | postgres | 0 | postgres | 4 | partsupp |
| 0 | postgres | 0 | postgres | 7 | lineitem |
| 0 | postgres | 0 | postgres | 6 | orders   |
| 0 | postgres | 0 | postgres | 2 | part     |
| 0 | postgres | 0 | postgres | 5 | customer |
+---+----------+---+----------+---+----------+
```

Then, we may use the `import.sql` to import data, which calls `COPY FROM` SQL statements internally:

```shell
cargo run --release -- -f tests/sql/tpch/import.sql
```

Generally, you can finish this process within several seconds.

## Run TPC-H

Now, we can run simple queries on this table.

```shell
cargo run --release
```

```sql
select sum(L_LINENUMBER) from LINEITEM;
select count(L_ORDERKEY), sum(L_LINENUMBER) from LINEITEM where L_ORDERKEY > 2135527;
```

Or run real TPC-H queries:

```shell
cargo run --release -- -f tests/sql/tpch/q1.sql
cargo run --release -- -f tests/sql/tpch/q3.sql
cargo run --release -- -f tests/sql/tpch/q5.sql
cargo run --release -- -f tests/sql/tpch/q6.sql
cargo run --release -- -f tests/sql/tpch/q10.sql
```

## Clean Data

All data of RisingLight is stored in `risinglight.secondary.db` folder. Simply remove it
if you want to clean all data.

## Developers: Add new TPC-H tests

In `tests`, we have two kinds of TPC-H tests:

* Run TPC-H query on small dataset (used for unit tests). `tpch` folder.
* Run TPC-H query on ~1GB dataset generated by tpch-gen. `tpch-full` folder.

Everytime we add a new TPC-H query, we should add both kinds of tests, in sqllogictest format. For example,
developers should:

* create `tests/sql/tpch/_qXX.slt`
* create `tests/sql/tpch-full/_qXX.slt`
* create `tests/sql/tpch/qXX.sql`
* add `include _qXX.slt` in `tests/sql/tpch/tpch.slt`
* add `include _qXX.slt` in `tests/sql/tpch-full/tpch.slt`

By using `output-format` parameter, we can easily get the output format required by sqllogictest:

```
cargo run --release -- -f tests/sql/tpch/q5.sql --output-format text
psql -d tpch -f tests/sql/tpch/q5.sql -A -t -F " "
```

... which yields

```
ALGERIA 55756674.2813
MOZAMBIQUE 54883960.1257
MOROCCO 50463646.0237
ETHIOPIA 49934541.2268
KENYA 48858086.8222
```

Developers should check the output against Postgres in order to ensure the output is correct. The full 1GB TPC-H
test suite can be verified by running:

```
cargo run --release -- -f tests/sql/tpch-full/_tpch_full.slt --output-format text
```

And if you want to start fresh from empty database, you may run:

```
./tests/tpch-full.sh
```