tpctools 0.1.2

Utilities for generating TPC-H and TPC-DS data sets

TPC Tools

Command-line tools for generating TPC-H and TPC-DS data sets in parallel and re-organizing the output files into directory structures that can be consumed by tools such as Apache Spark or Apache Arrow DataFusion/Ballista.
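As a rough illustration of the re-organization step, the sketch below groups flat generator output into one directory per table. It assumes dsdgen-style file names of the form `<table>_<child>_<total>.dat`. The function name `reorganize` and the `part-N.dat` naming are hypothetical and may not match what tpctools actually produces.

```shell
# Hypothetical sketch, not the tpctools implementation:
# move <table>_<child>_<total>.dat into <table>/part-<child>.dat
reorganize() {
  dir="$1"
  for f in "$dir"/*_*_*.dat; do
    [ -e "$f" ] || continue            # skip if the glob matched nothing
    base=$(basename "$f" .dat)         # e.g. store_sales_3_48
    rest=${base%_*}                    # drop the total    -> store_sales_3
    child=${rest##*_}                  # partition number  -> 3
    table=${rest%_*}                   # table name        -> store_sales
    mkdir -p "$dir/$table"
    mv "$f" "$dir/$table/part-$child.dat"
  done
}
```

With this layout, a call such as `reorganize /tmp/tpcds-sf1000` would leave one directory per table, which is the shape that directory-based readers generally expect.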


TPC-DS

Install dependencies.

sudo apt install gcc make flex bison byacc git
git clone

Generate data.

cargo run --release -- generate --benchmark tpcds \
  --scale 1000 \
  --partitions 48 \
  --generator-path ~/git/tpcds-kit/tools/ \
  --output /tmp/tpcds-sf1000/

Example output.

Generated TPC-DS data at scale factor 1000 with 48 partitions in: 6247.155671938s
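Generating in parallel presumably amounts to one generator process per partition. The sketch below, under that assumption, prints the per-partition commands using dsdgen's `-scale`, `-parallel`, `-child`, and `-dir` options. The function name `print_dsdgen_cmds` is hypothetical, the commands are only printed rather than executed, and how tpctools itself invokes the generator is not shown here.

```shell
# Hypothetical dry run: print one dsdgen command per partition.
print_dsdgen_cmds() {
  scale="$1"; partitions="$2"; out="$3"
  i=1
  while [ "$i" -le "$partitions" ]; do
    # -parallel is the total number of chunks, -child selects one of them
    echo "./dsdgen -scale $scale -parallel $partitions -child $i -dir $out"
    i=$((i + 1))
  done
}
```

Calling `print_dsdgen_cmds 1000 48 /tmp/tpcds-sf1000/` would list 48 commands, one for each partition in the example above.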


TPC-H

Install dependencies.

git clone
cd tpch-dbgen

Generate data.

cargo run --release -- generate --benchmark tpch \
  --scale 100 \
  --partitions 24 \
  --generator-path ~/git/tpch-dbgen/ \
  --output /tmp/tpch-sf100/
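For TPC-H, the equivalent by hand would be to start one `dbgen` process per chunk and wait for all of them. A minimal sketch, assuming dbgen's `-s` (scale factor), `-C` (number of chunks), and `-S` (chunk to build) options; the function name `run_dbgen_chunks` and the `DBGEN` override variable are hypothetical, and this is not necessarily how tpctools drives the generator.

```shell
# Hypothetical sketch: run one dbgen process per chunk, all concurrently.
# Set DBGEN to override the generator binary (default: ./dbgen).
run_dbgen_chunks() {
  scale="$1"; chunks="$2"
  i=1
  while [ "$i" -le "$chunks" ]; do
    ${DBGEN:-./dbgen} -s "$scale" -C "$chunks" -S "$i" &
    i=$((i + 1))
  done
  wait   # block until every background generator has finished
}
```

Run from inside the `tpch-dbgen` directory, `run_dbgen_chunks 100 24` would start 24 generators at once. On a machine with fewer cores it may be better to launch them in smaller batches.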