chewdata 1.7.0

Extract Transform and Load data

This application is a simple ETL written in Rust that can be used as a connector between systems.

  • It handles multiple formats: JSON, JSONL, CSV, TOML, XML, YAML and text
  • It can read/write data from:
    • MongoDB databases
    • S3/MinIO with versioning & select
    • HTTP(S) APIs with several authenticators: Basic, Bearer, JWT
    • The local filesystem
    • Relational databases like PostgreSQL (not yet)
    • Message brokers (not yet)
  • It needs only rustup
  • No garbage collector
  • Parallel processing
  • Multi-platform
  • Uses Tera templates to configure the actions for the data transformation
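A transformation step is driven by Tera templates inside the ETL configuration. A minimal sketch of such a pipeline is shown below; the `transformer` action fields (`field`, `pattern`) are assumptions based on the project's examples, so check ./examples for the exact syntax:

```json
[
    {"type": "reader", "connector": {"type": "io"}, "document": {"type": "json"}},
    {"type": "transformer", "actions": [
        {"field": "full_name", "pattern": "{{ input.first_name }} {{ input.last_name }}"}
    ]},
    {"type": "writer"}
]
```

Each step consumes the output of the previous one, so the writer emits the documents produced by the transformer.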

The goal of this project is to simplify the work of developers and the connections between systems. The work is not finished, but I hope it will be useful to you.

Getting started

Setup from source code

Requirements: rustup, git and make.

Commands to execute:

git clone https://github.com/jmfiaschi/chewdata.git chewdata
cd chewdata
cp .env.dev .env
vim .env # edit the .env file
make build
make unit-tests
make integration-tests

If all the tests pass, the project is ready. Read the Makefile to see which shortcuts you can use.

If you want some examples to discover this project, have a look at ./examples.

Run the ETL

If you run the program without parameters, it waits for JSON data on standard input. By default, it writes JSON data to standard output and stops when you press the 'Enter' key multiple times.

$ cargo run 
$ [{"key":"value"},{"name":"test"}]
$ exit
[{"key":"value"},{"name":"test"}]

Another example, without an ETL configuration and with a file as input:

$ cat ./data/multi_lines.json | cargo run 
[{...}]

or

$ cat ./data/multi_lines.json | make run 
[{...}]

Another example, with a JSON ETL configuration as argument:

$ cat ./data/multi_lines.csv | cargo run '[{"type":"reader","document":{"type":"csv"}},{"type":"writer"}]'
[{...}] # the CSV data transformed into JSON format

or

$ cat ./data/multi_lines.csv | make run json='[{\"type\":\"reader\",\"document\":{\"type\":\"csv\"}},{\"type\":\"writer\"}]'
[{...}] # the CSV data transformed into JSON format

Another example, with an ETL configuration file as argument:

$ echo '[{"type":"reader","connector":{"type":"io"},"document":{"type":"csv"}},{"type":"writer"}]' > my_etl.conf.json
$ cat ./data/multi_lines.csv | cargo run -- --file my_etl.conf.json
[{...}]

or

$ echo '[{"type":"reader","connector":{"type":"io"},"document":{"type":"csv"}},{"type":"writer"}]' > my_etl.conf.json
$ cat ./data/multi_lines.csv | make run file=my_etl.conf.json
[{...}]

It is possible to use aliases and default values to shorten the configuration:

$ echo '[{"type":"r","doc":{"type":"csv"}},{"type":"w"}]' > my_etl.conf.json
$ cat ./data/multi_lines.csv | make run file=my_etl.conf.json
[{...}]
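The same configuration format covers the remote connectors listed above. As a hedged sketch, a reader could pull from an HTTP API with a `curl`-style connector; the connector type and its `endpoint`/`path`/`method` fields here are assumptions inferred from the feature list, not verified field names, so check the documentation before use:

```json
[
    {"type": "reader", "connector": {"type": "curl", "endpoint": "https://api.example.com", "path": "/users", "method": "get"}, "document": {"type": "json"}},
    {"type": "writer"}
]
```

Saved as a file, it would run the same way as the previous examples: `cargo run -- --file my_api.conf.json`.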

How to contribute

In progress...

After code modifications, please run all tests.

make test

Useful links