rettle
rettle is a multithreaded ETL (Extract, Transform, Load) library, inspired by Keras, that lets a "Brew Master" define any order of operations for data transformations and outputs.
Types
rettle provides the following types to "Brew" data in any project:
- Pot: container that holds the set of instructions for data sources, sinks, and transforms (see Ingredient Types below)
- Brewery: manager that holds the Brewers and sends them jobs along with the initial state of the Tea to be processed
- Brewer: worker that brews the Tea
Traits
- Tea: implemented by the custom data struct that will be transformed in the ETL pipeline
- Ingredient: defines the steps that can be included in the ETL recipe
- Argument: defines additional parameters that an Ingredient operation can use (optional)
Ingredient Types
- Fill: data input source
- Transfuse: combine data from multiple sources defined before this step (not implemented yet)
- Steep: data transformation step
- Skim: remove a field or a Tea object (not implemented yet)
- Pour: data output destination
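The idea that ingredients run in exactly the order they are supplied can be illustrated without the crate. The sketch below is self-contained and hypothetical: the Step enum, the Reading record, and run_recipe are not part of rettle, and Transfuse and Skim are left out because they are not implemented. It only shows a recipe of Fill, Steep, and Pour steps applied strictly in sequence.

```rust
/// Hypothetical record standing in for a user-defined Tea struct.
#[derive(Debug)]
struct Reading {
    value: i32,
}

/// Hypothetical step kinds mirroring Fill, Steep, and Pour (not rettle's API).
enum Step {
    /// Data input: produces records.
    Fill(fn() -> Vec<Reading>),
    /// Data transformation: mutates each record already in the batch.
    Steep(fn(&mut Reading)),
    /// Data output: here it just renders records into an output buffer.
    Pour(fn(&[Reading], &mut Vec<String>)),
}

/// Applies the steps strictly in the order they were supplied.
fn run_recipe(steps: &[Step]) -> Vec<String> {
    let mut batch: Vec<Reading> = Vec::new();
    let mut output: Vec<String> = Vec::new();
    for step in steps {
        match step {
            Step::Fill(source) => batch.extend(source()),
            Step::Steep(transform) => batch.iter_mut().for_each(|r| transform(r)),
            Step::Pour(sink) => sink(batch.as_slice(), &mut output),
        }
    }
    output
}

fn main() {
    let recipe = [
        Step::Fill(|| vec![Reading { value: 1 }, Reading { value: 2 }]),
        Step::Steep(|r| r.value *= 10),
        Step::Pour(|rs, out| out.extend(rs.iter().map(|r| format!("value={}", r.value)))),
    ];
    // Prints ["value=10", "value=20"]
    println!("{:?}", run_recipe(&recipe));
}
```

Moving the Steep ahead of the Fill in this sketch leaves the later-filled values untouched, which is the kind of ordering control the Ingredient list is meant to give.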
Using rettle
In your custom project, you first need to define the custom "Tea" struct that will be created by the Fill Ingredient.
Then implement the Tea trait methods for that struct.
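A minimal sketch of both steps follows, assuming a Tea trait whose job is to let code holding a Box<dyn Tea> recover the concrete struct. The trait here is a local stand-in: its as_any and as_any_mut methods, the MyTea struct, and the double_x step are hypothetical, so check the rettle documentation for the methods its Tea trait actually requires.

```rust
use std::any::Any;

/// Local stand-in for rettle's Tea trait; the as_any methods are an assumed shape.
/// The Send bound lets records move to worker threads.
trait Tea: Send {
    /// Borrow the concrete record as Any so callers can downcast it.
    fn as_any(&self) -> &dyn Any;
    /// Mutable counterpart used by transform steps.
    fn as_any_mut(&mut self) -> &mut dyn Any;
}

/// Hypothetical custom record that a Fill step would create.
#[derive(Default, Clone, Debug, PartialEq)]
struct MyTea {
    x: i32,
    str_val: String,
    y: bool,
}

impl Tea for MyTea {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn as_any_mut(&mut self) -> &mut dyn Any {
        self
    }
}

/// Hypothetical Steep-style step: doubles x on every record that is a MyTea.
fn double_x(batch: &mut [Box<dyn Tea>]) {
    for tea in batch.iter_mut() {
        if let Some(t) = tea.as_any_mut().downcast_mut::<MyTea>() {
            t.x *= 2;
        }
    }
}

fn main() {
    let mut batch: Vec<Box<dyn Tea>> = vec![
        Box::new(MyTea { x: 1, str_val: "a".to_string(), y: true }),
        Box::new(MyTea::default()),
    ];
    double_x(&mut batch);
    for tea in &batch {
        // Debug-prints each record after the step has run.
        println!("{:?}", tea.as_any().downcast_ref::<MyTea>());
    }
}
```

Deriving Default and Clone is a convenience here, not a stated requirement; it makes empty records and copies cheap to create inside a Fill.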
Next, create a new Pot struct and supply it with sources and ingredients before calling its brew() method to kick off the brewing process. Ingredients can optionally be supplied with structs implementing the Argument trait to pass additional runtime parameters used by your custom filters.
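The following self-contained sketch shows one way an optional Argument can reach a Steep step through a builder-style pot. Every name in it (Pot, add_source, add_ingredient, Steep, MultiplyArgs, multiply, and the brew return value) is hypothetical and based only on the description above; rettle's real types and signatures may differ. Records are plain i32 values so the focus stays on argument passing.

```rust
use std::any::Any;

/// Local stand-in for the Argument trait: extra runtime parameters for a step.
trait Argument {
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical parameters for a multiplying Steep.
struct MultiplyArgs {
    factor: i32,
}

impl Argument for MultiplyArgs {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical Steep: a transform function plus optional parameters.
struct Steep {
    computation: fn(Vec<i32>, Option<&dyn Argument>) -> Vec<i32>,
    params: Option<Box<dyn Argument>>,
}

/// Hypothetical pot holding ordered sources and ingredients.
#[derive(Default)]
struct Pot {
    sources: Vec<fn() -> Vec<i32>>,
    ingredients: Vec<Steep>,
}

impl Pot {
    fn add_source(mut self, source: fn() -> Vec<i32>) -> Self {
        self.sources.push(source);
        self
    }

    fn add_ingredient(mut self, steep: Steep) -> Self {
        self.ingredients.push(steep);
        self
    }

    /// Fills from every source, then applies each Steep in the order added.
    fn brew(&self) -> Vec<i32> {
        let mut tea: Vec<i32> = self.sources.iter().flat_map(|fill| fill()).collect();
        for steep in &self.ingredients {
            tea = (steep.computation)(tea, steep.params.as_deref());
        }
        tea
    }
}

/// Multiplies each value by the supplied factor, or by 1 when no MultiplyArgs is given.
fn multiply(tea: Vec<i32>, args: Option<&dyn Argument>) -> Vec<i32> {
    let factor = args
        .and_then(|a| a.as_any().downcast_ref::<MultiplyArgs>())
        .map_or(1, |m| m.factor);
    tea.into_iter().map(|v| v * factor).collect()
}

fn main() {
    let pot = Pot::default()
        .add_source(|| vec![1, 2, 3])
        .add_ingredient(Steep {
            computation: multiply,
            params: Some(Box::new(MultiplyArgs { factor: 10 })),
        });
    // Prints [10, 20, 30]
    println!("{:?}", pot.brew());
}
```

Downcasting through as_any lets each Steep accept its own parameter struct while the pot stores them all as Box<dyn Argument>; a step given None falls back to a default, here a factor of 1.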
Finally, a Brewery struct must be created to specify the number of Brewers (threads) that run the code, along with a start_time value used to report elapsed run-time metrics.
Fill operations collect the Tea objects to be worked on and pass them to the Brewery, where they are processed by the Brewers.
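The Brewery, Brewer, and start_time roles can be approximated with the standard library alone: a fixed number of worker threads take batches from a shared channel, and elapsed time is measured from an Instant. The Brewery struct, its new, take_order, and finish methods, and the Job type below are hypothetical and only mirror the roles described above; rettle's own scheduling may work differently.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Hypothetical job: a batch of records plus the steep applied to each record.
type Job = (Vec<i32>, fn(i32) -> i32);

/// Hypothetical brewery: a fixed pool of brewer threads fed by one channel.
struct Brewery {
    sender: Option<Sender<Job>>,
    brewers: Vec<JoinHandle<()>>,
    results: Arc<Mutex<Vec<i32>>>,
    start_time: Instant,
}

impl Brewery {
    fn new(num_brewers: usize, start_time: Instant) -> Self {
        assert!(num_brewers > 0, "a brewery needs at least one brewer");
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver: Arc<Mutex<Receiver<Job>>> = Arc::new(Mutex::new(receiver));
        let results: Arc<Mutex<Vec<i32>>> = Arc::new(Mutex::new(Vec::new()));
        let brewers: Vec<JoinHandle<()>> = (0..num_brewers)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                let results = Arc::clone(&results);
                thread::spawn(move || loop {
                    // The lock guard is dropped at the end of this statement,
                    // so other brewers can wait for jobs while this one works.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok((batch, steep)) => {
                            let brewed: Vec<i32> = batch.into_iter().map(steep).collect();
                            results.lock().unwrap().extend(brewed);
                        }
                        // All senders dropped: no more jobs, so the brewer exits.
                        Err(_) => break,
                    }
                })
            })
            .collect();
        Brewery { sender: Some(sender), brewers, results, start_time }
    }

    /// Queues one batch (for example, the output of a Fill) for the next free brewer.
    fn take_order(&self, batch: Vec<i32>, steep: fn(i32) -> i32) {
        if let Some(sender) = &self.sender {
            sender.send((batch, steep)).expect("all brewers have stopped");
        }
    }

    /// Closes the queue, waits for every brewer, and reports elapsed time since start_time.
    fn finish(mut self) -> (Vec<i32>, Duration) {
        drop(self.sender.take());
        for brewer in self.brewers.drain(..) {
            brewer.join().expect("a brewer panicked");
        }
        let mut results = self.results.lock().unwrap().clone();
        // Brewers finish in any order, so sort for a deterministic result.
        results.sort_unstable();
        (results, self.start_time.elapsed())
    }
}

fn main() {
    let brewery = Brewery::new(4, Instant::now());
    let values: Vec<i32> = (1..=8).collect();
    for chunk in values.chunks(2) {
        brewery.take_order(chunk.to_vec(), |v| v * v);
    }
    let (results, elapsed) = brewery.finish();
    // Prints [1, 4, 9, 16, 25, 36, 49, 64] followed by the elapsed time
    println!("{:?} brewed in {:?}", results, elapsed);
}
```

Sharing one Receiver behind a Mutex means whichever Brewer is idle picks up the next queued batch, so work is spread across however many Brewers were requested. Results arrive in completion order, which is why the sketch sorts them before returning.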
Example Project Code
Ingredient Crates
The community can publish Ingredient crates that work alongside rettle to simplify adding ingredients for common integrations or transformations. Some sample crates include:
- cstea: Fill/Pour integrations for csv files
- elastictea: Fill/Pour integrations for Elasticsearch
- logtea: Fill integration for log files