etl 0.1.0

A general-purpose extract-transform-load (ETL) tool

# ETL

[![Build Status](https://travis-ci.org/jblondin/etl.svg?branch=master)](https://travis-ci.org/jblondin/etl)

This package is a general-purpose Extract-Transform-Load (ETL) library for Rust, built to load arbitrary plain text files into data frame objects.

Features:
* Delimiter specification (comma, tab, etc.)
* Data types:
  * Signed / unsigned integers
  * Floating point numbers
  * Text fields
  * Boolean values
* Transformations:
  * Concatenation (of text fields)
  * Mapping (from one text field to another)
  * Conversion between types
  * Scaling of values (for numeric values, e.g. between -1 and 1)
  * Normalization of values
  * Vectorization ([one-hot](https://en.wikipedia.org/wiki/One-hot) or [feature hashing](https://en.wikipedia.org/wiki/Feature_hashing))
* Filtering
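
To make two of the transformations above concrete, here is a minimal standalone sketch of value scaling and one-hot vectorization using only the standard library. The function names `scale_to_range` and `one_hot` are hypothetical and are not part of this crate's API; the crate's own implementation may differ.

```rust
use std::collections::BTreeMap;

/// Linearly rescale `values` so the minimum maps to `lo` and the maximum to `hi`.
/// Hypothetical helper for illustration; not part of the etl crate.
fn scale_to_range(values: &[f64], lo: f64, hi: f64) -> Vec<f64> {
    let min = values.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let span = max - min;
    values
        .iter()
        .map(|&v| {
            if span == 0.0 {
                // All values are identical: map them to the midpoint of the range.
                (lo + hi) / 2.0
            } else {
                lo + (v - min) / span * (hi - lo)
            }
        })
        .collect()
}

/// One-hot encode categorical values.
/// Returns the distinct categories (sorted) and one indicator row per input value.
/// Hypothetical helper for illustration; not part of the etl crate.
fn one_hot(values: &[&str]) -> (Vec<String>, Vec<Vec<u8>>) {
    // Collect the distinct categories, then number them in sorted order.
    let mut index: BTreeMap<&str, usize> = BTreeMap::new();
    for &v in values {
        index.insert(v, 0);
    }
    for (i, slot) in index.values_mut().enumerate() {
        *slot = i;
    }
    // Each row has a single 1 in the column of its category.
    let rows: Vec<Vec<u8>> = values
        .iter()
        .map(|v| {
            let mut row = vec![0u8; index.len()];
            row[index[v]] = 1;
            row
        })
        .collect();
    let categories: Vec<String> = index.keys().map(|k| k.to_string()).collect();
    (categories, rows)
}
```

For instance, `scale_to_range(&[0.0, 5.0, 10.0], -1.0, 1.0)` places the three values at the bottom, middle, and top of the target range, and `one_hot(&["b", "a", "b"])` yields two columns, one per distinct category.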

Configuration is handled through a TOML file. For example:
```toml
## data_config.toml

[[source_files]]
name = "source1.csv"
delimiter = ","
fields = [ { source_name = "a_text_field", field_type = "Text", add_to_frame = false },
           { source_name = "another_text_field", field_type = "Text", add_to_frame = false } ]

[[source_files]]
name = "source2.tsv"
delimiter = "\t"
fields = [ { source_name = "an_integer", field_type = "Signed" },
           { source_name = "another_integer", field_type = "Signed" },
           { source_name = "a_category", field_type = "Text" },
           { source_name = "an_unused_float", field_type = "Float", add_to_frame = false } ]

[[transforms]]
method = { action = "Concatenate",  separator = " & " }
source_fields = [ "a_text_field", "another_text_field" ]
target_name = "a_new_text_field"

[[transforms]]
source_fields = [ "a_category" ]
target_name = "category_mapped_to_integers"

[transforms.method]
action = "Map"
default_value = "-1"
map = { "first_category" = "0", "second_category" = "1" }
```
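
As a rough guide to what the two transforms in this configuration are expected to produce, the sketch below mimics them on plain strings. It assumes that `Concatenate` joins the source fields with `separator`, and that `Map` looks each value up in `map` and falls back to `default_value` when no entry matches. The helper names `concatenate` and `map_with_default` are hypothetical and do not come from this crate.

```rust
use std::collections::HashMap;

/// Join the values of several text fields with a separator.
/// Hypothetical helper mirroring the assumed behavior of the "Concatenate" action.
fn concatenate(fields: &[&str], separator: &str) -> String {
    fields.join(separator)
}

/// Replace a value using a lookup table, falling back to a default.
/// Hypothetical helper mirroring the assumed behavior of the "Map" action.
fn map_with_default(value: &str, map: &HashMap<&str, &str>, default_value: &str) -> String {
    map.get(value).copied().unwrap_or(default_value).to_string()
}
```

Under these assumptions, a row with `a_text_field = "foo"` and `another_text_field = "bar"` would get `a_new_text_field = "foo & bar"`, and a row whose `a_category` is neither `first_category` nor `second_category` would be mapped to `"-1"`.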
To load a data frame from a configuration file:
```rust
use std::path::PathBuf;

// Locate data_config.toml next to the current source file.
let data_path = PathBuf::from(file!()).parent().unwrap().join("data_config.toml");

let (config, df) = DataFrame::load(data_path.as_path()).unwrap();

let mut fieldnames = df.fieldnames();
fieldnames.sort();
// Fields marked `add_to_frame = false` are not part of the resulting data frame.
assert_eq!(fieldnames, ["a_category", "a_new_text_field", "an_integer", "another_integer",
    "category_mapped_to_integers"]);
```

Once loaded, the data frame can be converted into a [matrix](https://github.com/jblondin/matrix) for further processing.
```rust
let (config, df) = DataFrame::load(data_path.as_path()).unwrap();
let (fieldnames, mat) = df.as_matrix().unwrap();
```