# ETL
This package is a general-purpose Extract-Transform-Load (ETL) library for Rust, built to load arbitrary plain-text files into data frame objects.
Features:
* Delimiter specification (comma, tab, etc.)
* Data types:
  * Signed / unsigned integers
  * Floating-point numbers
  * Text fields
  * Boolean values
* Transformations:
  * Concatenation (of text fields)
  * Mapping (from one text field to another)
  * Conversion between types
  * Scaling of numeric values (e.g. to the range -1 to 1)
  * Normalization of values
  * Vectorization ([one-hot](https://en.wikipedia.org/wiki/One-hot) or [feature hashing](https://en.wikipedia.org/wiki/Feature_hashing))
* Filtering
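The scaling, normalization, and one-hot transformations listed above are standard techniques. As a rough, self-contained sketch of what each computes, here is a plain Rust version. It is not this crate's implementation: the functions `scale`, `normalize`, and `one_hot` are hypothetical, and "normalization" is assumed to mean z-score standardization, which the crate may define differently.

```rust
/// Hypothetical: min-max scaling of `values` into the closed interval [lo, hi].
/// If every value is identical (zero range), every output is `lo`.
fn scale(values: &[f64], lo: f64, hi: f64) -> Vec<f64> {
    let min = values.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    values
        .iter()
        .map(|&v| if range == 0.0 { lo } else { lo + (v - min) / range * (hi - lo) })
        .collect()
}

/// Hypothetical: z-score normalization (assumed meaning of "normalization").
/// Subtracts the mean and divides by the population standard deviation;
/// a constant column maps to all zeros.
fn normalize(values: &[f64]) -> Vec<f64> {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let var = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let sd = var.sqrt();
    values
        .iter()
        .map(|v| if sd == 0.0 { 0.0 } else { (v - mean) / sd })
        .collect()
}

/// Hypothetical: one-hot encoding. Each distinct category gets its own
/// column (ordered by first appearance); each row has a single 1 in the
/// column of its category and 0 elsewhere.
fn one_hot(values: &[&str]) -> (Vec<String>, Vec<Vec<u8>>) {
    let mut categories: Vec<String> = Vec::new();
    for v in values {
        if !categories.iter().any(|c| c == v) {
            categories.push(v.to_string());
        }
    }
    let rows: Vec<Vec<u8>> = values
        .iter()
        .map(|v| categories.iter().map(|c| u8::from(c == v)).collect::<Vec<u8>>())
        .collect();
    (categories, rows)
}
```

For example, `scale(&[0.0, 5.0, 10.0], -1.0, 1.0)` maps the minimum to -1, the midpoint to 0, and the maximum to 1.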
Configuration is handled through a TOML file. For example:
```toml
## data_config.toml
[[source_files]]
name = "source1.csv"
delimiter = ","
fields = [ { source_name = "a_text_field", field_type = "Text", add_to_frame = false },
           { source_name = "another_text_field", field_type = "Text", add_to_frame = false } ]

[[source_files]]
name = "source2.tsv"
delimiter = "\t"
fields = [ { source_name = "an_integer", field_type = "Signed" },
           { source_name = "another_integer", field_type = "Signed" },
           { source_name = "a_category", field_type = "Text" },
           { source_name = "an_unused_float", field_type = "Float", add_to_frame = false } ]

[[transforms]]
method = { action = "Concatenate", separator = " & " }
source_fields = [ "a_text_field", "another_text_field" ]
target_name = "a_new_text_field"

[[transforms]]
source_fields = [ "a_category" ]
target_name = "category_mapped_to_integers"
[transforms.method]
action = "Map"
default_value = "-1"
map = { "first_category" = "0", "second_category" = "1" }
```
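The two `[[transforms]]` entries above describe a concatenation with a `separator` and a lookup `map` with a `default_value`. The following standalone sketch shows what those settings presumably compute for a single row; `concatenate` and `map_value` are hypothetical illustrations, not this crate's API. Note that the mapped values in the TOML are strings (`"0"`, `"1"`, `"-1"`), so the sketch returns strings too.

```rust
use std::collections::HashMap;

/// Hypothetical: join the values of several text fields for one row,
/// mirroring `action = "Concatenate"` with a configurable `separator`.
fn concatenate(values: &[&str], separator: &str) -> String {
    values.join(separator)
}

/// Hypothetical: look `value` up in the mapping table, falling back to
/// `default_value` when it is not listed, mirroring `action = "Map"`.
fn map_value(value: &str, map: &HashMap<&str, &str>, default_value: &str) -> String {
    map.get(value).copied().unwrap_or(default_value).to_string()
}
```

With the configuration above, a row whose `a_category` is `"second_category"` would map to `"1"`, and any category not listed in `map` would map to `"-1"`.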
To load a configuration file:
```rust
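use std::path::PathBuf;
// `DataFrame` comes from this crate; bring it into scope with a `use` statement.

// Locate `data_config.toml` next to this source file.
let data_path = PathBuf::from(file!()).parent().unwrap().join("data_config.toml");
// Load every configured source file and apply the configured transforms.
let (_config, df) = DataFrame::load(data_path.as_path()).unwrap();
let mut fieldnames = df.fieldnames();
fieldnames.sort();
// Fields with `add_to_frame = false` are omitted; transform targets are added.
assert_eq!(fieldnames, ["a_category", "a_new_text_field", "an_integer", "another_integer",
    "category_mapped_to_integers"]);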
```
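The expected field names follow from the configuration. Judging from the example, every source field is kept unless it sets `add_to_frame = false` (so `a_text_field`, `another_text_field`, and `an_unused_float` are absent), and each transform contributes its `target_name`. The sketch below reproduces the sorted list under that assumption; `FieldSpec` and `frame_fieldnames` are hypothetical stand-ins, not the crate's own configuration types.

```rust
/// Hypothetical stand-in for one entry of a `fields = [...]` array.
struct FieldSpec {
    source_name: &'static str,
    // Assumed to default to `true` when omitted in the TOML.
    add_to_frame: bool,
}

/// Hypothetical: collect the names that end up in the data frame, namely
/// the source fields that are kept plus the `target_name` of every
/// transform, sorted alphabetically.
fn frame_fieldnames(fields: &[FieldSpec], transform_targets: &[&'static str]) -> Vec<&'static str> {
    let mut names: Vec<&'static str> = fields
        .iter()
        .filter(|f| f.add_to_frame)
        .map(|f| f.source_name)
        .chain(transform_targets.iter().copied())
        .collect();
    names.sort();
    names
}
```

Feeding it the six fields and two transform targets from `data_config.toml` yields the same five names as the assertion above.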
Once loaded, a data frame can be converted into a [matrix](https://github.com/jblondin/matrix) for further processing:
```rust
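// `data_path` is built as in the previous example.
let (_config, df) = DataFrame::load(data_path.as_path()).unwrap();
// Convert the frame into a matrix; the field names are returned alongside it.
let (fieldnames, mat) = df.as_matrix().unwrap();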
```
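`as_matrix` returns the field names together with the matrix, so each column can be traced back to the field it came from. The sketch below illustrates that pairing by packing named numeric columns into a single row-major buffer. The `to_row_major` helper, the all-numeric input, and the row-major layout are assumptions made for illustration; they do not describe how this crate or the matrix crate actually stores data.

```rust
/// Hypothetical: pack named numeric columns (all of equal length) into a
/// row-major buffer. Returns the column names in order, the buffer, and the
/// (rows, cols) shape, or `None` if the columns have mismatched lengths.
fn to_row_major(columns: &[(&str, Vec<f64>)]) -> Option<(Vec<String>, Vec<f64>, (usize, usize))> {
    let ncols = columns.len();
    let nrows = columns.first().map_or(0, |(_, col)| col.len());
    if columns.iter().any(|(_, col)| col.len() != nrows) {
        return None;
    }
    let names: Vec<String> = columns.iter().map(|(name, _)| name.to_string()).collect();
    let mut data = Vec::with_capacity(nrows * ncols);
    // Row-major: all columns of row 0, then all columns of row 1, and so on.
    for r in 0..nrows {
        for (_, col) in columns {
            data.push(col[r]);
        }
    }
    Some((names, data, (nrows, ncols)))
}
```

Keeping the names in the same order as the columns is what lets `fieldnames[j]` identify column `j` of the matrix.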