etl-0.1.0 doesn't have any documentation.
ETL
This package is a general-purpose Extract-Transform-Load (ETL) library for Rust, built to load arbitrary plain text files into data frame objects.
Features:
- Delimiter specification (comma, tab, etc.)
- Data types:
  - Signed / unsigned integers
  - Floating-point numbers
  - Text fields
  - Boolean values
- Transformations:
  - Concatenation (of text fields)
  - Mapping (from one text field to another)
  - Conversion between types
  - Scaling of values (for numeric values, e.g. between -1 and 1)
  - Normalization of values
  - Vectorization (one-hot or feature hashing)
  - Filtering
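Scaling, normalization and vectorization are standard techniques, so a small illustration may help. The sketch below shows one common way to implement each on plain Rust vectors: min-max scaling into [-1, 1], z-score standardization (an assumption about what "normalization" means here), one-hot encoding, and feature hashing with the standard library's `DefaultHasher`. All function names are hypothetical and are not part of this crate's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical helpers illustrating the listed transformations; not the etl crate's API.

/// Min-max scaling into [-1, 1]. A constant column maps to all zeros.
fn scale_to_unit_range(values: &[f64]) -> Vec<f64> {
    let min = values.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    values
        .iter()
        .map(|&v| if range == 0.0 { 0.0 } else { 2.0 * (v - min) / range - 1.0 })
        .collect()
}

/// Z-score standardization (assumed meaning of "normalization"):
/// zero mean and unit population standard deviation.
fn standardize(values: &[f64]) -> Vec<f64> {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let var = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let sd = var.sqrt();
    values
        .iter()
        .map(|v| if sd == 0.0 { 0.0 } else { (v - mean) / sd })
        .collect()
}

/// One-hot encoding: one column per distinct category, in sorted order.
fn one_hot(values: &[&str]) -> (Vec<String>, Vec<Vec<f64>>) {
    let mut categories: Vec<String> = values.iter().map(|s| s.to_string()).collect();
    categories.sort();
    categories.dedup();
    let rows = values
        .iter()
        .map(|v| {
            let mut row = vec![0.0; categories.len()];
            // Every value is present in `categories`, so the search always succeeds.
            let idx = categories.binary_search(&v.to_string()).unwrap();
            row[idx] = 1.0;
            row
        })
        .collect();
    (categories, rows)
}

/// Feature hashing: set the bucket chosen by hashing the value.
/// Note: std does not guarantee DefaultHasher output is stable across Rust releases.
fn hash_feature(value: &str, n_buckets: usize) -> Vec<f64> {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    let idx = (hasher.finish() % n_buckets as u64) as usize;
    let mut row = vec![0.0; n_buckets];
    row[idx] = 1.0;
    row
}

fn main() {
    println!("{:?}", scale_to_unit_range(&[0.0, 5.0, 10.0])); // [-1.0, 0.0, 1.0]
    println!("{:?}", standardize(&[1.0, 2.0, 3.0]));
    let (categories, rows) = one_hot(&["b", "a", "b"]);
    println!("{:?} {:?}", categories, rows); // ["a", "b"] [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    println!("{:?}", hash_feature("a_category", 8));
}
```

One-hot encoding yields an exact, interpretable column per category but grows with the number of distinct values; feature hashing fixes the width up front at the cost of possible collisions.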
Configuration is handled through a TOML file. For example:
```toml
## data_config.toml
[[]]
= "source1.csv"
= ","
= [ { = "a_text_field", = "Text", = false },
    { = "another_text_field", = "Text", = false } ]
[[]]
= "source2.tsv"
= "\t"
= [ { = "an_integer", = "Signed" },
    { = "another_integer", = "Signed" },
    { = "a_category", = "Text" },
    { = "an_unused_float", = "Float", = false } ]
[[]]
= { = "Concatenate", = " & " }
= [ "a_text_field", "another_text_field" ]
= "a_new_text_field"
[[]]
= [ "a_category" ]
= "category_mapped_to_integers"
[]
= "Map"
= "-1"
= { = "0", = "1" }
```
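To make the two transforms in this configuration concrete, the sketch below shows in plain Rust what a `Concatenate` step with separator `" & "` and a `Map` step with value `"-1"` plausibly compute, row by row. It assumes that `Map` looks each value up in a table and falls back to `"-1"` for values it does not find; the category labels `cat_a` and `cat_b` are hypothetical. The functions `concatenate` and `map_values` are illustrations, not the crate's API.

```rust
use std::collections::HashMap;

/// Hypothetical `Concatenate`: join the per-row values of several text
/// fields with a separator, producing one new text field.
fn concatenate(fields: &[Vec<String>], separator: &str) -> Vec<String> {
    let n_rows = fields.first().map_or(0, |f| f.len());
    (0..n_rows)
        .map(|row| {
            fields
                .iter()
                .map(|field| field[row].as_str())
                .collect::<Vec<_>>()
                .join(separator)
        })
        .collect()
}

/// Hypothetical `Map`: replace each value via a lookup table,
/// falling back to `default` for values not present in the table.
fn map_values(values: &[String], mapping: &HashMap<String, String>, default: &str) -> Vec<String> {
    values
        .iter()
        .map(|v| mapping.get(v).cloned().unwrap_or_else(|| default.to_string()))
        .collect()
}

fn main() {
    let a_text_field = vec!["red".to_string(), "blue".to_string()];
    let another_text_field = vec!["green".to_string(), "yellow".to_string()];
    let a_new_text_field = concatenate(&[a_text_field, another_text_field], " & ");
    println!("{:?}", a_new_text_field); // ["red & green", "blue & yellow"]

    // Hypothetical category labels mapped to integer codes.
    let mut mapping = HashMap::new();
    mapping.insert("cat_a".to_string(), "0".to_string());
    mapping.insert("cat_b".to_string(), "1".to_string());
    let a_category = vec!["cat_b".to_string(), "cat_a".to_string(), "cat_z".to_string()];
    println!("{:?}", map_values(&a_category, &mapping, "-1")); // ["1", "0", "-1"]
}
```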
To load a configuration file:
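```rust
// Build the path to the configuration file.
let data_path = from(…).parent().unwrap().join(…);
// Load the configuration and the data frame it describes.
let … = load(…).unwrap();
// Collect the loaded field names in sorted order and check them.
let mut fieldnames = df.fieldnames();
fieldnames.sort();
assert_eq!(…);
```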
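Loading combines the Extract step (splitting each line on the configured delimiter) with per-field type parsing. The following is a minimal sketch of that idea, not the crate's loader, under these assumptions: no header row, one `(name, type)` pair per column in file order, and hypothetical `FieldType`/`Column` enums. The `Signed`, `Float` and `Text` variant names echo the configuration; `Unsigned` and `Boolean` are guessed names for the other listed types. Sorting the resulting names mirrors the `fieldnames` sort step above.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical field types (see lead-in for which names are guesses).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum FieldType {
    Signed,
    Unsigned,
    Float,
    Text,
    Boolean,
}

/// One parsed column of a hypothetical data frame.
#[derive(Debug, PartialEq)]
enum Column {
    Signed(Vec<i64>),
    Unsigned(Vec<u64>),
    Float(Vec<f64>),
    Text(Vec<String>),
    Boolean(Vec<bool>),
}

/// Parse every cell with `FromStr`, reporting the first cell that fails.
fn parse_all<T: FromStr>(cells: &[&str]) -> Result<Vec<T>, String> {
    cells
        .iter()
        .map(|c| c.parse::<T>().map_err(|_| format!("cannot parse {:?}", c)))
        .collect()
}

/// Split each line on `delimiter` and parse column `i` according to `specs[i]`.
fn parse_delimited(
    text: &str,
    delimiter: char,
    specs: &[(&str, FieldType)],
) -> Result<HashMap<String, Column>, String> {
    // Gather the raw cells of each column first.
    let mut raw: Vec<Vec<&str>> = vec![Vec::new(); specs.len()];
    for (line_no, line) in text.lines().enumerate() {
        let cells: Vec<&str> = line.split(delimiter).collect();
        if cells.len() != specs.len() {
            let msg = format!("line {}: expected {} fields, found {}", line_no + 1, specs.len(), cells.len());
            return Err(msg);
        }
        for (i, cell) in cells.into_iter().enumerate() {
            raw[i].push(cell.trim());
        }
    }
    // Convert each column to its declared type.
    let mut columns = HashMap::new();
    for ((name, ty), cells) in specs.iter().zip(raw) {
        let column = match ty {
            FieldType::Signed => Column::Signed(parse_all(&cells)?),
            FieldType::Unsigned => Column::Unsigned(parse_all(&cells)?),
            FieldType::Float => Column::Float(parse_all(&cells)?),
            FieldType::Boolean => Column::Boolean(parse_all(&cells)?),
            FieldType::Text => Column::Text(cells.iter().map(|s| s.to_string()).collect()),
        };
        columns.insert(name.to_string(), column);
    }
    Ok(columns)
}

fn main() {
    let data = "1\t-2\tx\n3\t4\ty\n";
    let specs = [
        ("an_integer", FieldType::Signed),
        ("another_integer", FieldType::Signed),
        ("a_category", FieldType::Text),
    ];
    let frame = parse_delimited(data, '\t', &specs).unwrap();
    let mut fieldnames: Vec<&String> = frame.keys().collect();
    fieldnames.sort();
    println!("{:?}", fieldnames); // ["a_category", "an_integer", "another_integer"]
}
```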
Once loaded, the data frame can be converted into a matrix for further processing.
```rust
// Load the data frame, then convert it into a matrix.
let … = load(…).unwrap();
let … = df.as_matrix().unwrap();
```
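A matrix is the layout most numeric tooling expects. As a hedged sketch (the actual return type of `as_matrix` is not shown here), the code below converts hypothetical numeric columns into a row-major `Vec<Vec<f64>>`, one inner vector per record, and rejects text columns, which would need to be mapped or vectorized first. The `Column` enum and `to_matrix` function are illustrative names only.

```rust
/// Hypothetical column representation; not the etl crate's type.
enum Column {
    Signed(Vec<i64>),
    Float(Vec<f64>),
    Text(Vec<String>),
}

/// Build a row-major matrix from named columns, in the given column order.
fn to_matrix(columns: &[(&str, Column)]) -> Result<Vec<Vec<f64>>, String> {
    // Convert every column to f64 first; text is not numeric.
    let mut numeric: Vec<Vec<f64>> = Vec::with_capacity(columns.len());
    for (name, col) in columns {
        let values: Vec<f64> = match col {
            Column::Signed(v) => v.iter().map(|&x| x as f64).collect(),
            Column::Float(v) => v.clone(),
            Column::Text(_) => return Err(format!("field `{}` is text; vectorize it first", name)),
        };
        numeric.push(values);
    }
    let n_rows = numeric.first().map_or(0, |c| c.len());
    if numeric.iter().any(|c| c.len() != n_rows) {
        return Err("columns have different lengths".to_string());
    }
    // Transpose: row i holds the i-th value of every column.
    Ok((0..n_rows).map(|i| numeric.iter().map(|c| c[i]).collect()).collect())
}

fn main() {
    let columns = [
        ("an_integer", Column::Signed(vec![1, 3])),
        ("a_float", Column::Float(vec![0.5, -1.5])),
    ];
    println!("{:?}", to_matrix(&columns)); // Ok([[1.0, 0.5], [3.0, -1.5]])
    let with_text = [("a_category", Column::Text(vec!["x".to_string()]))];
    println!("{:?}", to_matrix(&with_text).is_err()); // true
}
```

Row-major order keeps each record contiguous, which suits per-sample processing; a column-major layout would instead favor per-feature operations such as scaling.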