Crate csvsc

csvsc is a framework for building csv file processors.

Imagine you have N CSV files with the same structure and you want to use them to produce M other CSV files whose contents depend in some way on the originals. This is what csvsc was built for. With this tool you can build a processing chain (a row stream) that takes each of the input files and generates new output files with the transformations applied.

Quickstart

Start a new binary project with cargo:

$ cargo new --bin my_csv_processor

Add csvsc and encoding as dependencies in Cargo.toml:

[dependencies]
csvsc = "1.0"
encoding = "*"

Grab your input files. In this case I'll use these two:

chicken_north.csv

month,eggs per week
1,3
1,NaN
1,6
2,
2,4
2,8
3,5
3,1
3,8

chicken_south.csv

month,eggs per week
1,2
1,NaN
1,
2,7
2,8
2,23
3,3
3,2
3,12

Now build your processing chain.

// main.rs
use csvsc::prelude::*;

use encoding::all::UTF_8;

fn main() {
    let mut chain = InputStream::from_paths(
            vec![
                "test/assets/chicken_north.csv",
                "test/assets/chicken_south.csv",
            ],
            UTF_8,
        ).unwrap()

        // Filter some columns with invalid values
        .filter_col("eggs per week", |value| {
            value.len() > 0 && value != "NaN"
        }).unwrap()

        // add a column with a value obtained from the file name, wow!
        .add(
            Column::with_name("region")
                .from_column("_source")
                .with_regex("_([a-z]+).csv").unwrap()
                .definition("$1")
        ).unwrap()

        // group by two columns, compute some aggregates
        .group(["region", "month"], |row_stream| {
            row_stream.reduce(vec![
                Reducer::with_name("region").of_column("region").last("").unwrap(),
                Reducer::with_name("month").of_column("month").last("").unwrap(),
                Reducer::with_name("avg").of_column("eggs per week").average().unwrap(),
                Reducer::with_name("sum").of_column("eggs per week").sum(0.0).unwrap(),
            ]).unwrap()
        })

        // Write a report to a single file that will contain all the data
        .flush(
            Target::path("data/report.csv")
        ).unwrap()

        // This column will allow us to output to multiple files, in this case
        // a report by month
        .add(
            Column::with_name("monthly report")
                .from_all_previous()
                .definition("data/monthly/{month}.csv")
        ).unwrap()

        .del(vec!["month"])

        // Write every row to a file specified by its `monthly report` column added
        // previously
        .flush(
            Target::from_column("monthly report")
        ).unwrap()

        // Pack the processing chain into an iterator that can be consumed.
        .into_iter();

    // Consuming the iterator actually triggers all the transformations.
    while let Some(item) = chain.next() {
        item.unwrap();
    }
}

This is the output:

data/monthly/1.csv

region,avg,sum
south,2,2
north,4.5,9

data/monthly/2.csv

region,avg,sum
north,6,12
south,12.666666666666666,38

data/monthly/3.csv

region,avg,sum
north,4.666666666666667,14
south,5.666666666666667,17

data/report.csv

region,month,avg,sum
north,2,6,12
south,1,2,2
south,2,12.666666666666666,38
north,3,4.666666666666667,14
south,3,5.666666666666667,17
north,1,4.5,9

Do it yourself!

All transformation chains start with an InputStream; use its methods to get started easily.

use csvsc::prelude::*;

use encoding::all::UTF_8;

fn main() {
    let mut chain = InputStream::from_paths(
            vec![
                "test/assets/chicken_north.csv",
                "test/assets/chicken_south.csv",
            ],
            UTF_8,
        ).unwrap();
}
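
From here you keep chaining transformations and finally consume the stream, just like in the Quickstart. Below is a minimal sketch that reuses only methods shown above; the output path data/filtered.csv is just an example:

use csvsc::prelude::*;

use encoding::all::UTF_8;

fn main() {
    let chain = InputStream::from_paths(
            vec!["test/assets/chicken_north.csv"],
            UTF_8,
        ).unwrap()

        // Keep only rows with a usable value, as in the Quickstart
        .filter_col("eggs per week", |value| {
            value.len() > 0 && value != "NaN"
        }).unwrap()

        // Write everything to a single file (example path)
        .flush(Target::path("data/filtered.csv")).unwrap()

        .into_iter();

    // Nothing happens until the iterator is consumed
    for item in chain {
        item.unwrap();
    }
}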

Dig deeper

See the RowStream documentation for all available methods.

Re-exports

pub use crate::input::InputStream;
pub use crate::error::Error;
pub use crate::error::RowResult;

Modules

add
error
input
prelude

Structs

Column

A simple interface for building and adding new columns.
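
For example, here is a sketch based on the Quickstart above: a Column built with a regex over the _source column pulls a value out of the file name.

use csvsc::prelude::*;

use encoding::all::UTF_8;

fn main() {
    let chain = InputStream::from_paths(
            vec!["test/assets/chicken_north.csv"],
            UTF_8,
        ).unwrap()

        // Capture the region name from the file name into a new `region` column
        .add(
            Column::with_name("region")
                .from_column("_source")
                .with_regex("_([a-z]+).csv").unwrap()
                .definition("$1")
        ).unwrap();
}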

GroupBy

Group rows by the output of a closure.

Headers

A structure that keeps track of the relationship between headers and their positions.

MockStream

A simple struct that helps create RowStreams from vectors.

Reducer

An uncomplicated builder of arguments for InputStream's reduce method.
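
A sketch based on the Quickstart: each Reducer names an output column and the aggregate applied to an input column. Here reduce is assumed to be callable directly on the chain, collapsing the whole filtered stream into a single row of aggregates instead of being called inside group as above; the wiring is otherwise the same.

use csvsc::prelude::*;

use encoding::all::UTF_8;

fn main() {
    let chain = InputStream::from_paths(
            vec!["test/assets/chicken_north.csv"],
            UTF_8,
        ).unwrap()

        // Drop values that can't be aggregated, as in the Quickstart
        .filter_col("eggs per week", |value| {
            value.len() > 0 && value != "NaN"
        }).unwrap()

        // Collapse the stream into one row of aggregates
        .reduce(vec![
            Reducer::with_name("avg").of_column("eggs per week").average().unwrap(),
            Reducer::with_name("sum").of_column("eggs per week").sum(0.0).unwrap(),
        ]).unwrap();
}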

Target

Helper for building a target to flush data into.
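
Target::path is shown in the Quickstart; the sketch below focuses on Target::from_column, sending every row to a file named by one of its own columns. The column name "output file" and the path template are just examples.

use csvsc::prelude::*;

use encoding::all::UTF_8;

fn main() {
    let chain = InputStream::from_paths(
            vec!["test/assets/chicken_north.csv"],
            UTF_8,
        ).unwrap()

        // Build a per-row output path from the row's own values
        .add(
            Column::with_name("output file")
                .from_all_previous()
                .definition("data/by_month/{month}.csv")
        ).unwrap()

        // Flush each row to the file named in that column
        .flush(Target::from_column("output file")).unwrap()

        .into_iter();

    // Consume the iterator to run the chain
    for item in chain {
        item.unwrap();
    }
}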

Constants

SOURCE_FIELD

A column with this name is added to every record. Its value is the absolute path of the input file, which makes it possible to extract information contained in, for example, the file name. It is useful in combination with the Add processor.

Traits

Aggregate

Aggregates used while reducing must implement this trait.

GroupCriteria

Types implementing this trait can be used to group rows in a row stream, both for .group() and .adjacent_group().

RowStream

This trait describes the behaviour of every component in the CSV transformation chain. The methods provided by this trait help construct the chain and can themselves be chained.

Type Definitions

Row

Type alias of csv::StringRecord. Represents a row of data.