Crate csvsc

csvsc is a framework for building csv file processors.

Imagine you have N csv files with the same structure and you want to use them to generate M new csv files whose contents depend in some way on the originals. This is what csvsc was built for. With this tool you can build a processing chain (a row stream) that takes each of the input files and generates new output files with the modifications applied.

Quickstart

Start a new binary project with cargo:

$ cargo new --bin my_csv_processor

Add csvsc and encoding as dependencies in Cargo.toml:

[dependencies]
csvsc = "1"
encoding = "0.2"

Now start building your processing chain. Specify the inputs (one or more csv files), the transformations, and the output.

use csvsc::prelude::*;

fn main() {
    let mut chain = InputStreamBuilder::from_paths(&[
            // Put here the path to your source files, from 1 to a million
            "test/assets/chicken_north.csv",
            "test/assets/chicken_south.csv",
        ]).unwrap().build().unwrap()

        // Here is where you do the magic: add columns, remove others, filter
        // rows, group and aggregate, even transpose the data to fit your
        // needs.

        // Specify some (zero, one or many) output targets so that results of
        // your computations get stored somewhere.
        .flush(Target::path("data/output.csv")).unwrap()

        .into_iter();

    // And finally consume the stream, reporting any errors to stderr.
    while let Some(item) = chain.next() {
        if let Err(e) = item {
            eprintln!("{}", e);
        }
    }
}
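
The final loop is what actually drives the chain: nothing is read or written until the iterator is consumed. The same consume-and-report pattern can be sketched with a plain iterator of `Result`s (std only, no csvsc; `report_errors` is a hypothetical helper, not part of the crate):

```rust
// Std-only sketch of the consume-and-report loop above: the chain yields
// one `Result` per row; errors go to stderr and are counted, successes pass.
fn report_errors(chain: impl IntoIterator<Item = Result<(), String>>) -> usize {
    let mut errors = 0;
    for item in chain {
        if let Err(e) = item {
            eprintln!("{}", e);
            errors += 1;
        }
    }
    errors
}

fn main() {
    let rows = vec![Ok(()), Err("could not parse row".to_string()), Ok(())];
    assert_eq!(report_errors(rows), 1);
}
```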

Example

Grab your input files. In this case I’ll use these two:

chicken_north.csv

month,eggs per week
1,3
1,NaN
1,6
2,
2,4
2,8
3,5
3,1
3,8

chicken_south.csv

month,eggs per week
1,2
1,NaN
1,
2,7
2,8
2,23
3,3
3,2
3,12
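
Note the empty and NaN entries in the eggs per week column; the chain below drops those rows with filter_col. The predicate it uses, restated as a standalone function (a std-only sketch; `keep_eggs` is a hypothetical name, not csvsc API):

```rust
// The predicate passed to `filter_col` in the chain below: keep a value
// only if it is non-empty and not the literal string "NaN".
fn keep_eggs(value: &str) -> bool {
    !value.is_empty() && value != "NaN"
}

fn main() {
    assert!(keep_eggs("3"));
    assert!(!keep_eggs(""));    // the blank cells above are dropped
    assert!(!keep_eggs("NaN")); // so are the NaN markers
}
```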

Now build your processing chain.

// main.rs
use csvsc::prelude::*;

use encoding::all::UTF_8;

fn main() {
    let mut chain = InputStreamBuilder::from_paths(vec![
            "test/assets/chicken_north.csv",
            "test/assets/chicken_south.csv",
        ]).unwrap()

        // optionally specify the encoding
        .with_encoding(UTF_8)

        // optionally add a column with the path of the source file as specified
        // in the builder
        .with_source_col("_source")

        // build the row stream
        .build().unwrap()

        // Filter out rows with invalid values in this column
        .filter_col("eggs per week", |value| {
            value.len() > 0 && value != "NaN"
        }).unwrap()

        // add a column with a value extracted from the filename
        .add(
            Column::with_name("region")
                .from_column("_source")
                .with_regex(r"_([a-z]+)\.csv").unwrap()
                .definition("$1")
        ).unwrap()

        // group by two columns, compute some aggregates
        .group(["region", "month"], |row_stream| {
            row_stream.reduce(vec![
                Reducer::with_name("region").of_column("region").last("").unwrap(),
                Reducer::with_name("month").of_column("month").last("").unwrap(),
                Reducer::with_name("avg").of_column("eggs per week").average().unwrap(),
                Reducer::with_name("sum").of_column("eggs per week").sum(0.0).unwrap(),
            ]).unwrap()
        })

        // Write a report to a single file that will contain all the data
        .flush(
            Target::path("data/report.csv")
        ).unwrap()

        // This column will allow us to output to multiple files, in this case
        // a report by month
        .add(
            Column::with_name("monthly report")
                .from_all_previous()
                .definition("data/monthly/{month}.csv")
        ).unwrap()

        .del(vec!["month"])

        // Write every row to a file specified by its `monthly report` column added
        // previously
        .flush(
            Target::from_column("monthly report")
        ).unwrap()

        // Pack the processing chain into an iterator that can be consumed.
        .into_iter();

    // Consuming the iterator actually triggers all the transformations.
    while let Some(item) = chain.next() {
        item.unwrap();
    }
}
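
The group/reduce step is the heart of this chain. What the `average()` and `sum()` reducers compute per (region, month) group can be sketched with std only (this is an illustrative model, not the csvsc API; `avg_and_sum` is a hypothetical helper), using the surviving month-2 rows from the inputs above:

```rust
use std::collections::HashMap;

// What `Reducer::average()` and `Reducer::sum()` compute over one group's
// column values.
fn avg_and_sum(values: &[f64]) -> (f64, f64) {
    let sum: f64 = values.iter().sum();
    (sum / values.len() as f64, sum)
}

fn main() {
    // (region, month, eggs per week) — the month-2 rows left after filtering
    let rows = [
        ("north", 2, 4.0),
        ("north", 2, 8.0),
        ("south", 2, 7.0),
        ("south", 2, 8.0),
        ("south", 2, 23.0),
    ];

    // Bucket rows by the (region, month) key, as `group` does
    let mut groups: HashMap<(&str, i32), Vec<f64>> = HashMap::new();
    for (region, month, eggs) in rows {
        groups.entry((region, month)).or_default().push(eggs);
    }

    // Matches the south row of data/monthly/2.csv below: avg 12.666…, sum 38
    let (avg, sum) = avg_and_sum(&groups[&("south", 2)]);
    assert_eq!(sum, 38.0);
    assert_eq!(avg, 38.0 / 3.0);
}
```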

This is the output:

data/monthly/1.csv

region,avg,sum
south,2,2
north,4.5,9

data/monthly/2.csv

region,avg,sum
north,6,12
south,12.666666666666666,38

data/monthly/3.csv

region,avg,sum
north,4.666666666666667,14
south,5.666666666666667,17

data/report.csv

region,month,avg,sum
north,2,6,12
south,1,2,2
south,2,12.666666666666666,38
north,3,4.666666666666667,14
south,3,5.666666666666667,17
north,1,4.5,9

Dig deeper

Check InputStreamBuilder to see more options for starting a processing chain and reading your input.

Go to the RowStream documentation to see all the transformations available.

Re-exports

pub use crate::input::InputStream;
pub use crate::error::Error;
pub use crate::error::RowResult;

Modules

add

Machinery for adding columns.

error

All the things that can go wrong.

input

Machinery for starting processing chains.

prelude

Everything you’ll ever need to build your processing chains.

Structs

Column

A simple interface for building and adding new columns.

GroupBy

Group rows by the output of a closure.

Headers

A structure for keeping track of the relationship between headers and their positions.

MockStream

A simple struct that helps create RowStreams from vectors.

Reducer

An uncomplicated builder of arguments for the reduce method.

Target

Helper for building a target for flushing data into.

Traits

Aggregate

Aggregates used while reducing must implement this trait.

GroupCriteria

Types implementing this trait can be used to group rows in a row stream, both for .group() and .adjacent_group().

RowStream

This trait describes the behaviour of every component in the CSV transformation chain. The functions provided by this trait help construct the chain, and they can themselves be chained.

Type Definitions

Row

Type alias of csv::StringRecord. Represents a row of data.