Crate csvsc


csvsc is a framework for building csv file processors.

Imagine you have N csv files with the same structure and you want to use them to produce M other csv files whose contents depend in some way on the originals. This is what csvsc was built for. With this tool you can build a processing chain (a row stream) that reads each of the input files and generates new output files with the transformations applied.

Quickstart

Start a new binary project with cargo:

$ cargo new --bin my_csv_processor

Add csvsc and encoding as dependencies in Cargo.toml:

[dependencies]
csvsc = "2.2"
encoding = "0.2"

Now start building your processing chain. Specify the inputs (one or more csv files), the transformations, and the output.

use csvsc::prelude::*;

let mut chain = InputStreamBuilder::from_paths(&[
        // Put here the path to your source files, from 1 to a million
        "test/assets/chicken_north.csv",
        "test/assets/chicken_south.csv",
    ]).unwrap().build().unwrap()

    // Here is where you do the magic: add columns, remove ones, filter
    // the rows, group and aggregate, even probably transpose the data
    // to fit your needs.

    // Specify some (zero, one or many) output targets so that results of
    // your computations get stored somewhere.
    .flush(Target::path("data/output.csv")).unwrap()

    .into_iter();

// And finally consume the stream, reporting any errors to stderr.
while let Some(item) = chain.next() {
    if let Err(e) = item {
        eprintln!("{}", e);
    }
}

Example

Grab your input files; in this case I’ll use these two:

chicken_north.csv

month,eggs per week
1,3
1,NaN
1,6
2,
2,4
2,8
3,5
3,1
3,8

chicken_south.csv

month,eggs per week
1,2
1,NaN
1,
2,7
2,8
2,23
3,3
3,2
3,12

Now build your processing chain.

// main.rs
use csvsc::prelude::*;

use encoding::all::UTF_8;

let mut chain = InputStreamBuilder::from_paths(vec![
        "test/assets/chicken_north.csv",
        "test/assets/chicken_south.csv",
    ]).unwrap()

    // optionally specify the encoding
    .with_encoding(UTF_8)

    // optionally add a column with the path of the source file as specified
    // in the builder
    .with_source_col("_source")

    // build the row stream
    .build().unwrap()

    // Filter out rows with invalid values in this column
    .filter_col("eggs per week", |value| {
        value.len() > 0 && value != "NaN"
    }).unwrap()

    // add a column with a value derived from the file name
    .add(
        Column::with_name("region")
            .from_column("_source")
            .with_regex("_([a-z]+).csv").unwrap()
            .definition("$1")
    ).unwrap()

    // group by two columns, compute some aggregates
    .group(["region", "month"], |row_stream| {
        row_stream.reduce(vec![
            Reducer::with_name("region").of_column("region").last("").unwrap(),
            Reducer::with_name("month").of_column("month").last("").unwrap(),
            Reducer::with_name("avg").of_column("eggs per week").average().unwrap(),
            Reducer::with_name("sum").of_column("eggs per week").sum(0.0).unwrap(),
        ]).unwrap()
    })

    // Write a report to a single file that will contain all the data
    .flush(
        Target::path("data/report.csv")
    ).unwrap()

    // This column will allow us to output to multiple files, in this case
    // a report by month
    .add(
        Column::with_name("monthly report")
            .from_all_previous()
            .definition("data/monthly/{month}.csv")
    ).unwrap()

    .del(vec!["month"])

    // Write every row to a file specified by its `monthly report` column added
    // previously
    .flush(
        Target::from_column("monthly report")
    ).unwrap()

    // Pack the processing chain into an iterator that can be consumed.
    .into_iter();

// Consuming the iterator actually triggers all the transformations.
while let Some(item) = chain.next() {
    item.unwrap();
}

This is the output:

data/monthly/1.csv

region,avg,sum
south,2,2
north,4.5,9

data/monthly/2.csv

region,avg,sum
north,6,12
south,12.666666666666666,38

data/monthly/3.csv

region,avg,sum
north,4.666666666666667,14
south,5.666666666666667,17

data/report.csv

region,month,avg,sum
north,2,6,12
south,1,2,2
south,2,12.666666666666666,38
north,3,4.666666666666667,14
south,3,5.666666666666667,17
north,1,4.5,9

Dig deeper

Check InputStreamBuilder to see more options for starting a processing chain and reading your input.

Go to the RowStream documentation to see all the transformations available as well as options to flush the data to files or standard I/O.

Re-exports

pub use crate::input::InputStream;
pub use crate::error::Error;
pub use crate::error::RowResult;

Modules

Machinery for adding columns.

All the things that can go wrong.

Machinery for starting processing chains.

Everything you’ll ever need to build your processing chains.

Structs

A simple interface for building and adding new columns.

A structure for keeping the relationship between headers and their positions.

A simple struct that helps create RowStreams from vectors.

An uncomplicated builder of arguments for RowStream’s reduce method.

Helper for building a target for flushing data into.

Traits

Aggregates used while reducing must implement this trait.

Types implementing this trait can be used to group rows in a row stream, both for group() and adjacent_group()

This trait describes the behaviour of every component in the CSV transformation chain. The functions provided by this trait help construct the chain and can themselves be chained.

Type Definitions

Type alias of csv::StringRecord. Represents a row of data.