Crate csvsc


csvsc is a framework for building csv file processors.

Imagine you have N csv files with the same structure and you want to use them to produce M other csv files whose contents depend in some way on the originals. This is what csvsc was built for. With this tool you can build a processing chain (a row stream) that reads each of the input files and generates new output files with the transformations applied.

Quickstart

Start a new binary project with cargo:

$ cargo new --bin my_csv_processor

Add csvsc and encoding as dependencies in Cargo.toml:

[dependencies]
csvsc = "2.2"
encoding = "0.2"

Now start building your processing chain. Specify the inputs (one or more csv files), the transformations, and the output.

use csvsc::prelude::*;

let mut chain = InputStreamBuilder::from_paths(&[
        // Put here the path to your source files, from 1 to a million
        "test/assets/chicken_north.csv",
        "test/assets/chicken_south.csv",
    ]).unwrap().build().unwrap()

    // Here is where you do the magic: add columns, remove ones, filter
    // the rows, group and aggregate, even probably transpose the data
    // to fit your needs.

    // Specify some (zero, one or many) output targets so that results of
    // your computations get stored somewhere.
    .flush(Target::path("data/output.csv")).unwrap()

    .into_iter();

// And finally consume the stream, reporting any errors to stderr.
while let Some(item) = chain.next() {
    if let Err(e) = item {
        eprintln!("{}", e);
    }
}

Example

Grab your input files; in this case I’ll use these two:

chicken_north.csv

month,eggs per week
1,3
1,NaN
1,6
2,
2,4
2,8
3,5
3,1
3,8

chicken_south.csv

month,eggs per week
1,2
1,NaN
1,
2,7
2,8
2,23
3,3
3,2
3,12

Now build your processing chain.

// main.rs
use csvsc::prelude::*;

use encoding::all::UTF_8;

let mut chain = InputStreamBuilder::from_paths(vec![
        "test/assets/chicken_north.csv",
        "test/assets/chicken_south.csv",
    ]).unwrap()

    // optionally specify the encoding
    .with_encoding(UTF_8)

    // optionally add a column with the path of the source file as specified
    // in the builder
    .with_source_col("_source")

    // build the row stream
    .build().unwrap()

    // Filter out rows with invalid values in this column
    .filter_col("eggs per week", |value| {
        value.len() > 0 && value != "NaN"
    }).unwrap()

    // add a column with a value derived from the file name
    .add(
        Column::with_name("region")
            .from_column("_source")
            .with_regex("_([a-z]+).csv").unwrap()
            .definition("$1")
    ).unwrap()

    // group by two columns, compute some aggregates
    .group(["region", "month"], |row_stream| {
        row_stream.reduce(vec![
            Reducer::with_name("region").of_column("region").last("").unwrap(),
            Reducer::with_name("month").of_column("month").last("").unwrap(),
            Reducer::with_name("avg").of_column("eggs per week").average().unwrap(),
            Reducer::with_name("sum").of_column("eggs per week").sum(0.0).unwrap(),
        ]).unwrap()
    })

    // Write a report to a single file that will contain all the data
    .flush(
        Target::path("data/report.csv")
    ).unwrap()

    // This column will allow us to output to multiple files, in this case
    // a report by month
    .add(
        Column::with_name("monthly report")
            .from_all_previous()
            .definition("data/monthly/{month}.csv")
    ).unwrap()

    .del(vec!["month"])

    // Write every row to a file specified by its `monthly report` column added
    // previously
    .flush(
        Target::from_column("monthly report")
    ).unwrap()

    // Pack the processing chain into an iterator that can be consumed.
    .into_iter();

// Consuming the iterator actually triggers all the transformations.
while let Some(item) = chain.next() {
    item.unwrap();
}

This is the output:

data/monthly/1.csv

region,avg,sum
south,2,2
north,4.5,9

data/monthly/2.csv

region,avg,sum
north,6,12
south,12.666666666666666,38

data/monthly/3.csv

region,avg,sum
north,4.666666666666667,14
south,5.666666666666667,17

data/report.csv

region,month,avg,sum
north,2,6,12
south,1,2,2
south,2,12.666666666666666,38
north,3,4.666666666666667,14
south,3,5.666666666666667,17
north,1,4.5,9

Dig deeper

Check InputStreamBuilder to see more options for starting a processing chain and reading your input.

Go to the RowStream documentation to see all the transformations available as well as options to flush the data to files or standard I/O.

Re-exports

pub use crate::input::InputStream;
pub use crate::error::Error;
pub use crate::error::RowResult;

Modules

Machinery for adding columns.

All the things that can go wrong.

Machinery for starting processing chains.

Everything you’ll ever need to build your processing chains.

Structs

A simple interface for building and adding new columns.

A structure for keeping the relationship between headers and their positions.

A simple struct that helps create RowStreams from vectors.

An uncomplicated builder of arguments for RowStream’s reduce method.

Helper for building a target for flushing data into.

Traits

Aggregates used while reducing must implement this trait.

Types implementing this trait can be used to group rows in a row stream, both for group() and adjacent_group()

This trait describes the behaviour of every component in the CSV transformation chain. The functions provided by this trait help construct the chain and can themselves be chained.

Type Definitions

Type alias of csv::StringRecord. Represents a row of data.