Crate csvsc
csvsc is a framework for building CSV file processors.

Imagine you have N CSV files with the same structure and you want to use them to produce M new CSV files whose contents depend in some way on the originals. This is what csvsc was built for. With this tool you can build a processing chain (a row stream) that takes each of the input files and generates new output files with the transformations applied.
Quickstart
Start a new binary project with cargo:
$ cargo new --bin miprocesadordecsv
Add csvsc and encoding as dependencies in Cargo.toml:

```toml
[dependencies]
csvsc = "1"
encoding = "0.2"
```
Now start building your processing chain. Specify the inputs (one or more csv files), the transformations, and the output.
```rust
use csvsc::prelude::*;

fn main() {
    let mut chain = InputStreamBuilder::from_paths(&[
        // Put here the paths to your source files, from 1 to a million
        "test/assets/chicken_north.csv",
        "test/assets/chicken_south.csv",
    ]).unwrap().build().unwrap()
        // Here is where you do the magic: add columns, remove them, filter
        // the rows, group and aggregate, even transpose the data to fit
        // your needs.

        // Specify some (zero, one, or many) output targets so that the
        // results of your computations get stored somewhere.
        .flush(Target::path("data/output.csv")).unwrap()
        .into_iter();

    // And finally consume the stream, reporting any errors to stderr.
    while let Some(item) = chain.next() {
        if let Err(e) = item {
            eprintln!("{}", e);
        }
    }
}
```
Example
Grab your input files; in this case I’ll use these two:
chicken_north.csv:

```csv
month,eggs per week
1,3
1,NaN
1,6
2,
2,4
2,8
3,5
3,1
3,8
```
chicken_south.csv:

```csv
month,eggs per week
1,2
1,NaN
1,
2,7
2,8
2,23
3,3
3,2
3,12
```
Now build your processing chain.
```rust
// main.rs
use csvsc::prelude::*;
use encoding::all::UTF_8;

fn main() {
    let mut chain = InputStreamBuilder::from_paths(vec![
        "test/assets/chicken_north.csv",
        "test/assets/chicken_south.csv",
    ]).unwrap()
        // optionally specify the encoding
        .with_encoding(UTF_8)
        // optionally add a column with the path of the source file as
        // specified in the builder
        .with_source_col("_source")
        // build the row stream
        .build().unwrap()
        // filter out rows with invalid values in some columns
        .filter_col("eggs per week", |value| {
            value.len() > 0 && value != "NaN"
        }).unwrap()
        // add a column whose value is derived from the filename, wow!
        .add(
            Column::with_name("region")
                .from_column("_source")
                .with_regex("_([a-z]+).csv").unwrap()
                .definition("$1")
        ).unwrap()
        // group by two columns and compute some aggregates
        .group(["region", "month"], |row_stream| {
            row_stream.reduce(vec![
                Reducer::with_name("region").of_column("region").last("").unwrap(),
                Reducer::with_name("month").of_column("month").last("").unwrap(),
                Reducer::with_name("avg").of_column("eggs per week").average().unwrap(),
                Reducer::with_name("sum").of_column("eggs per week").sum(0.0).unwrap(),
            ]).unwrap()
        })
        // write a report to a single file that will contain all the data
        .flush(
            Target::path("data/report.csv")
        ).unwrap()
        // this column will allow us to write to multiple files, in this
        // case one report per month
        .add(
            Column::with_name("monthly report")
                .from_all_previous()
                .definition("data/monthly/{month}.csv")
        ).unwrap()
        .del(vec!["month"])
        // write every row to the file given by its `monthly report`
        // column added previously
        .flush(
            Target::from_column("monthly report")
        ).unwrap()
        // pack the processing chain into an iterator that can be consumed
        .into_iter();

    // Consuming the iterator actually triggers all the transformations.
    while let Some(item) = chain.next() {
        item.unwrap();
    }
}
```
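The regex step above captures the region name from each source path via `_([a-z]+).csv` and `$1`. A std-only sketch of that extraction (a hypothetical helper, not part of csvsc) looks like this:

```rust
// Extract what `_([a-z]+).csv` would capture from a path: the lowercase
// run between the last underscore and the ".csv" suffix.
fn region_from_path(path: &str) -> Option<&str> {
    let stem = path.strip_suffix(".csv")?;
    let underscore = stem.rfind('_')?;
    let region = &stem[underscore + 1..];
    // mimic the `[a-z]+` character class: non-empty, all lowercase ASCII
    if !region.is_empty() && region.chars().all(|c| c.is_ascii_lowercase()) {
        Some(region)
    } else {
        None
    }
}

fn main() {
    assert_eq!(region_from_path("test/assets/chicken_north.csv"), Some("north"));
    assert_eq!(region_from_path("test/assets/chicken_south.csv"), Some("south"));
}
```

This is why the `region` column ends up holding "north" or "south" for every row coming from the corresponding file.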
This is the resulting output:
data/monthly/1.csv:

```csv
region,avg,sum
south,2,2
north,4.5,9
```

data/monthly/2.csv:

```csv
region,avg,sum
north,6,12
south,12.666666666666666,38
```

data/monthly/3.csv:

```csv
region,avg,sum
north,4.666666666666667,14
south,5.666666666666667,17
```

data/report.csv:

```csv
region,month,avg,sum
north,2,6,12
south,1,2,2
south,2,12.666666666666666,38
north,3,4.666666666666667,14
south,3,5.666666666666667,17
north,1,4.5,9
```
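The grouped aggregates above can be reproduced with a std-only sketch (illustrative, not using csvsc) that drops empty and NaN values like the `filter_col` step and then computes the average and sum per (region, month) group:

```rust
use std::collections::BTreeMap;

/// Drop empty and "NaN" values, then return (average, sum), mirroring
/// the filter_col and reduce steps of the processing chain.
fn aggregate(values: &[&str]) -> (f64, f64) {
    let valid: Vec<f64> = values
        .iter()
        .copied()
        .filter(|v| !v.is_empty() && *v != "NaN")
        .map(|v| v.parse().unwrap())
        .collect();
    let sum: f64 = valid.iter().sum();
    (sum / valid.len() as f64, sum)
}

fn main() {
    // raw "eggs per week" values per (region, month) from the input files
    let groups = [
        (("north", 1), vec!["3", "NaN", "6"]),
        (("north", 2), vec!["", "4", "8"]),
        (("north", 3), vec!["5", "1", "8"]),
        (("south", 1), vec!["2", "NaN", ""]),
        (("south", 2), vec!["7", "8", "23"]),
        (("south", 3), vec!["3", "2", "12"]),
    ];
    let report: BTreeMap<_, _> = groups
        .iter()
        .map(|(key, values)| (*key, aggregate(values)))
        .collect();

    // matches data/monthly/1.csv and data/monthly/2.csv above
    assert_eq!(report[&("north", 1)], (4.5, 9.0));
    assert_eq!(report[&("south", 2)], (38.0 / 3.0, 38.0));
    println!("{:?}", report);
}
```

Note how the NaN and empty cells are excluded from both the count used for the average and the sum, which is why month 1 of the south region averages 2 over a single valid value.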
Dig deeper
Check InputStreamBuilder to see more options for starting a processing chain and reading your input.

Go to the RowStream documentation to see all the transformations available.
Re-exports
pub use crate::input::InputStream;
pub use crate::error::Error;
pub use crate::error::RowResult;
Modules
add | Machinery for adding columns.
error | All the things that can go wrong.
input | Machinery for starting processing chains.
prelude | Everything you’ll ever need to build your processing chains.
Structs
Column | A simple interface for building and adding new columns.
GroupBy | Group rows by the output of a closure.
Headers | A structure for keeping the relationship between the headers and their positions.
MockStream | A simple struct that helps create RowStreams from vectors.
Reducer | An uncomplicated builder of arguments for InputStream’s reduce method.
Target | Helper for building a target for flushing data into.
Traits
Aggregate | Aggregates used while reducing must implement this trait.
GroupCriteria | Types implementing this trait can be used to group rows in a row stream, both for .group() and .adjacent_group().
RowStream | This trait describes the behaviour of every component in the CSV transformation chain. Functions provided by this trait help construct the chain and can be chained.
Type Definitions
Row | Type alias of csv::StringRecord. Represents a row of data.