Crate csvsc
csvsc is a library for building CSV file processors.
Imagine you have N CSV files with the same structure and you want to use them to produce M other CSV files whose contents depend in some way on the originals. This is what csvsc is for. With this tool you can build a processing chain that modifies each of the input files and writes the results to new output files.
Preparation Mode
Start a new binary project with cargo:
$ cargo new --bin miprocesadordecsv
Add csvsc and encoding as dependencies in Cargo.toml:
[dependencies]
csvsc = { git = "https://github.com/categulario/csvsc-rs.git" }
encoding = "*"
Now build your processing chain. In this example, the chain has the following characteristics:
- It takes the files 1.csv and 2.csv as input, with UTF-8 encoding.
- It adds a virtual column _target, which defines the output file and uses the a column of both input files in its definition.
- It deletes column b.
use csvsc::ColSpec;
use csvsc::InputStream;
use csvsc::ReaderSource;
use csvsc::RowStream;
use csvsc::FlushTarget;

use encoding::all::UTF_8;

fn main() {
    let filenames = vec!["test/assets/1.csv", "test/assets/2.csv"];

    let mut chain = InputStream::from_readers(
            filenames
                .iter()
                .map(|f| ReaderSource::from_path(f).unwrap()),
            UTF_8,
        )
        // Virtual column that decides which output file each row goes to.
        .add(ColSpec::Mix {
            colname: "_target".to_string(),
            coldef: "output/{a}.csv".to_string(),
        }).unwrap()
        .del(vec!["b"])
        .flush(FlushTarget::Column("_target".to_string())).unwrap()
        .into_iter();

    // Drive the chain to completion, reporting any row-level errors.
    while let Some(item) = chain.next() {
        if let Err(e) = item {
            eprintln!("failed {:?}", e);
        }
    }
}
Executing this project will create an output/ folder containing as many files as there are distinct values in column a.
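To make the relationship between the coldef template and the resulting file names concrete, here is a minimal standard-library sketch of how a template such as output/{a}.csv could expand for a single row. The function expand_template and the HashMap row representation are hypothetical and exist only for illustration; this is not how csvsc implements ColSpec::Mix.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of csvsc): replaces every `{name}`
/// placeholder in `template` with the value of column `name` in `row`.
/// Returns `None` when a column is missing or a brace is left unclosed.
fn expand_template(template: &str, row: &HashMap<&str, &str>) -> Option<String> {
    let mut out = String::new();
    let mut rest = template;

    while let Some(start) = rest.find('{') {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);

        let after = &rest[start + 1..];
        let end = after.find('}')?;
        let name = &after[..end];

        // Substitute the column value for the placeholder.
        out.push_str(row.get(name)?);
        rest = &after[end + 1..];
    }

    // Copy whatever literal text follows the last placeholder.
    out.push_str(rest);
    Some(out)
}
```

Under this sketch, rows whose a column holds x and y would map to output/x.csv and output/y.csv respectively, which is why one file per distinct value appears.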
Another example, this time for a single file whose name is taken from the command line:
use std::env;

use csvsc::ColSpec;
use csvsc::InputStream;
use csvsc::ReaderSource;
use csvsc::RowStream;
use csvsc::FlushTarget;

use encoding::all::UTF_8;

fn main() {
    // args() yields the program name first, so the file name is at index 1.
    let filename = env::args().nth(1).unwrap();
    let reader_source = ReaderSource::from_path(filename).unwrap();

    let mut chain = InputStream::new(
            reader_source,
            UTF_8,
        )
        .add(ColSpec::Mix {
            colname: "_target".to_string(),
            coldef: "output/{a}.csv".to_string(),
        }).unwrap()
        .del(vec!["b"])
        .flush(FlushTarget::Column("_target".to_string())).unwrap()
        .into_iter();

    // Drive the chain to completion, reporting any row-level errors.
    while let Some(item) = chain.next() {
        if let Err(e) = item {
            eprintln!("failed {:?}", e);
        }
    }
}
To know which methods are available in a processing chain, go to the RowStream documentation.
Columns with names that start with an underscore will not be written to the output files.
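The underscore rule can be pictured as a projection applied just before writing. The sketch below uses only the standard library; visible_columns and project_row are hypothetical names chosen for illustration, and the crate's actual writer may work differently.

```rust
/// Hypothetical helper (not part of csvsc): returns the positions of the
/// headers that do not start with an underscore.
fn visible_columns(headers: &[&str]) -> Vec<usize> {
    headers
        .iter()
        .enumerate()
        .filter(|(_, name)| !name.starts_with('_'))
        .map(|(index, _)| index)
        .collect()
}

/// Hypothetical helper: keeps only the fields at the given positions.
/// Panics if a position is out of bounds for `row`.
fn project_row<'a>(row: &[&'a str], keep: &[usize]) -> Vec<&'a str> {
    keep.iter().map(|&index| row[index]).collect()
}
```

With headers a, _target, c, this sketch keeps positions 0 and 2, so a virtual column like _target can steer the output without appearing in it.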
Modules
aggregate | |
col |
Structs
Add | Adds a column to each row. It can be based on existing ones or the source filename. |
AddWith | Adds a column to each row using a closure to generate its data. |
AdjacentGroup | Groups data by a set of columns. |
Del | Deletes the specified columns from each row. |
Flush | Flushes the rows to the destination specified by a column. |
Group | Groups data by a set of columns. |
Headers | A structure for keeping the relationship between the headers and their positions. |
InputStream | A structure for creating a transformation chain from input files. |
Inspect | Allows calling a closure on each row, just like in rust's Iterator trait. |
MapCol | |
MapRow | |
MockStream | A simple struct that helps create RowStreams from vectors. |
ReaderSource | Represents a file as source of CSV data. |
Reduce | Used to aggregate the given rows, yielding the results as a new stream of rows with potentially new columns. |
Rename | Changes a column's name |
Enums
ColSpec | Types of specifications available to create a new column. |
Error | An error found somewhere in the transformation chain. |
FlushTarget | |
GroupBuildError | Things that could go wrong while building a group or adjacent group |
Constants
SOURCE_FIELD | A column with this name will be added to each record. The column will have as a value the absolute path to the input file and serves to extract information that may be contained, for example, in the file name. It is useful in combination with the processor Add. |
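As an illustration of the kind of information that can be pulled out of the source path, the following standard-library sketch extracts the file name without its extension. The function source_stem is a hypothetical helper, not part of csvsc, and it assumes the path is valid UTF-8.

```rust
use std::path::Path;

/// Hypothetical helper (not part of csvsc): given the absolute path stored
/// in the source column, returns the file name without its extension.
/// Returns `None` if the path has no file name or is not valid UTF-8.
fn source_stem(source: &str) -> Option<&str> {
    Path::new(source).file_stem()?.to_str()
}
```

For a source path such as /data/2019-sales.csv this sketch yields 2019-sales, which could then feed a new column.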
Traits
RowStream | This trait describes the behaviour of every component in the CSV transformation chain. Functions provided by this trait help construct the chain and can be chained. |
Type Definitions
Row | Type alias of csv::StringRecord. Represents a row of data. |
RowResult | The type that actually flows through the transformation chain. Either a row or an error. |