Crate csvsc

csvsc is a library for building CSV file processors.

Imagine you have N CSV files with the same structure and you want to use them to produce M other CSV files whose contents depend in some way on the originals. This is what csvsc is for. With this tool you can build a processing chain that modifies each of the input files and generates new output files with the modifications.

Preparation Mode

Start a new binary project with cargo:

$ cargo new --bin miprocesadordecsv

Add csvsc and encoding as dependencies in Cargo.toml:

[dependencies]
csvsc = { git = "https://github.com/categulario/csvsc-rs.git" }
encoding = "*"

Now build your processing chain. The example below builds a chain that:

  1. takes the files 1.csv and 2.csv as input, read as UTF-8,
  2. adds a virtual column _target, built from the value of column a in each input file, which determines the output file for each row,
  3. deletes column b.

use csvsc::ColSpec;
use csvsc::InputStream;
use csvsc::ReaderSource;
use csvsc::RowStream;
use csvsc::FlushTarget;

use encoding::all::UTF_8;

fn main() {
    let filenames = vec!["test/assets/1.csv", "test/assets/2.csv"];

    let mut chain = InputStream::from_readers(
            filenames
                .iter()
                .map(|f| ReaderSource::from_path(f).unwrap()),
            UTF_8,
        )
        .add(ColSpec::Mix {
            colname: "_target".to_string(),
            coldef: "output/{a}.csv".to_string(),
        }).unwrap()
        .del(vec!["b"])
        .flush(FlushTarget::Column("_target".to_string())).unwrap()
        .into_iter();

    while let Some(item) = chain.next() {
        if let Err(e) = item {
            eprintln!("failed {:?}", e);
        }
    }
}

Running this project creates an output/ folder containing one file for each distinct value of column a.
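To make the role of the _target column more concrete, here is a minimal sketch of the idea using only the standard library. It is not csvsc's implementation: the functions expand_template and rows_per_target are hypothetical names introduced for illustration, and the {column} placeholder syntax is assumed from the coldef string in the example above.

```rust
use std::collections::{BTreeMap, HashMap};

/// Replaces every `{column}` placeholder in `template` with that column's
/// value in `row`. Returns an error for an unknown column or an unclosed brace.
/// (Hypothetical helper, not part of csvsc.)
fn expand_template(template: &str, row: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        // Copy the literal text that precedes the placeholder.
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        let close = after
            .find('}')
            .ok_or_else(|| "unclosed '{' in template".to_string())?;
        let name = &after[..close];
        let value = row
            .get(name)
            .ok_or_else(|| format!("missing column {}", name))?;
        out.push_str(value);
        rest = &after[close + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

/// Counts how many rows would land in each target file.
/// A BTreeMap keeps the targets in sorted order.
fn rows_per_target(
    template: &str,
    rows: &[HashMap<&str, &str>],
) -> Result<BTreeMap<String, usize>, String> {
    let mut counts = BTreeMap::new();
    for row in rows {
        let target = expand_template(template, row)?;
        *counts.entry(target).or_insert(0) += 1;
    }
    Ok(counts)
}
```

With the template output/{a}.csv, rows whose column a holds x, y and x would be split across two targets, output/x.csv and output/y.csv, which matches the "one file per distinct value" behaviour described above.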

Another example, this time for a single file whose name is taken from the command-line arguments:

use std::env;

use csvsc::ColSpec;
use csvsc::InputStream;
use csvsc::ReaderSource;
use csvsc::RowStream;
use csvsc::FlushTarget;

use encoding::all::UTF_8;

fn main() {
    // The first item of env::args() is the program name; the file name is the second.
    let filename = env::args().nth(1).unwrap();

    let reader_source = ReaderSource::from_path(filename).unwrap();

    let mut chain = InputStream::new(
            reader_source,
            UTF_8,
        )
        .add(ColSpec::Mix {
            colname: "_target".to_string(),
            coldef: "output/{a}.csv".to_string(),
        }).unwrap()
        .del(vec!["b"])
        .flush(FlushTarget::Column("_target".to_string())).unwrap()
        .into_iter();

    while let Some(item) = chain.next() {
        if let Err(e) = item {
            eprintln!("failed {:?}", e);
        }
    }
}

To see which methods are available in a processing chain, refer to the RowStream documentation.

Columns with names that start with an underscore will not be written to the output files.
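The underscore rule can be pictured as a filter over the header row that is applied before writing. The following is a sketch under that assumption, not csvsc's actual code; visible_columns and project are hypothetical helpers.

```rust
/// Returns the indices of the columns that would be written out, i.e. those
/// whose header does not start with an underscore. (Hypothetical helper.)
fn visible_columns(headers: &[&str]) -> Vec<usize> {
    headers
        .iter()
        .enumerate()
        .filter(|(_, name)| !name.starts_with('_'))
        .map(|(index, _)| index)
        .collect()
}

/// Keeps only the fields of `row` at the given indices, skipping any index
/// that is out of range for a short row.
fn project<'a>(row: &[&'a str], keep: &[usize]) -> Vec<&'a str> {
    keep.iter().filter_map(|&i| row.get(i).copied()).collect()
}
```

For headers a, _target, c this keeps indices 0 and 2, so a helper column such as _target can steer the output without appearing in it.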

Modules

aggregate
col

Structs

Add

Adds a column to each row. It can be based on existing columns or on the source filename.

AddWith

Adds a column to each row using a closure to generate its data.

AdjacentGroup

Groups data by a set of columns.

Del

Deletes the specified columns from each row.

Flush

Flushes the rows to the destination specified by a column.

Group

Groups data by a set of columns.

Headers

A structure that keeps track of the relationship between the headers and their positions.

InputStream

A structure for creating a transformation chain from input files.

Inspect

Allows calling a closure on each row, just like inspect in Rust's Iterator trait.

MapCol
MapRow
MockStream

A simple struct that helps create RowStreams from vectors.

ReaderSource

Represents a file as source of CSV data.

Reduce

Used to aggregate the given rows, yielding the results as a new stream of rows with potentially new columns.

Rename

Changes a column's name.

Enums

ColSpec

Types of specifications available to create a new column.

Error

An error found somewhere in the transformation chain.

FlushTarget
GroupBuildError

Things that could go wrong while building a group or adjacent group.

Constants

SOURCE_FIELD

A column with this name is added to each record. Its value is the absolute path of the input file, which makes it possible to extract information contained in the path, for example in the file name. It is useful in combination with the Add processor.
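As an illustration of the kind of information a source path can carry, the sketch below pulls the file name without its extension out of an absolute path using std::path. The function file_stem_of is a hypothetical helper and does not reflect how csvsc itself parses the column.

```rust
use std::path::Path;

/// Returns the file name without its extension, e.g. a period or region code
/// encoded in the name of the input file. (Hypothetical helper.)
fn file_stem_of(source: &str) -> Option<&str> {
    Path::new(source).file_stem()?.to_str()
}
```

For a source value such as /data/2019-03.csv this yields 2019-03, which could then populate a new column.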

Traits

RowStream

This trait describes the behaviour of every component in the CSV transformation chain. Functions provided by this trait help construct the chain and can be chained.

Type Definitions

Row

Type alias of csv::StringRecord. Represents a row of data.

RowResult

The type that actually flows through the transformation chain. Either a row or an error.
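Letting a Result flow through the chain means a stage can transform good rows while forwarding errors untouched, so the consumer at the end decides how to report them. A minimal sketch with stand-in types follows; SketchRow, SketchRowResult and uppercase_stage are illustrative names, and csvsc defines its own Row and Error types.

```rust
// Stand-in types for illustration only.
type SketchRow = Vec<String>;
type SketchRowResult = Result<SketchRow, String>;

/// A stage that upper-cases every field of good rows and forwards errors
/// unchanged to the next stage.
fn uppercase_stage<I>(input: I) -> impl Iterator<Item = SketchRowResult>
where
    I: Iterator<Item = SketchRowResult>,
{
    input.map(|item| {
        item.map(|row| {
            row.into_iter()
                .map(|field| field.to_uppercase())
                .collect::<SketchRow>()
        })
    })
}
```

This mirrors the loops in the examples above, where each item coming out of the chain is inspected and only the Err case is printed.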