shard-csv 0.1.0

A library to aid in splitting CSV/TSV files into multiple disjoint files.

shard-csv

shard-csv is a crate that splits input CSV files into output shards according to a key selector. Use it when you have a large dataset that you want to split with more control than, say, GNU split offers.
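To make the idea of a key selector concrete, here is a minimal standard-library sketch of sharding by a column value. It does not use shard-csv's API: the function name `shard_rows` is hypothetical, and the naive comma splitting ignores quoted fields, which a real CSV parser handles.

```rust
use std::collections::BTreeMap;

/// Groups the data rows of `input` into shards keyed by the value found in
/// column `key_column`. Hypothetical helper for illustration only: it splits
/// on bare commas and does not understand quoted fields.
fn shard_rows(input: &str, key_column: usize) -> BTreeMap<String, Vec<String>> {
    let mut shards: BTreeMap<String, Vec<String>> = BTreeMap::new();
    // Skip the header line, then route every row to its shard.
    for line in input.lines().skip(1) {
        let key = line
            .split(',')
            .nth(key_column)
            .unwrap_or("unknown") // rows too short to have the key column
            .to_string();
        shards.entry(key).or_default().push(line.to_string());
    }
    shards
}
```

Every row lands in exactly one shard, so the outputs are disjoint. The crate adds the parts this sketch leaves out, such as streaming to files and splitting large shards.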

Usage

Include it in your Cargo.toml with: shard-csv = "0.1.0".

Start by creating a CSV reader. shard-csv depends heavily on the csv crate, which it re-exports, so you can build the reader through shard_csv::csv:

let mut reader = shard_csv::csv::ReaderBuilder::new()
    .from_path("input_data.csv")
    .expect("Failed to create reader from file");

Then you can create a sharded CSV writer that:

  • Knows how to identify which shard each row belongs to,
  • Can write each shard into a single file or multiple files, split on number of rows or size in bytes,
  • Can create output streams arbitrarily (e.g., gzipped),
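  • Notifies you when a stream is complete.

// Import path assumed: both types are taken to be exported at the crate root.
use shard_csv::{FileSplitting, ShardedWriterBuilder};

let mut writer = ShardedWriterBuilder::new_from_csv_reader(&mut reader)
    .expect("Failed to create writer")
    .with_key_selector(|row| row.get(2).unwrap_or("unknown").to_string())
    // Use both the shard key and the sequence number in the file name.
    .with_output_shard_naming(|key, seq| format!("data.{}.part{}.csv", key, seq))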
    .with_output_splitting(FileSplitting::SplitAfterBytes(1024 * 1024))
    .on_file_completion(|path, key| {
        println!("The file {} is now ready for shard {}", path.display(), key);
        // Do something more with the completed file if you want.
    });

writer.process_csv(&mut reader).ok();
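
The following sketch shows one way splitting a shard after a byte limit could work together with a naming function. It is an illustration under assumptions, not the crate's implementation: `ShardRotation`, `record`, and `shard_file_name` are hypothetical names, and the rule "start a new file once the current one has reached the limit" is an assumed reading of SplitAfterBytes.

```rust
/// Tracks how many bytes went into the current output file of one shard
/// and decides when to roll over to the next sequence number.
/// Hypothetical type; not part of shard-csv.
struct ShardRotation {
    max_bytes: usize,
    bytes_in_current: usize,
    seq: usize,
}

impl ShardRotation {
    fn new(max_bytes: usize) -> Self {
        Self { max_bytes, bytes_in_current: 0, seq: 0 }
    }

    /// Records a row of `row_len` bytes and returns the sequence number
    /// of the file that row is written to.
    fn record(&mut self, row_len: usize) -> usize {
        // Assumed rule: once the current file has reached the limit,
        // the next row opens a new file.
        if self.bytes_in_current >= self.max_bytes {
            self.seq += 1;
            self.bytes_in_current = 0;
        }
        self.bytes_in_current += row_len;
        self.seq
    }
}

/// Builds a file name from the shard key and the sequence number.
/// The pattern itself is only an example.
fn shard_file_name(key: &str, seq: usize) -> String {
    format!("data.{}.part{}.csv", key, seq)
}
```

Under this rule a file can exceed the limit by at most one row, which keeps rows intact instead of cutting them at an exact byte boundary.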