Struct ShardedWriter

Source

pub struct ShardedWriter<FKey, FNameFile>where
    FNameFile: Fn(&str, usize) -> String,
{ /* private fields */ }

Implementations§

Source §

impl<FKey, FNameFile> ShardedWriter<FKey, FNameFile>
where FKey: Fn(&StringRecord) -> String, FNameFile: Fn(&str, usize) -> String,

Source

pub fn with_output_splitting(self, output_splitting: FileSplitting) -> Self

Creates a new writer.

You must specify the directory into which the output will be written, a function that extracts the shard key from a csv StringRecord, and how output files will be named. The file naming function accepts the shard key and a zero-based number indicating how many files have been created for this shard.

This function can return an error if the output directory can’t be created.

let writer = ShardedWriter::new(
    "./foo-sharded/",
    |record| record.get(7).unwrap_or("_unknown").to_string(),
    |shard, seq| format!("{}-file{}.csv", shard, seq)
)?;

Specifies when sharded output files should be split.

Source

pub fn with_delimiter(self, delimiter: u8) -> Self

Sets the field delimiter to be used for output files. Default is ‘,’.

Source

pub fn on_file_completion(self, f: fn(&Path, &str)) -> Self

Sets an optional function that will be called when individual files are completed, either because they have been split by the number of rows or bytes or because processing is complete and the values are being dropped.

Source

pub fn on_create_file(self, f: fn(&Path) -> Result<Box<dyn Write>>) -> Self

Takes a closure that specifies how to create output files.

The closure provides the Path of the output file to be created. If you don’t provide your own way to create output files, the default implementation will simply create a new BufWriter for the output file, which is the same as:

my_sharded_writer.on_create_file(|path| Ok(BufWriter::new(File::create(path)?)));

This function may be useful if, for example, you want to inject gzip compression into the output writer.

Source

pub fn process_file(&mut self, filename: &str) -> Result<usize, Error>

Processes the input filename, creating output files according to the specified key selector.

This function will fail if the output directory or an output file can’t be created or if a row can’t be written. It can also fail if it is called multiple times with files that have different column counts.

On success, the number of records written is returned.

Source

pub fn process_csv<T: Read>( &mut self, csv_reader: &mut Reader<T>, ) -> Result<usize, Error>

Processes the input reader, creating output files as appropriate.

This function will fail if the output directory or an output file can’t be created or if a row can’t be written. It can also fail if it is called multiple times with files that have different column counts.

On success, the number of records written is returned.

Source