pub struct ShardedWriter<FKey, FNameFile>{ /* private fields */ }
Implementations
impl<FKey, FNameFile> ShardedWriter<FKey, FNameFile>
Creates a new writer (ShardedWriter::new).
You must specify the directory into which the output will be written, a function that extracts the shard key from a csv StringRecord, and how output files will be named. The file naming function accepts the shard key and a zero-based number indicating how many files have been created for this shard.
This function returns an error if the output directory can’t be created.
let writer = ShardedWriter::new(
    "./foo-sharded/",
    |record| record.get(7).unwrap_or("_unknown").to_string(),
    |shard, seq| format!("{}-file{}.csv", shard, seq)
)?;
pub fn with_output_splitting(self, output_splitting: FileSplitting) -> Self
Specifies when sharded output files should be split.
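To illustrate how the zero-based file number interacts with splitting, the following is a minimal, self-contained sketch that moves to a new file name after a fixed number of rows. It does not use the crate's FileSplitting type, whose variants are not shown here; the RowSplitNamer struct and its max_rows field are hypothetical names introduced only for this example, and the file name pattern is borrowed from the ShardedWriter::new example.

```rust
/// Hypothetical helper (not part of the crate): decides which file name
/// each successive row of a single shard belongs to.
struct RowSplitNamer {
    shard: String,
    max_rows: usize,     // assumed row-based split threshold, must be > 0
    rows_in_file: usize, // rows already assigned to the current file
    seq: usize,          // zero-based number of the current file
}

impl RowSplitNamer {
    fn new(shard: &str, max_rows: usize) -> Self {
        assert!(max_rows > 0, "max_rows must be positive");
        Self {
            shard: shard.to_string(),
            max_rows,
            rows_in_file: 0,
            seq: 0,
        }
    }

    /// Returns the file name the next row should be written to.
    fn next_row(&mut self) -> String {
        if self.rows_in_file == self.max_rows {
            // The current file is full, so start the next one.
            self.seq += 1;
            self.rows_in_file = 0;
        }
        self.rows_in_file += 1;
        format!("{}-file{}.csv", self.shard, self.seq)
    }
}
```

With max_rows set to 2, five rows for the shard "us" would land in us-file0.csv, us-file0.csv, us-file1.csv, us-file1.csv and us-file2.csv.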
pub fn with_delimiter(self, delimiter: u8) -> Self
Sets the field delimiter to be used for output files. Default is ‘,’.
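As a rough illustration of what a different delimiter means for the output, the sketch below formats one record with a single-byte delimiter and quotes a field only when it contains the delimiter, a quote, or a line break. The format_record function is a hypothetical stand-in written with the standard library only; it is not how the crate writes records.

```rust
/// Hypothetical helper: formats one record with a custom single-byte
/// delimiter, quoting fields only when necessary.
fn format_record(fields: &[&str], delimiter: u8) -> String {
    let delim = delimiter as char; // assumes an ASCII delimiter
    let mut line = String::new();
    for (i, field) in fields.iter().enumerate() {
        if i > 0 {
            line.push(delim);
        }
        let needs_quotes = field.contains(delim)
            || field.contains('"')
            || field.contains('\n')
            || field.contains('\r');
        if needs_quotes {
            // Wrap the field in quotes and double any embedded quotes.
            line.push('"');
            line.push_str(&field.replace('"', "\"\""));
            line.push('"');
        } else {
            line.push_str(field);
        }
    }
    line.push('\n');
    line
}
```

With a tab delimiter (b'\t'), a field such as "x,y" no longer needs quoting, while a field containing a tab does.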
pub fn on_file_completion(self, f: fn(&Path, &str)) -> Self
Sets an optional function that is called whenever an individual file is completed, either because it was split by row or byte count or because processing is complete and the values are being dropped.
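The following sketch shows one way a plain fn(&Path, &str) callback can be fired when an output file is dropped. Everything here is an assumption made for illustration: OpenShardFile is a hypothetical stand-in for the crate's internal file handle, and treating the &str argument as the shard key is a guess, since the signature does not say what it carries. Because a fn pointer cannot capture state, the example records completions in a static counter.

```rust
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts completed files. A plain `fn` pointer cannot capture state,
/// so this sketch records results in a static.
static COMPLETED: AtomicUsize = AtomicUsize::new(0);

fn note_completion(path: &Path, shard_key: &str) {
    // `shard_key` is an assumed meaning for the &str argument.
    let _ = (path, shard_key);
    COMPLETED.fetch_add(1, Ordering::SeqCst);
}

/// Hypothetical stand-in for one open output file.
struct OpenShardFile {
    path: PathBuf,
    shard_key: String,
    on_done: Option<fn(&Path, &str)>,
}

impl Drop for OpenShardFile {
    fn drop(&mut self) {
        // Fire the callback, if one was registered, when the file goes away.
        if let Some(callback) = self.on_done {
            callback(&self.path, &self.shard_key);
        }
    }
}
```

A callback of this shape could, for example, upload or move each finished file, as long as any state it needs lives outside the function itself.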
pub fn on_create_file(self, f: fn(&Path) -> Result<Box<dyn Write>>) -> Self
Takes a closure that specifies how to create output files.
The closure receives the Path of the output file to be created. If you don’t provide your own way to create output files, the default implementation creates a new BufWriter for the output file, which is equivalent to:
my_sharded_writer.on_create_file(|path| Ok(BufWriter::new(File::create(path)?)));
This function may be useful if, for example, you want to inject gzip compression into the output writer.
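As a sketch of a custom factory with the same shape, the function below creates any missing parent directories and opens the file behind a larger buffer. The name create_output is hypothetical, and the example assumes that the Result in the signature is std::io::Result. It uses only the standard library; an encoder from a compression crate would wrap the File at the marked point.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical factory: creates missing parent directories, then opens
/// the file behind a 1 MiB buffer instead of the default buffer size.
/// Assumes the `Result` in the signature is `std::io::Result`.
fn create_output(path: &Path) -> io::Result<Box<dyn Write>> {
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    let file = File::create(path)?;
    // A compression encoder would wrap `file` at this point.
    Ok(Box::new(BufWriter::with_capacity(1 << 20, file)))
}
```

Because create_output is a plain function with no captured state, it could be passed where a fn(&Path) -> Result<Box<dyn Write>> is expected, provided the crate's Result type matches the assumption above.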
pub fn process_file(&mut self, filename: &str) -> Result<usize, Error>
Processes the input filename, creating output files according to the specified key selector.
This function will fail if the output directory or an output file can’t be created or if a row can’t be written. It can also fail if it is called multiple times with files that have different column counts.
On success, the number of records written is returned.
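The column-count failure can be pictured with a small guard that remembers the width of the first record and rejects later records that disagree. This is only a sketch of the idea: ColumnCountGuard and its String error are hypothetical, and the crate's actual check and Error type are not shown here.

```rust
/// Hypothetical guard: remembers the column count of the first record
/// and rejects later records that disagree with it.
#[derive(Default)]
struct ColumnCountGuard {
    expected: Option<usize>,
}

impl ColumnCountGuard {
    fn check(&mut self, record: &[&str]) -> Result<(), String> {
        match self.expected {
            // First record seen: its width becomes the expected width.
            None => {
                self.expected = Some(record.len());
                Ok(())
            }
            Some(n) if n == record.len() => Ok(()),
            Some(n) => Err(format!(
                "expected {} columns, found {}",
                n,
                record.len()
            )),
        }
    }
}
```

Keeping one guard alive across several input files models why a second file with a different number of columns would be rejected.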
pub fn process_csv<T: Read>(&mut self, csv_reader: &mut Reader<T>) -> Result<usize, Error>
Processes the input reader, creating output files as appropriate.
This function will fail if the output directory or an output file can’t be created or if a row can’t be written. It can also fail if it is called multiple times with files that have different column counts.
On success, the number of records written is returned.
pub fn process_reader(&mut self, reader: impl Read) -> Result<usize, Error>
Processes any source that implements std::io::Read, creating output files as appropriate.
pub fn process_iter<T>(&mut self, records: T) -> Result<usize, Error>
where
    T: IntoIterator<Item = StringRecord>,
Iterates over every record, calculating the shard key for each, getting or creating the shard file, and writing the record.
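The loop described above can be sketched with the standard library alone. In this simplified version each shard's "file" is an in-memory Vec<u8>, records are plain Vec<String> values rather than StringRecord, and fields are joined with commas without any quoting. The shard_records function and its key_of parameter are hypothetical names, not part of the crate.

```rust
use std::collections::HashMap;
use std::io::{self, Write};

/// Hypothetical in-memory sharder: each shard's output is a Vec<u8>
/// standing in for an output file.
fn shard_records<F>(
    records: &[Vec<String>],
    key_of: F,
) -> io::Result<(usize, HashMap<String, Vec<u8>>)>
where
    F: Fn(&[String]) -> String,
{
    let mut shards: HashMap<String, Vec<u8>> = HashMap::new();
    let mut written = 0;
    for record in records {
        // Calculate the shard key for this record.
        let key = key_of(record.as_slice());
        // Get the shard's writer, creating it on first sight of the key.
        let out = shards.entry(key).or_default();
        // Naive serialization: no quoting or escaping of fields.
        writeln!(out, "{}", record.join(","))?;
        written += 1;
    }
    Ok((written, shards))
}
```

The keys of the returned map correspond to the set of shard keys seen so far, and the returned count mirrors the number of records written.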
pub fn is_shard_key_seen(&self, key: &str) -> bool
Checks if key has been seen in the processed data.
pub fn shard_keys_seen(&self) -> Vec<String>
Returns a vec of all keys that have been seen.