pub struct CsvSource<Out: Data + for<'a> Deserialize<'a>> { /* private fields */ }
Source that reads and parses a CSV file.
The file is divided in chunks and is read concurrently by multiple replicas.
Implementations
impl<Out: Data + for<'a> Deserialize<'a>> CsvSource<Out>
pub fn new<P: Into<PathBuf>>(path: P) -> Self
Create a new source that reads and parses the lines of a CSV file.
The file is partitioned into as many chunks as there are replicas, so every replica must have the same file at the same path. Each line of the file is guaranteed to be emitted by exactly one replica.
After creating the source it’s possible to customize its behaviour using one of the available methods. By default it is assumed that the delimiter is , and the CSV has headers.
Each line will be deserialized into the type Out, so the structure of the CSV must be valid for that deserialization. The csv crate is used for the parsing.
Note: the file must be readable and its size must be available. This means that only regular files can be read.
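Example
use serde::{Deserialize, Serialize};

#[derive(Clone, Deserialize, Serialize)]
struct Thing {
    what: String,
    count: u64,
}
// CsvSource and `env` (the stream environment) are assumed to be in scope.
let source = CsvSource::<Thing>::new("/datasets/huge.csv");
let s = env.stream(source);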
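The exactly-once guarantee described for new can be pictured with a small sketch that uses only the standard library. It is not taken from CsvSource: the helper name chunk_lines and the ownership rule (a replica owns every line whose first byte falls inside its byte range) are assumptions made for illustration, and a real implementation would seek into the file instead of scanning it from the start.

```rust
/// Hypothetical helper: returns the lines owned by replica `id` out of `replicas`.
/// A line belongs to the replica whose byte range [start, end) contains
/// the first byte of that line, so no line is emitted twice or skipped.
fn chunk_lines(data: &str, id: usize, replicas: usize) -> Vec<&str> {
    let len = data.len();
    let start = len * id / replicas;
    let end = len * (id + 1) / replicas;
    let mut owned = Vec::new();
    let mut pos = 0; // byte offset of the first byte of the current line
    for line in data.split_inclusive('\n') {
        if pos >= start && pos < end {
            owned.push(line.trim_end_matches('\n'));
        }
        pos += line.len();
    }
    owned
}
```

Concatenating the output of every replica in order yields the original lines, even when a chunk boundary falls in the middle of a line.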
pub fn comment(self, comment: Option<u8>) -> Self
The comment character to use when parsing CSV.
If the start of a record begins with the byte given here, then that line is ignored by the CSV parser.
This is disabled by default.
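As a rough picture of the comment option, the sketch below drops every line whose first byte equals the comment byte. The helper skip_comments is hypothetical and works on plain lines; the actual filtering is done by the csv parser on records.

```rust
/// Hypothetical helper: keeps only the lines that do not start with the comment byte.
/// With `None` (the default described above) nothing is filtered.
fn skip_comments(input: &str, comment: Option<u8>) -> Vec<&str> {
    input
        .lines()
        .filter(|line| match comment {
            Some(byte) => line.as_bytes().first() != Some(&byte),
            None => true,
        })
        .collect()
}
```

Only the start of a line is inspected, so a comment byte that appears later in a line is kept as data.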
pub fn delimiter(self, delimiter: u8) -> Self
The field delimiter to use when parsing CSV.
The default is , (a comma).
pub fn double_quote(self, double_quote: bool) -> Self
Enable double quote escapes.
This is enabled by default, but it may be disabled. When disabled, doubled quotes are not interpreted as escapes.
pub fn escape(self, escape: Option<u8>) -> Self
The escape character to use when parsing CSV.
In some variants of CSV, quotes are escaped using a special escape character like \ (instead of escaping quotes by doubling them).
By default, recognizing these idiosyncratic escapes is disabled.
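The difference between doubled-quote escapes and an escape character can be shown with a simplified decoder for the inside of a quoted field. The helper unquote is hypothetical, uses char instead of u8 for brevity, and only handles an escape character that is directly followed by a quote; it is a sketch, not the csv crate's parser.

```rust
/// Hypothetical helper: decodes the content between the quotes of a quoted field.
/// `double_quote` turns `""` into `"`; `escape` turns e.g. `\"` into `"`.
fn unquote(inner: &str, double_quote: bool, escape: Option<char>) -> String {
    let mut out = String::new();
    let mut chars = inner.chars().peekable();
    while let Some(c) = chars.next() {
        if Some(c) == escape && chars.peek() == Some(&'"') {
            // Escape character followed by a quote: emit the quote alone.
            chars.next();
            out.push('"');
        } else if double_quote && c == '"' && chars.peek() == Some(&'"') {
            // Doubled quote: emit a single quote.
            chars.next();
            out.push('"');
        } else {
            out.push(c);
        }
    }
    out
}
```

With both mechanisms turned off, the input is returned unchanged, which mirrors the statement that doubled quotes are then not interpreted as escapes.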
pub fn flexible(self, flexible: bool) -> Self
Whether the number of fields in records is allowed to change or not.
When disabled (which is the default), parsing CSV data will return an error if a record is found with a number of fields different from the number of fields in a previous record.
When enabled, this error checking is turned off.
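The check that flexible turns off can be sketched as a comparison of each record's field count with the previous one. The helper check_field_counts and its error message are hypothetical and only illustrate the rule stated above.

```rust
/// Hypothetical helper: reports the first record whose number of fields
/// differs from the previous record, unless `flexible` is true.
fn check_field_counts(records: &[Vec<&str>], flexible: bool) -> Result<(), String> {
    if flexible {
        return Ok(());
    }
    for (i, pair) in records.windows(2).enumerate() {
        if pair[0].len() != pair[1].len() {
            return Err(format!(
                "record {} has {} fields, the previous record has {}",
                i + 1,
                pair[1].len(),
                pair[0].len()
            ));
        }
    }
    Ok(())
}
```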
pub fn quote(self, quote: u8) -> Self
The quote character to use when parsing CSV.
The default is " (a double quote).
pub fn quoting(self, quoting: bool) -> Self
Enable or disable quoting.
This is enabled by default, but it may be disabled. When disabled, quotes are not treated specially.
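How delimiter, quote and quoting interact can be sketched with a minimal field splitter. The helper split_fields is hypothetical, takes char parameters for brevity, and ignores escapes and doubled quotes; it only shows that a delimiter inside quotes is data when quoting is enabled.

```rust
/// Hypothetical helper: splits one line into fields.
/// When `quoting` is true, delimiters between two `quote` characters are kept as data
/// and the quote characters themselves are dropped.
fn split_fields(line: &str, delimiter: char, quote: char, quoting: bool) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    for c in line.chars() {
        if quoting && c == quote {
            in_quotes = !in_quotes;
        } else if c == delimiter && !in_quotes {
            fields.push(std::mem::take(&mut current));
        } else {
            current.push(c);
        }
    }
    fields.push(current);
    fields
}
```

With quoting disabled, the same line is split at every delimiter and the quote characters stay in the fields.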
pub fn terminator(self, terminator: Terminator) -> Self
The record terminator to use when parsing CSV.
A record terminator can be any single byte. The default is a special value, Terminator::CRLF, which treats any occurrence of \r, \n or \r\n as a single record terminator.
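The default terminator behaviour can be sketched as a splitter that accepts \r, \n and \r\n, counting \r\n as one terminator. The helper split_records is hypothetical; it ignores quoting and keeps empty lines as empty records, which may differ from what the csv parser does.

```rust
/// Hypothetical helper: splits `input` into records, treating `\r`, `\n`
/// and `\r\n` each as a single terminator.
fn split_records(input: &str) -> Vec<&str> {
    let bytes = input.as_bytes();
    let mut records = Vec::new();
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\r' || bytes[i] == b'\n' {
            records.push(&input[start..i]);
            // Consume the `\n` of a `\r\n` pair so it does not open an empty record.
            if bytes[i] == b'\r' && bytes.get(i + 1) == Some(&b'\n') {
                i += 1;
            }
            start = i + 1;
        }
        i += 1;
    }
    // Last record without a trailing terminator.
    if start < bytes.len() {
        records.push(&input[start..]);
    }
    records
}
```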
pub fn trim(self, trim: Trim) -> Self
Whether fields are trimmed of leading and trailing whitespace or not.
By default, no trimming is performed. This method permits one to override that behavior and choose one of the following options:
Trim::Headers trims only header values.
Trim::Fields trims only non-header or “field” values.
Trim::All trims both header and non-header values.
A value is only interpreted as a header value if this CSV reader is configured to read a header record (which is the default).
When reading string records, characters meeting the definition of Unicode whitespace are trimmed. When reading byte records, characters meeting the definition of ASCII whitespace are trimmed. ASCII whitespace characters correspond to the set [\t\n\v\f\r ].
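The difference between Unicode and ASCII trimming can be seen on a field surrounded by a non-breaking space (U+00A0). The sketch below uses str::trim for the Unicode case and a hypothetical helper trim_bytes for the ASCII set listed above; it is written by hand because u8::is_ascii_whitespace in the standard library does not treat \v as whitespace.

```rust
/// ASCII whitespace as listed above: [\t\n\v\f\r ].
fn is_ascii_ws(b: u8) -> bool {
    matches!(b, b'\t' | b'\n' | 0x0B | 0x0C | b'\r' | b' ')
}

/// Hypothetical helper: trims ASCII whitespace from both ends of a byte field.
fn trim_bytes(mut field: &[u8]) -> &[u8] {
    while let [first, rest @ ..] = field {
        if is_ascii_ws(*first) {
            field = rest;
        } else {
            break;
        }
    }
    while let [rest @ .., last] = field {
        if is_ascii_ws(*last) {
            field = rest;
        } else {
            break;
        }
    }
    field
}

/// Unicode trimming, as applied to string records, is what `str::trim` does.
fn trim_str(field: &str) -> &str {
    field.trim()
}
```

A non-breaking space is removed by trim_str but left in place by trim_bytes, because its UTF-8 encoding contains no ASCII whitespace byte.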
pub fn has_headers(self, has_headers: bool) -> Self
Whether to treat the first row as a special header row.
By default, the first row is treated as a special header row, which means the header is never returned by any of the record reading methods or iterators. When this is disabled (has_headers set to false), the first row is not treated specially.
Note that the headers and byte_headers methods of the underlying csv reader are unaffected by whether this is set. Those methods always return the first record.
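In terms of the rows that reach the stream, the effect of has_headers can be sketched as skipping the first row or not. The helper data_rows is hypothetical and works on raw lines instead of parsed records.

```rust
/// Hypothetical helper: returns the rows that would be emitted as data.
/// With `has_headers` set to true the first row is consumed as the header.
fn data_rows(input: &str, has_headers: bool) -> Vec<&str> {
    input.lines().skip(usize::from(has_headers)).collect()
}
```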
Trait Implementations
impl<Out: Data + for<'a> Deserialize<'a>> Operator for CsvSource<Out>
type Out = Out
fn setup(&mut self, metadata: &mut ExecutionMetadata<'_>)
Called before the first call to next and used to initialize the operator. When it’s called the operator has already been cloned and it will never be cloned again. Therefore it’s safe to store replica-specific metadata inside of it.
fn next(&mut self) -> StreamElement<Out>
fn structure(&self) -> BlockStructure
impl<Out: Data + for<'a> Deserialize<'a>> Source for CsvSource<Out>
fn replication(&self) -> Replication
Auto Trait Implementations
impl<Out> Freeze for CsvSource<Out>
impl<Out> RefUnwindSafe for CsvSource<Out> where Out: RefUnwindSafe
impl<Out> Send for CsvSource<Out>
impl<Out> Sync for CsvSource<Out> where Out: Sync
impl<Out> Unpin for CsvSource<Out> where Out: Unpin
impl<Out> UnwindSafe for CsvSource<Out> where Out: UnwindSafe
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone
impl<T> CloneableStorage for T
fn clone_storage(&self) -> Box<dyn CloneableStorage>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.