Struct CsvSource

pub struct CsvSource<Out: Data + for<'a> Deserialize<'a>> { /* private fields */ }

Source that reads and parses a CSV file.

The file is divided into chunks that are read concurrently by multiple replicas.

Implementations

impl<Out: Data + for<'a> Deserialize<'a>> CsvSource<Out>

pub fn new<P: Into<PathBuf>>(path: P) -> Self

Create a new source that reads and parses the lines of a CSV file.

The file is partitioned into as many chunks as there are replicas, and every replica must have the same file at the same path. Each line of the file is guaranteed to be emitted by exactly one replica.
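A minimal sketch of one way to obtain this exactly-once guarantee, assuming equal-size byte chunks and the rule that a line belongs to the chunk that contains its first byte. The helper `owned_lines` is hypothetical and works on an in-memory buffer; it is not necessarily how `CsvSource` computes its chunks, and it ignores header handling.

```rust
/// Hypothetical helper: returns the lines owned by `replica` (0-based) when
/// `data` is split into `replicas` byte ranges of (almost) equal size.
/// A line belongs to the chunk containing its first byte, so every line is
/// returned by exactly one replica, even if it straddles a chunk boundary.
fn owned_lines(data: &[u8], replica: usize, replicas: usize) -> Vec<String> {
    let len = data.len();
    // Byte range [start, end) assigned to this replica.
    let start = len * replica / replicas;
    let end = len * (replica + 1) / replicas;

    // Move to the first line that begins at or after `start`.
    let mut pos = if start == 0 {
        0
    } else {
        data[start - 1..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(len, |i| start + i)
    };

    let mut lines = Vec::new();
    // Emit every line that starts inside [start, end), even if it ends later.
    while pos < end {
        let line_end = data[pos..]
            .iter()
            .position(|&b| b == b'\n')
            .map_or(len, |i| pos + i);
        lines.push(String::from_utf8_lossy(&data[pos..line_end]).into_owned());
        pos = line_end + 1;
    }
    lines
}
```

Concatenating the results for every replica index yields each line once, in file order, regardless of where the chunk boundaries fall.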

After creating the source, its behaviour can be customized with the methods below. By default, the delimiter is assumed to be `,` and the CSV is assumed to have a header row.

Each line is deserialized into the type `Out`, so the structure of the CSV must be valid for that deserialization. Parsing is done with the `csv` crate.

Note: the file must be readable and its size must be available. This means that only regular files can be read.
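A sketch of the check this note implies, using only `std::fs`: since chunk boundaries are derived from the file size, the path must resolve to a regular file with a known length. `regular_file_size` is a hypothetical helper for illustration, not part of this API.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical pre-flight check: succeeds only for a regular file whose
/// size can be read from its metadata (directories, for example, are rejected).
fn regular_file_size(path: &Path) -> io::Result<u64> {
    // `fs::metadata` follows symlinks, so a link to a regular file is accepted.
    let meta = fs::metadata(path)?;
    if !meta.is_file() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "not a regular file",
        ));
    }
    Ok(meta.len())
}
```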

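Example

```rust
use serde::{Deserialize, Serialize};

#[derive(Clone, Deserialize, Serialize)]
struct Thing {
    what: String,
    count: u64,
}

// Assumes `CsvSource` is in scope and `env` is an existing stream environment.
let source = CsvSource::<Thing>::new("/datasets/huge.csv");
let s = env.stream(source);
```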

pub fn comment(self, comment: Option<u8>) -> Self

The comment character to use when parsing CSV.

If a record begins with the byte given here, that line is ignored by the CSV parser.

This is disabled by default.
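For illustration, a hypothetical line-based filter that applies the same rule; the real parser checks the byte at the start of each record rather than working on raw lines.

```rust
/// Hypothetical helper: drops every line whose first byte equals `comment`.
/// With `None` (the default) no line is treated as a comment.
fn skip_comments(input: &str, comment: Option<u8>) -> Vec<&str> {
    input
        .lines()
        .filter(|line| match comment {
            Some(c) => line.as_bytes().first() != Some(&c),
            None => true,
        })
        .collect()
}
```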

pub fn delimiter(self, delimiter: u8) -> Self

The field delimiter to use when parsing CSV.

The default is `,`.

pub fn double_quote(self, double_quote: bool) -> Self

Enable double quote escapes.

This is enabled by default, but it may be disabled. When disabled, doubled quotes are not interpreted as escapes.
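A simplified sketch of how doubled quotes can be decoded inside a quoted field. `decode_quoted` is hypothetical and does not reproduce the full state machine of the `csv` crate; in particular, copying characters after the closing quote verbatim is this sketch's own choice.

```rust
/// Hypothetical helper: decodes one field that starts with `quote`.
/// With `double_quote` enabled, a doubled quote inside the quoted section
/// yields one literal quote; otherwise the next quote closes the section.
/// Returns `None` if the field does not start with `quote`.
fn decode_quoted(field: &str, quote: char, double_quote: bool) -> Option<String> {
    let mut chars = field.strip_prefix(quote)?.chars().peekable();
    let mut out = String::new();
    let mut in_quotes = true;
    while let Some(c) = chars.next() {
        if in_quotes && c == quote {
            if double_quote && chars.peek() == Some(&quote) {
                chars.next(); // skip the second quote of the pair
                out.push(quote);
            } else {
                in_quotes = false; // this quote closes the quoted section
            }
        } else {
            // Characters after the closing quote are copied as-is (simplification).
            out.push(c);
        }
    }
    Some(out)
}
```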

pub fn escape(self, escape: Option<u8>) -> Self

The escape character to use when parsing CSV.

In some variants of CSV, quotes are escaped using a special escape character like `\` (instead of escaping quotes by doubling them).

By default, recognizing these idiosyncratic escapes is disabled.
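A sketch of escape-character decoding for a field wrapped in quotes, assuming the escape simply makes the next character literal. `decode_escaped` is a hypothetical helper for illustration only.

```rust
/// Hypothetical helper: decodes a field wrapped in `quote` characters where
/// `escape` (e.g. `\`) makes the following character literal.
/// `None` disables escape handling, so backslashes are kept as-is.
fn decode_escaped(field: &str, quote: char, escape: Option<char>) -> Option<String> {
    let body = field.strip_prefix(quote)?.strip_suffix(quote)?;
    let mut out = String::new();
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if Some(c) == escape {
            // Keep the escaped character and drop the escape itself.
            if let Some(next) = chars.next() {
                out.push(next);
            }
        } else {
            out.push(c);
        }
    }
    Some(out)
}
```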

pub fn flexible(self, flexible: bool) -> Self

Whether the number of fields in records is allowed to change or not.

When disabled (which is the default), parsing CSV data will return an error if a record is found with a number of fields different from the number of fields in a previous record.

When enabled, this error checking is turned off.
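A hypothetical check mirroring this rule. Comparing every record with the first one is equivalent to comparing it with all previous records, since the earlier records have already passed the check.

```rust
/// Hypothetical helper: unless `flexible` is true, every record must have
/// as many fields as the first one.
fn check_lengths(records: &[Vec<&str>], flexible: bool) -> Result<(), String> {
    if flexible {
        return Ok(()); // error checking turned off
    }
    let expected = match records.first() {
        Some(first) => first.len(),
        None => return Ok(()),
    };
    for (i, record) in records.iter().enumerate().skip(1) {
        if record.len() != expected {
            return Err(format!(
                "record {}: expected {} fields, found {}",
                i,
                expected,
                record.len()
            ));
        }
    }
    Ok(())
}
```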

pub fn quote(self, quote: u8) -> Self

The quote character to use when parsing CSV.

The default is `"`.

pub fn quoting(self, quoting: bool) -> Self

Enable or disable quoting.

This is enabled by default, but it may be disabled. When disabled, quotes are not treated specially.
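A minimal splitter showing how the delimiter, the quote character and quoting interact. `split_fields` is hypothetical and, for brevity, ignores doubled quotes and escape characters.

```rust
/// Hypothetical splitter: splits `line` on `delimiter`. When `quoting` is
/// enabled, delimiters between `quote` characters do not split and the quotes
/// are removed; when disabled, quotes are ordinary characters.
fn split_fields(line: &str, delimiter: char, quote: char, quoting: bool) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    for c in line.chars() {
        if quoting && c == quote {
            in_quotes = !in_quotes; // enter or leave a quoted section
        } else if c == delimiter && !in_quotes {
            fields.push(std::mem::take(&mut current));
        } else {
            current.push(c);
        }
    }
    fields.push(current);
    fields
}
```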

pub fn terminator(self, terminator: Terminator) -> Self

The record terminator to use when parsing CSV.

A record terminator can be any single byte. The default is a special value, `Terminator::CRLF`, which treats any occurrence of `\r`, `\n` or `\r\n` as a single record terminator.
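A sketch of the default terminator behaviour applied to a `&str`. `split_records` is hypothetical, and skipping empty records is a choice of this sketch rather than a documented property of the source.

```rust
/// Hypothetical helper mimicking the default CRLF terminator: `\r`, `\n`
/// and the pair `\r\n` each end one record. Empty records are skipped.
fn split_records(input: &str) -> Vec<&str> {
    let bytes = input.as_bytes();
    let mut records = Vec::new();
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\r' || bytes[i] == b'\n' {
            if i > start {
                records.push(&input[start..i]);
            }
            // Treat "\r\n" as one terminator rather than two.
            if bytes[i] == b'\r' && bytes.get(i + 1) == Some(&b'\n') {
                i += 1;
            }
            start = i + 1;
        }
        i += 1;
    }
    if start < bytes.len() {
        records.push(&input[start..]);
    }
    records
}
```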

pub fn trim(self, trim: Trim) -> Self

Whether fields are trimmed of leading and trailing whitespace or not.

By default, no trimming is performed. This method permits one to override that behavior and choose one of the following options:

  1. `Trim::Headers` trims only header values.
  2. `Trim::Fields` trims only non-header or “field” values.
  3. `Trim::All` trims both header and non-header values.

A value is only interpreted as a header value if this CSV reader is configured to read a header record (which is the default).

When reading string records, characters meeting the definition of Unicode whitespace are trimmed. When reading byte records, characters meeting the definition of ASCII whitespace are trimmed. ASCII whitespace characters correspond to the set `[\t\n\v\f\r ]`.
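A hypothetical sketch of how the three options decide which rows get trimmed. `TrimMode` is an illustrative stand-in for `Trim`, not the real type; `str::trim` removes Unicode whitespace, matching the behaviour described for string records.

```rust
/// Hypothetical stand-in for the `Trim` options listed above.
#[derive(Clone, Copy, Debug)]
enum TrimMode {
    None,
    Headers,
    Fields,
    All,
}

/// Trims `row` in place; `is_header` says whether it is the header record.
fn trim_row(row: &mut [String], is_header: bool, mode: TrimMode) {
    let apply = match mode {
        TrimMode::None => false,
        TrimMode::Headers => is_header,
        TrimMode::Fields => !is_header,
        TrimMode::All => true,
    };
    if apply {
        for value in row.iter_mut() {
            *value = value.trim().to_string();
        }
    }
}
```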

pub fn has_headers(self, has_headers: bool) -> Self

Whether to treat the first row as a special header row.

By default, the first row is treated as a special header row, which means it is never returned as a record by the underlying `csv` reader. When this is disabled (`has_headers` set to `false`), the first row is not treated specially.

Note that the `headers` and `byte_headers` methods of the underlying `csv` reader are unaffected by this setting; they always return the first record.
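A hypothetical sketch of the header split: with `has_headers` enabled, the first row is set aside as the header and is not yielded as a record; otherwise every row is a record. `split_header` is illustrative only.

```rust
/// Hypothetical helper: returns the header row (if any) and the data rows.
fn split_header<'a>(rows: &[&'a str], has_headers: bool) -> (Option<&'a str>, Vec<&'a str>) {
    match rows.split_first() {
        // The first row is treated as a header and excluded from the records.
        Some((first, rest)) if has_headers => (Some(*first), rest.to_vec()),
        // No header: every row, including the first, is a record.
        _ => (None, rows.to_vec()),
    }
}
```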

Trait Implementations

impl<Out: Data + for<'a> Deserialize<'a>> Clone for CsvSource<Out>

fn clone(&self) -> Self

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from `source`.

impl<Out: Data + for<'a> Deserialize<'a>> Display for CsvSource<Out>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<Out: Data + for<'a> Deserialize<'a>> Operator for CsvSource<Out>

type Out = Out

fn setup(&mut self, metadata: &mut ExecutionMetadata<'_>)

Set up the operator chain. This is called before any call to `next` and is used to initialize the operator. When it is called, the operator has already been cloned and will never be cloned again, so it is safe to store replica-specific metadata inside it.

fn next(&mut self) -> StreamElement<Out>

Take a value from the previous operator, process it and return it.

fn structure(&self) -> BlockStructure

A more refined representation of the operator and its predecessors.

impl<Out: Data + for<'a> Deserialize<'a>> Source for CsvSource<Out>

fn replication(&self) -> Replication

The maximum parallelism offered by this operator.

Auto Trait Implementations

impl<Out> Freeze for CsvSource<Out>

impl<Out> RefUnwindSafe for CsvSource<Out>
where Out: RefUnwindSafe,

impl<Out> Send for CsvSource<Out>

impl<Out> Sync for CsvSource<Out>
where Out: Sync,

impl<Out> Unpin for CsvSource<Out>
where Out: Unpin,

impl<Out> UnwindSafe for CsvSource<Out>
where Out: UnwindSafe,

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the `TypeId` of `self`.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API (`clone_to_uninit`).
Performs copy-assignment from `self` to `dest`.

impl<T> CloneableStorage for T
where T: Any + Send + Sync + Clone,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Same for T

type Output = T

Should always be `Self`.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T> ToString for T
where T: Display + ?Sized,

fn to_string(&self) -> String

Converts the given value to a `String`.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> Data for T
where T: Clone + Send + 'static,