Struct CsvFormat

Source
pub struct CsvFormat { /* private fields */ }
Expand description

Character Separated Value FileFormat implementation.

Implementations§

Source§

impl CsvFormat

Source

pub async fn read_to_delimited_chunks_from_stream<'a>( &self, stream: BoxStream<'a, Result<Bytes>>, ) -> BoxStream<'a, Result<Bytes>>

Convert a stream of bytes into a stream of of Bytes containing newline delimited CSV records, while accounting for \ and ".

Source

pub fn with_options(self, options: CsvOptions) -> Self

Set the csv options

Source

pub fn options(&self) -> &CsvOptions

Retrieve the csv options

Source

pub fn with_schema_infer_max_rec(self, max_rec: usize) -> Self

Set a limit in terms of records to scan to infer the schema

  • default to DEFAULT_SCHEMA_INFER_MAX_RECORD
Source

pub fn with_has_header(self, has_header: bool) -> Self

Set true to indicate that the first line is a header.

  • default to true
Source

pub fn with_null_regex(self, null_regex: Option<String>) -> Self

Set the regex to use for null values in the CSV reader.

  • default to treat empty values as null.
Source

pub fn has_header(&self) -> Option<bool>

Returns Some(true) if the first line is a header, Some(false) if it is not, and None if it is not specified.

Source

pub fn with_comment(self, comment: Option<u8>) -> Self

Lines beginning with this byte are ignored.

Source

pub fn with_delimiter(self, delimiter: u8) -> Self

The character separating values within a row.

  • default to ‘,’
Source

pub fn with_quote(self, quote: u8) -> Self

The quote character in a row.

  • default to ‘“’
Source

pub fn with_escape(self, escape: Option<u8>) -> Self

The escape character in a row.

  • default is None
Source

pub fn with_terminator(self, terminator: Option<u8>) -> Self

The character used to indicate the end of a row.

  • default to None (CRLF)
Source

pub fn with_newlines_in_values(self, newlines_in_values: bool) -> Self

Specifies whether newlines in (quoted) values are supported.

Parsing newlines in quoted values may be affected by execution behaviour such as parallel file scanning. Setting this to true ensures that newlines in values are parsed successfully, which may reduce performance.

The default behaviour depends on the datafusion.catalog.newlines_in_values setting.

Source

pub fn with_file_compression_type( self, file_compression_type: FileCompressionType, ) -> Self

Set a FileCompressionType of CSV

  • defaults to FileCompressionType::UNCOMPRESSED
Source

pub fn delimiter(&self) -> u8

The delimiter character.

Source

pub fn quote(&self) -> u8

The quote character.

Source

pub fn escape(&self) -> Option<u8>

The escape character.

Source§

impl CsvFormat

Source

pub async fn infer_schema_from_stream( &self, state: &dyn Session, records_to_read: usize, stream: impl Stream<Item = Result<Bytes>>, ) -> Result<(Schema, usize)>

Return the inferred schema reading up to records_to_read from a stream of delimited chunks returning the inferred schema and the number of lines that were read

Trait Implementations§

Source§

impl Debug for CsvFormat

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl Default for CsvFormat

Source§

fn default() -> CsvFormat

Returns the “default value” for a type. Read more
Source§

impl FileFormat for CsvFormat

Source§

fn as_any(&self) -> &dyn Any

Returns the table provider as Any so that it can be downcast to a specific implementation.
Source§

fn get_ext(&self) -> String

Returns the extension for this FileFormat, e.g. “file.csv” -> csv
Source§

fn get_ext_with_compression( &self, file_compression_type: &FileCompressionType, ) -> Result<String>

Returns the extension for this FileFormat when compressed, e.g. “file.csv.gz” -> csv
Source§

fn infer_schema<'life0, 'life1, 'life2, 'life3, 'async_trait>( &'life0 self, state: &'life1 dyn Session, store: &'life2 Arc<dyn ObjectStore>, objects: &'life3 [ObjectMeta], ) -> Pin<Box<dyn Future<Output = Result<SchemaRef>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait, 'life3: 'async_trait,

Infer the common schema of the provided objects. The objects will usually be analysed up to a given number of records or files (as specified in the format config) then give the estimated common schema. This might fail if the files have schemas that cannot be merged.
Source§

fn infer_stats<'life0, 'life1, 'life2, 'life3, 'async_trait>( &'life0 self, _state: &'life1 dyn Session, _store: &'life2 Arc<dyn ObjectStore>, table_schema: SchemaRef, _object: &'life3 ObjectMeta, ) -> Pin<Box<dyn Future<Output = Result<Statistics>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait, 'life3: 'async_trait,

Infer the statistics for the provided object. The cost and accuracy of the estimated statistics might vary greatly between file formats. Read more
Source§

fn create_physical_plan<'life0, 'life1, 'life2, 'async_trait>( &'life0 self, state: &'life1 dyn Session, conf: FileScanConfig, _filters: Option<&'life2 Arc<dyn PhysicalExpr>>, ) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait,

Take a list of files and convert it to the appropriate executor according to this file format.
Source§

fn create_writer_physical_plan<'life0, 'life1, 'async_trait>( &'life0 self, input: Arc<dyn ExecutionPlan>, state: &'life1 dyn Session, conf: FileSinkConfig, order_requirements: Option<LexRequirement>, ) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Take a list of files and the configuration to convert it to the appropriate writer executor according to this file format.
Source§

fn file_source(&self) -> Arc<dyn FileSource>

Return the related FileSource such as CsvSource, JsonSource, etc.
Source§

fn supports_filters_pushdown( &self, _file_schema: &Schema, _table_schema: &Schema, _filters: &[&Expr], ) -> Result<FilePushdownSupport, DataFusionError>

Check if the specified file format has support for pushing down the provided filters within the given schemas. Added initially to support the Parquet file format’s ability to do this.

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

Source§

impl<T> ErasedDestructor for T
where T: 'static,

Source§

impl<T> MaybeSendSync for T