Struct ParquetFormat

pub struct ParquetFormat { /* private fields */ }

The Apache Parquet FileFormat implementation

Implementations

impl ParquetFormat

pub fn new() -> Self
pub fn new() -> Self

Construct a new Format with no local overrides

pub fn with_enable_pruning(self, enable: bool) -> Self

Activate statistics-based row group level pruning

  • If not set, defaults to the value in config_options

pub fn enable_pruning(&self) -> bool

Return true if pruning is enabled

pub fn with_metadata_size_hint(self, size_hint: Option<usize>) -> Self

Provide a hint for the size of the file metadata. If a hint is provided, the reader will optimistically try to fetch the last size_hint bytes of the parquet file. Without a hint, two reads are required: one to fetch the 8-byte parquet footer, and another to fetch the metadata length encoded in the footer.

  • If None, defaults to the value in config_options

pub fn metadata_size_hint(&self) -> Option<usize>

Return the metadata size hint if set

pub fn with_skip_metadata(self, skip_metadata: bool) -> Self

Tell the parquet reader to skip any metadata that may be in the file schema. This can help avoid schema conflicts due to metadata.

  • If not set, defaults to the value in config_options

pub fn skip_metadata(&self) -> bool

Returns true if schema metadata will be cleared prior to schema merging.

pub fn with_options(self, options: TableParquetOptions) -> Self

Set Parquet options for the ParquetFormat

pub fn options(&self) -> &TableParquetOptions

Return the Parquet options

pub fn force_view_types(&self) -> bool

Return true if view types should be used.

If this returns true, DataFusion will instruct the parquet reader to read string / binary columns using the view types StringView or BinaryView if the table schema specifies those types, regardless of any embedded metadata that may specify an alternate Arrow type. The parquet reader is optimized for reading StringView and BinaryView, and such queries are significantly faster.

If this returns false, the parquet reader will read the columns according to the defaults or any embedded Arrow type information. This may result in reading StringArrays and then casting to StringViewArray which is less efficient.

pub fn with_force_view_types(self, use_views: bool) -> Self

If true, will use view types. See Self::force_view_types for details

pub fn binary_as_string(&self) -> bool

Return true if binary types will be read as strings.

If this returns true, DataFusion will instruct the parquet reader to read binary columns such as Binary or BinaryView as the corresponding string type such as Utf8 or LargeUtf8. The parquet reader has special optimizations for Utf8 and LargeUtf8 validation, and such queries are significantly faster than reading binary columns and then casting to string columns.

pub fn with_binary_as_string(self, binary_as_string: bool) -> Self

If true, will read binary types as strings. See Self::binary_as_string for details

pub fn coerce_int96(&self) -> Option<String>

Return the time unit to which INT96 timestamp columns will be coerced, if set

pub fn with_coerce_int96(self, time_unit: Option<String>) -> Self

Coerce INT96 timestamp columns to the given time unit (by default they are read as nanosecond timestamps)

Trait Implementations

impl Debug for ParquetFormat

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for ParquetFormat

fn default() -> ParquetFormat

Returns the “default value” for a type.

impl FileFormat for ParquetFormat

fn as_any(&self) -> &dyn Any

Returns the format as Any so that it can be downcast to a specific implementation.
fn get_ext(&self) -> String

Returns the extension for this FileFormat, e.g. “file.csv” -> csv
fn get_ext_with_compression( &self, file_compression_type: &FileCompressionType, ) -> Result<String>

Returns the extension for this FileFormat when compressed, e.g. “file.csv.gz” -> csv
fn compression_type(&self) -> Option<FileCompressionType>

Returns whether this instance uses compression, if applicable
fn infer_schema<'life0, 'life1, 'life2, 'life3, 'async_trait>( &'life0 self, state: &'life1 dyn Session, store: &'life2 Arc<dyn ObjectStore>, objects: &'life3 [ObjectMeta], ) -> Pin<Box<dyn Future<Output = Result<SchemaRef>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait, 'life3: 'async_trait,

Infer the common schema of the provided objects. The objects are usually analysed up to a given number of records or files (as specified in the format config) to produce the estimated common schema. This might fail if the files have schemas that cannot be merged.
fn infer_stats<'life0, 'life1, 'life2, 'life3, 'async_trait>( &'life0 self, _state: &'life1 dyn Session, store: &'life2 Arc<dyn ObjectStore>, table_schema: SchemaRef, object: &'life3 ObjectMeta, ) -> Pin<Box<dyn Future<Output = Result<Statistics>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait, 'life3: 'async_trait,

Infer the statistics for the provided object. The cost and accuracy of the estimated statistics might vary greatly between file formats.
fn create_physical_plan<'life0, 'life1, 'async_trait>( &'life0 self, _state: &'life1 dyn Session, conf: FileScanConfig, ) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Take a list of files and convert it to the appropriate executor according to this file format.
fn create_writer_physical_plan<'life0, 'life1, 'async_trait>( &'life0 self, input: Arc<dyn ExecutionPlan>, _state: &'life1 dyn Session, conf: FileSinkConfig, order_requirements: Option<LexRequirement>, ) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Take a list of files and the configuration to convert it to the appropriate writer executor according to this file format.
fn file_source(&self) -> Arc<dyn FileSource>

Return the related FileSource such as CsvSource, JsonSource, etc.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> Allocation for T
where T: RefUnwindSafe + Send + Sync,

impl<T> ErasedDestructor for T
where T: 'static,