pub struct FileScanConfig {
    pub object_store_url: ObjectStoreUrl,
    pub file_schema: SchemaRef,
    pub file_groups: Vec<Vec<PartitionedFile>>,
    pub statistics: Statistics,
    pub projection: Option<Vec<usize>>,
    pub limit: Option<usize>,
    pub table_partition_cols: Vec<(String, DataType)>,
    pub output_ordering: Vec<LexOrdering>,
    pub infinite_source: bool,
}

The base configurations to provide when creating a physical plan for any given file format.

Fields§

§object_store_url: ObjectStoreUrl

Object store URL, used to get an ObjectStore instance from RuntimeEnv::object_store

§file_schema: SchemaRef

Schema before projection is applied. It contains all the columns that may appear in the files. It does not include table partition columns that may be added.

§file_groups: Vec<Vec<PartitionedFile>>

List of files to be processed, grouped into partitions

Each file must have a schema of file_schema or a subset. If a particular file has a subset, the missing columns are padded with NULLs.

DataFusion may attempt to read each partition of files concurrently. Files within a partition, however, are read sequentially, one after the next.
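The subset rule above can be pictured with a small standalone sketch. It does not use DataFusion types: `map_columns` and `pad_row` are hypothetical helpers, columns are matched by name, and values are plain `i64`, all of which are assumptions made only for illustration.

```rust
/// For each column of the full schema, find its position among one file's
/// columns. `None` marks a column the file lacks.
/// Hypothetical helper, not part of DataFusion.
pub fn map_columns(table_columns: &[&str], file_columns: &[&str]) -> Vec<Option<usize>> {
    table_columns
        .iter()
        .map(|name| file_columns.iter().position(|c| c == name))
        .collect()
}

/// Reorder one row read from the file into the full schema's column order,
/// emitting `None` (standing in for NULL) where the file has no such column.
pub fn pad_row(mapping: &[Option<usize>], file_row: &[i64]) -> Vec<Option<i64>> {
    mapping.iter().map(|m| m.map(|i| file_row[i])).collect()
}
```

For a schema with columns `a`, `b`, `c` and a file that stores only `c` and `a`, the mapping leaves `b` unmatched, so every row from that file carries a NULL in the `b` position.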

§statistics: Statistics

Estimated overall statistics of the files, taking filters into account.

§projection: Option<Vec<usize>>

Columns on which to project the data. Indexes greater than or equal to the number of columns in file_schema refer to table_partition_cols.
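A minimal sketch of how a projection index could be resolved under this rule. `ProjectedColumn` and `resolve_projection_index` are hypothetical names, and the sketch assumes that partition columns are numbered immediately after the last file column, so an index equal to the file column count selects the first partition column.

```rust
/// Where a projected column comes from. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub enum ProjectedColumn {
    /// Position within the file schema.
    File(usize),
    /// Position within the table partition columns.
    Partition(usize),
}

/// Resolve one projection index, given the number of file-schema columns.
/// Assumption: partition columns follow directly after the file columns.
pub fn resolve_projection_index(idx: usize, file_column_count: usize) -> ProjectedColumn {
    if idx < file_column_count {
        ProjectedColumn::File(idx)
    } else {
        ProjectedColumn::Partition(idx - file_column_count)
    }
}
```

With three file columns, index 1 stays in the file schema, while indexes 3 and 4 select the first and second partition columns.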

§limit: Option<usize>

The maximum number of records to read from this plan. If None, all records after filtering are returned.

§table_partition_cols: Vec<(String, DataType)>

The partitioning columns

§output_ordering: Vec<LexOrdering>

All equivalent lexicographical orderings that describe the schema.

§infinite_source: bool

Indicates whether this plan may produce an infinite stream of records.

Implementations§

impl FileScanConfig

pub fn project(&self) -> (SchemaRef, Statistics, Vec<LexOrdering>)

Project the schema and the statistics on the given column indices

pub fn repartition_file_groups(file_groups: Vec<Vec<PartitionedFile>>, target_partitions: usize, repartition_file_min_size: usize) -> Option<Vec<Vec<PartitionedFile>>>

Repartition all input files into target_partitions partitions if the total file size exceeds repartition_file_min_size. Both target_partitions and repartition_file_min_size come directly from configuration.

This function only tries to split the files' byte ranges evenly, and leaves the actual partitioning to the FileOpener for the specific data source type. For example, CsvOpener reads only the lines that overlap its byte range, but also handles range boundaries so that every line is read exactly once.
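The even byte-range split can be sketched without any DataFusion types. This is not the library's implementation: `FileRange` and `split_evenly` are hypothetical, files are reduced to `(path, size)` pairs, and returning `None` when the total size is below the minimum is an assumption based on the description above.

```rust
/// A byte range `[start, end)` of one file. Hypothetical stand-in for a
/// PartitionedFile carrying a range.
#[derive(Debug, Clone, PartialEq)]
pub struct FileRange {
    pub path: String,
    pub start: usize,
    pub end: usize,
}

/// Split `files` (path, size in bytes) into at most `target_partitions`
/// groups of roughly equal total byte size.
pub fn split_evenly(
    files: &[(String, usize)],
    target_partitions: usize,
    min_size: usize,
) -> Option<Vec<Vec<FileRange>>> {
    let total: usize = files.iter().map(|(_, size)| *size).sum();
    if target_partitions == 0 || total == 0 || total < min_size {
        return None; // assumed behaviour: too small to be worth splitting
    }
    // Bytes per group, rounded up so the group count never exceeds the target.
    let chunk = (total + target_partitions - 1) / target_partitions;
    let mut groups: Vec<Vec<FileRange>> = vec![Vec::new()];
    let mut used = 0; // bytes already assigned to the current group
    for (path, size) in files {
        let mut offset = 0;
        while offset < *size {
            if used == chunk {
                groups.push(Vec::new());
                used = 0;
            }
            // Take as much of this file as still fits in the current group.
            let take = (chunk - used).min(size - offset);
            groups.last_mut().unwrap().push(FileRange {
                path: path.clone(),
                start: offset,
                end: offset + take,
            });
            offset += take;
            used += take;
        }
    }
    Some(groups)
}
```

The ranges are cut purely by byte count, so a cut may fall in the middle of a record. That is why the format-specific opener has to resolve boundaries itself, as the CSV example describes.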

Trait Implementations§

impl Clone for FileScanConfig

fn clone(&self) -> FileScanConfig

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for FileScanConfig

fn fmt(&self, f: &mut Formatter<'_>) -> FmtResult

Formats the value using the given formatter.

impl DisplayAs for FileScanConfig

fn fmt_as(&self, t: DisplayFormatType, f: &mut Formatter<'_>) -> FmtResult

Format according to DisplayFormatType, used when the verbose representation looks different from the default one.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Same<T> for T

type Output = T

Should always be Self

impl<T> ToOwned for T where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T where V: MultiLane<T>,

fn vzip(self) -> V