Struct datafusion::datasource::physical_plan::FileScanConfig
pub struct FileScanConfig {
pub object_store_url: ObjectStoreUrl,
pub file_schema: SchemaRef,
pub file_groups: Vec<Vec<PartitionedFile>>,
pub statistics: Statistics,
pub projection: Option<Vec<usize>>,
pub limit: Option<usize>,
pub table_partition_cols: Vec<(String, DataType)>,
pub output_ordering: Vec<LexOrdering>,
pub infinite_source: bool,
}
The base configurations to provide when creating a physical plan for any given file format.
Fields
object_store_url: ObjectStoreUrl
Object store URL, used to get an ObjectStore instance from
RuntimeEnv::object_store.
file_schema: SchemaRef
Schema before projection is applied. It contains all the columns that may
appear in the files. It does not include table partition columns
that may be added.
file_groups: Vec<Vec<PartitionedFile>>
List of files to be processed, grouped into partitions.
Each file must have a schema of file_schema or a subset. If
a particular file has a subset, the missing columns are
padded with NULLs.
DataFusion may attempt to read each partition of files concurrently; however, files within a partition will be read sequentially, one after the next.
statistics: Statistics
Estimated overall statistics of the files, taking filters into account.
projection: Option<Vec<usize>>
Columns on which to project the data. Indexes equal to or greater than the
number of columns of file_schema refer to table_partition_cols.
limit: Option<usize>
The maximum number of records to read from this plan. If None,
all records after filtering are returned.
table_partition_cols: Vec<(String, DataType)>
The partitioning columns.
output_ordering: Vec<LexOrdering>
All equivalent lexicographical orderings that describe the schema.
infinite_source: bool
Indicates whether this plan may produce an infinite stream of records.
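The file_groups field describes a two-level read pattern: partitions may be read concurrently, while the files inside one partition are read one after the next. The following is a minimal sketch of that pattern using only the standard library. It is a model, not DataFusion's implementation: file names are plain strings instead of PartitionedFile, one OS thread stands in for each partition, and scan_groups and read_file are hypothetical names introduced for this illustration.

```rust
use std::thread;

/// Reads every partition on its own thread. Files inside a partition are
/// visited strictly in order. `read_file` is a hypothetical callback that
/// stands in for a real file reader.
fn scan_groups<F>(file_groups: &[Vec<String>], read_file: F) -> Vec<Vec<String>>
where
    F: Fn(&str) -> String + Sync,
{
    thread::scope(|s| {
        // Spawn one worker per partition before joining any of them,
        // so that partitions can make progress concurrently.
        let handles: Vec<_> = file_groups
            .iter()
            .map(|group| {
                let read_file = &read_file;
                s.spawn(move || {
                    // Sequential pass over the files of this partition.
                    group
                        .iter()
                        .map(|file| read_file(file.as_str()))
                        .collect::<Vec<String>>()
                })
            })
            .collect();

        // Collect the per-partition results in partition order.
        handles
            .into_iter()
            .map(|h| h.join().expect("partition thread panicked"))
            .collect()
    })
}
```

Calling scan_groups with two groups returns two result vectors, one per partition, each preserving the order of the files in its group.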
Implementations
impl FileScanConfig
pub fn project(&self) -> (SchemaRef, Statistics, Vec<LexOrdering>)
Projects the schema and the statistics onto the column indices given in projection, and returns them together with the output orderings.
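As a rough illustration of how projection indexes can be resolved, the sketch below maps each index either to a file column or to a partition column. It uses only the standard library. ProjectedColumn and resolve_projection are hypothetical names, not part of DataFusion, and treating None as "all file columns followed by all partition columns" is an assumption of this sketch.

```rust
/// Hypothetical stand-in for a projected column; not a DataFusion type.
#[derive(Debug, PartialEq)]
enum ProjectedColumn {
    /// Column read from the file, by index into the file schema.
    File(usize),
    /// Column taken from the table partitioning, by index into the
    /// partition columns.
    Partition(usize),
}

/// Resolves projection indexes against a schema with `file_cols` file
/// columns and `partition_cols` partition columns.
fn resolve_projection(
    projection: &Option<Vec<usize>>,
    file_cols: usize,
    partition_cols: usize,
) -> Vec<ProjectedColumn> {
    // Assumption: no projection means every column, file columns first.
    let indexes: Vec<usize> = match projection {
        Some(p) => p.clone(),
        None => (0..file_cols + partition_cols).collect(),
    };

    indexes
        .into_iter()
        .map(|i| {
            if i < file_cols {
                ProjectedColumn::File(i)
            } else {
                // Indexes at or past the end of the file schema address
                // the partition columns, counted from zero.
                ProjectedColumn::Partition(i - file_cols)
            }
        })
        .collect()
}
```

With three file columns and two partition columns, a projection of [0, 2, 3] selects file columns 0 and 2 and the first partition column.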
pub fn repartition_file_groups(
    file_groups: Vec<Vec<PartitionedFile>>,
    target_partitions: usize,
    repartition_file_min_size: usize
) -> Option<Vec<Vec<PartitionedFile>>>
Repartitions all input files into target_partitions partitions if the total file size exceeds
repartition_file_min_size.
target_partitions and repartition_file_min_size come directly from configuration.
This function only tries to split the files evenly by byte range, and leaves it to the specific FileOpener to
do the actual partitioning for its data source type. (For example, CsvOpener reads only the lines that
overlap its byte range, and handles the boundaries so that every line is read exactly once.)
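The even byte-range split can be sketched as follows. This is a simplified model under stated assumptions, not the library's code: files are (path, size) pairs rather than PartitionedFile values, FileRange and split_evenly are hypothetical names, and the per-partition target is taken to be the total size divided by target_partitions, rounded up.

```rust
/// Hypothetical stand-in for a byte range of one file; not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
struct FileRange {
    path: String,
    start: u64,
    /// Exclusive end offset.
    end: u64,
}

/// Splits `files` (path, size in bytes) into at most `target_partitions`
/// groups of roughly equal byte size. Returns `None` when there is nothing
/// to split or the total size is below `min_size`.
fn split_evenly(
    files: &[(String, u64)],
    target_partitions: usize,
    min_size: u64,
) -> Option<Vec<Vec<FileRange>>> {
    let total: u64 = files.iter().map(|(_, size)| size).sum();
    if target_partitions == 0 || total == 0 || total < min_size {
        return None;
    }

    // Ceiling division, so `target_partitions` groups cover every byte.
    let n = target_partitions as u64;
    let target_size = (total + n - 1) / n;

    let mut groups: Vec<Vec<FileRange>> = vec![Vec::new()];
    let mut filled = 0u64; // bytes already assigned to the last group

    for (path, size) in files {
        let mut offset = 0u64;
        while offset < *size {
            // Start a new group once the current one is full.
            if filled == target_size {
                groups.push(Vec::new());
                filled = 0;
            }
            // Take as much of this file as still fits in the group.
            let take = (target_size - filled).min(*size - offset);
            groups.last_mut().unwrap().push(FileRange {
                path: path.clone(),
                start: offset,
                end: offset + take,
            });
            offset += take;
            filled += take;
        }
    }
    Some(groups)
}
```

The ranges are purely byte-based and may cut a record in half. Handling such boundaries is the part the documentation above assigns to the format-specific FileOpener.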
Trait Implementations
impl Clone for FileScanConfig
fn clone(&self) -> FileScanConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.