pub trait FileSource: Send + Sync {
    // Required methods
    fn create_file_opener(
        &self,
        object_store: Arc<dyn ObjectStore>,
        base_config: &FileScanConfig,
        partition: usize,
    ) -> Result<Arc<dyn FileOpener>>;
    fn as_any(&self) -> &dyn Any;
    fn table_schema(&self) -> &TableSchema;
    fn with_batch_size(&self, batch_size: usize) -> Arc<dyn FileSource>;
    fn metrics(&self) -> &ExecutionPlanMetricsSet;
    fn file_type(&self) -> &str;

    // Provided methods
    fn filter(&self) -> Option<Arc<dyn PhysicalExpr>> { ... }
    fn projection(&self) -> Option<&ProjectionExprs> { ... }
    fn fmt_extra(&self, _t: DisplayFormatType, _f: &mut Formatter<'_>) -> Result { ... }
    fn supports_repartitioning(&self) -> bool { ... }
    fn repartitioned(
        &self,
        target_partitions: usize,
        repartition_file_min_size: usize,
        output_ordering: Option<LexOrdering>,
        config: &FileScanConfig,
    ) -> Result<Option<FileScanConfig>> { ... }
    fn try_pushdown_filters(
        &self,
        filters: Vec<Arc<dyn PhysicalExpr>>,
        _config: &ConfigOptions,
    ) -> Result<FilterPushdownPropagation<Arc<dyn FileSource>>> { ... }
    fn try_pushdown_sort(
        &self,
        order: &[PhysicalSortExpr],
        eq_properties: &EquivalenceProperties,
    ) -> Result<SortOrderPushdownResult<Arc<dyn FileSource>>> { ... }
    fn try_reverse_output(
        &self,
        _order: &[PhysicalSortExpr],
        _eq_properties: &EquivalenceProperties,
    ) -> Result<SortOrderPushdownResult<Arc<dyn FileSource>>> { ... }
    fn try_pushdown_projection(
        &self,
        _projection: &ProjectionExprs,
    ) -> Result<Option<Arc<dyn FileSource>>> { ... }
    fn with_schema_adapter_factory(
        &self,
        _factory: Arc<dyn SchemaAdapterFactory>,
    ) -> Result<Arc<dyn FileSource>> { ... }
    fn schema_adapter_factory(&self) -> Option<Arc<dyn SchemaAdapterFactory>> { ... }
}
File format specific behaviors for DataSource
§Schema information
There are two important schemas for a FileSource:
- Self::table_schema: the schema for the overall table (file data plus partition columns)
- The logical output schema, comprised of Self::table_schema with Self::projection applied
See specific implementations for more details.
§Required Methods
fn create_file_opener(
    &self,
    object_store: Arc<dyn ObjectStore>,
    base_config: &FileScanConfig,
    partition: usize,
) -> Result<Arc<dyn FileOpener>>
Creates a dyn FileOpener based on the given parameters.
fn table_schema(&self) -> &TableSchema
Returns the table schema for the overall table (including partition columns, if any)
This method returns the unprojected schema: the full schema of the data
without Self::projection applied.
The output schema of this FileSource is this TableSchema
with Self::projection applied.
Use ProjectionExprs::project_schema to get the projected schema
after applying the projection.
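The relationship between the two schemas can be sketched with plain vectors standing in for Arrow schemas and ProjectionExprs (the helper name below is hypothetical, not DataFusion's API):

```rust
// Stand-in sketch: the table schema is the file columns plus partition
// columns; the output schema is the table schema with the projection
// (a list of column indices here) applied.
fn project_schema(table_schema: &[&str], projection: &[usize]) -> Vec<String> {
    projection.iter().map(|&i| table_schema[i].to_string()).collect()
}

fn main() {
    // File columns `a`, `b` plus a `date` partition column.
    let table_schema = ["a", "b", "date"];
    // A projection selecting `date` then `a`.
    let output = project_schema(&table_schema, &[2, 0]);
    println!("{:?}", output); // ["date", "a"]
}
```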
fn with_batch_size(&self, batch_size: usize) -> Arc<dyn FileSource>
Returns a new FileSource configured with the given batch size.
fn metrics(&self) -> &ExecutionPlanMetricsSet
Return execution plan metrics
§Provided Methods
fn filter(&self) -> Option<Arc<dyn PhysicalExpr>>
Returns the filter expression that will be applied during the file scan.
These expressions are in terms of the unprojected Self::table_schema.
fn projection(&self) -> Option<&ProjectionExprs>
Return the projection that will be applied to the output stream on top
of Self::table_schema.
Note you can use ProjectionExprs::project_schema on the table
schema to get the effective output schema of this source.
fn fmt_extra(&self, _t: DisplayFormatType, _f: &mut Formatter<'_>) -> Result
Format FileType-specific information.
fn supports_repartitioning(&self) -> bool
Returns whether this file source supports repartitioning files by byte ranges.
When this returns true, files can be split into multiple partitions
based on byte offsets for parallel reading.
When this returns false, files cannot be repartitioned (e.g., CSV files
with newlines_in_values enabled cannot be split because record boundaries
cannot be determined by byte offset alone).
The default implementation returns true. File sources that cannot support
repartitioning should override this method.
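A minimal sketch of such an override, using a hypothetical CSV-like source rather than DataFusion's actual CsvSource:

```rust
// Hypothetical sketch of the override described above: a CSV-like
// source disables byte-range repartitioning when values may contain
// embedded newlines, since record boundaries can no longer be found
// from byte offsets alone.
struct CsvLikeSource {
    newlines_in_values: bool,
}

impl CsvLikeSource {
    fn supports_repartitioning(&self) -> bool {
        !self.newlines_in_values
    }
}

fn main() {
    let plain = CsvLikeSource { newlines_in_values: false };
    let quoted = CsvLikeSource { newlines_in_values: true };
    // Only the plain source can be split by byte ranges.
    println!("{} {}", plain.supports_repartitioning(), quoted.supports_repartitioning()); // true false
}
```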
fn repartitioned(
    &self,
    target_partitions: usize,
    repartition_file_min_size: usize,
    output_ordering: Option<LexOrdering>,
    config: &FileScanConfig,
) -> Result<Option<FileScanConfig>>
If supported by the FileSource, redistribute files across partitions
according to their size. Allows custom file formats to implement their
own repartitioning logic.
The default implementation uses FileGroupPartitioner. See that
struct for more details.
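The core size-based splitting idea can be sketched as follows (a simplified stand-in for FileGroupPartitioner; the function and parameter names are illustrative):

```rust
// Illustrative sketch of byte-range repartitioning: split one file of
// `size` bytes into at most `target_partitions` ranges, never producing
// ranges smaller than `min_size`. Real implementations also balance
// ranges across many files and preserve ordering constraints.
fn split_file(size: u64, target_partitions: u64, min_size: u64) -> Vec<(u64, u64)> {
    let parts = target_partitions.min((size / min_size).max(1));
    let chunk = size / parts;
    (0..parts)
        .map(|i| {
            let start = i * chunk;
            // The last range absorbs any remainder from integer division.
            let end = if i == parts - 1 { size } else { start + chunk };
            (start, end)
        })
        .collect()
}

fn main() {
    // A 100-byte file, 4 target partitions, 10-byte minimum -> 4 ranges.
    let ranges = split_file(100, 4, 10);
    println!("{:?}", ranges); // [(0, 25), (25, 50), (50, 75), (75, 100)]
}
```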
fn try_pushdown_filters(
    &self,
    filters: Vec<Arc<dyn PhysicalExpr>>,
    _config: &ConfigOptions,
) -> Result<FilterPushdownPropagation<Arc<dyn FileSource>>>
Try to push down filters into this FileSource.
filters must be in terms of the unprojected table schema (file schema
plus partition columns), before any projection is applied.
Any filters that this FileSource chooses to evaluate itself should be
returned as PushedDown::Yes in the result, along with a FileSource
instance that incorporates those filters. Such filters are logically
applied “during” the file scan, meaning they may refer to columns not
included in the final output projection.
Filters that cannot be pushed down should be marked as PushedDown::No,
and will be evaluated by an execution plan after the file source.
See ExecutionPlan::handle_child_pushdown_result for more details.
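The Yes/No contract can be illustrated with stand-in types (none of these names are DataFusion's actual API):

```rust
// Hedged sketch of the filter-pushdown contract: each filter is either
// absorbed by the source (PushedDown::Yes) or left for a parent plan
// to evaluate (PushedDown::No).
#[derive(Debug, PartialEq)]
enum PushedDown {
    Yes,
    No,
}

// Classify each filter using a source-specific predicate.
fn classify(filters: &[&str], can_evaluate: impl Fn(&str) -> bool) -> Vec<PushedDown> {
    filters
        .iter()
        .map(|f| if can_evaluate(f) { PushedDown::Yes } else { PushedDown::No })
        .collect()
}

fn main() {
    // Pretend this source can only evaluate simple equality predicates.
    let result = classify(&["a = 1", "b < 2"], |f| f.contains('='));
    println!("{:?}", result); // [Yes, No]
}
```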
fn try_pushdown_sort(
    &self,
    order: &[PhysicalSortExpr],
    eq_properties: &EquivalenceProperties,
) -> Result<SortOrderPushdownResult<Arc<dyn FileSource>>>
Try to create a new FileSource that can produce data in the specified sort order.
This method attempts to optimize data retrieval to match the requested ordering. It receives both the requested ordering and equivalence properties that describe the output data from this file source.
§Parameters
- order: the requested sort ordering from the query
- eq_properties: equivalence properties of the data that will be produced by this file source. These properties describe the ordering, constant columns, and other relationships in the output data, allowing the implementation to determine if optimizations like reversed scanning can help satisfy the requested ordering. This includes information about:
  - The file's natural ordering (from output_ordering in FileScanConfig)
  - Constant columns (e.g., from filters like ticker = 'AAPL')
  - Monotonic functions (e.g., extract_year_month(timestamp))
  - Other equivalence relationships
§Examples
§Example 1: Simple reverse
File ordering: [a ASC, b DESC]
Requested: [a DESC]
Reversed file: [a DESC, b ASC]
Result: Satisfies request (prefix match) → Inexact
§Example 2: Monotonic function
File ordering: [extract_year_month(ts) ASC, ts ASC]
Requested: [ts DESC]
Reversed file: [extract_year_month(ts) DESC, ts DESC]
Result: Through monotonicity, satisfies [ts DESC] → Inexact
§Returns
- Exact: created a source that guarantees perfect ordering
- Inexact: created a source optimized for ordering (e.g., reversed row groups) but not perfectly sorted
- Unsupported: cannot optimize for this ordering
§Deprecation / migration notes
- Self::try_reverse_output was renamed to this method and deprecated since 53.0.0. Per DataFusion's deprecation guidelines, it will be removed in 59.0.0 or later (6 major versions or 6 months, whichever is longer).
- New implementations should override Self::try_pushdown_sort directly.
- For backwards compatibility, the default implementation of Self::try_pushdown_sort delegates to the deprecated Self::try_reverse_output until it is removed. After that point, the default implementation will return SortOrderPushdownResult::Unsupported.
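Example 1 above can be sketched with simplified stand-in types (not DataFusion's PhysicalSortExpr or EquivalenceProperties):

```rust
// Sketch of Example 1: reverse every sort key of the file's natural
// ordering, then check whether the requested ordering is a prefix of
// the reversed ordering. A real implementation would also consult
// equivalence properties (constants, monotonic functions).
#[derive(Clone, Copy, PartialEq, Debug)]
struct SortKey {
    column: &'static str,
    ascending: bool,
}

fn reverse(order: &[SortKey]) -> Vec<SortKey> {
    order.iter().map(|k| SortKey { ascending: !k.ascending, ..*k }).collect()
}

fn satisfies(requested: &[SortKey], provided: &[SortKey]) -> bool {
    requested.len() <= provided.len()
        && requested.iter().zip(provided).all(|(r, p)| r == p)
}

fn main() {
    // File ordering [a ASC, b DESC]; requested [a DESC].
    let file = [SortKey { column: "a", ascending: true }, SortKey { column: "b", ascending: false }];
    let requested = [SortKey { column: "a", ascending: false }];
    // Reversed file ordering [a DESC, b ASC] satisfies the request.
    println!("{}", satisfies(&requested, &reverse(&file))); // true
}
```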
fn try_reverse_output(
    &self,
    _order: &[PhysicalSortExpr],
    _eq_properties: &EquivalenceProperties,
) -> Result<SortOrderPushdownResult<Arc<dyn FileSource>>>
👎 Deprecated since 53.0.0: Renamed to Self::try_pushdown_sort. This method was never limited to reversing output. It will be removed in 59.0.0 or later.
fn try_pushdown_projection(
    &self,
    _projection: &ProjectionExprs,
) -> Result<Option<Arc<dyn FileSource>>>
Try to push down a projection into this FileSource.
FileSource implementations that support projection pushdown should
override this method and return a new FileSource instance with the
projection incorporated.
If a FileSource does accept a projection, it is expected to handle
the projection in its entirety, including partition columns.
For example, the FileSource may translate the projection into a
file-format-specific projection (e.g., Parquet can push down struct field access;
some other file formats, like Vortex, can push down computed expressions into un-decoded data)
and must also handle partition column projection (generally done by replacing partition column
references with literal values derived from each file's partition values).
Not all FileSources can handle complex expression pushdowns. For example,
a CSV file source may only support simple column selections. In such cases,
the FileSource can use SplitProjection and ProjectionOpener
to split the projection into a pushdownable part and a non-pushdownable part.
These helpers also handle partition column projection.
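The split idea can be sketched with stand-in types (a simplified analogue of SplitProjection, not its actual API):

```rust
// Illustrative split of a projection into a pushdownable part (plain
// column references the file reader can handle) and a residual part
// evaluated by a wrapper after the scan.
enum Expr {
    Column(&'static str),
    Computed(&'static str),
}

fn split(projection: Vec<Expr>) -> (Vec<&'static str>, Vec<&'static str>) {
    let mut pushed = Vec::new();
    let mut residual = Vec::new();
    for e in projection {
        match e {
            // Simple column selections go to the file reader.
            Expr::Column(c) => pushed.push(c),
            // Computed expressions are evaluated afterwards.
            Expr::Computed(c) => residual.push(c),
        }
    }
    (pushed, residual)
}

fn main() {
    let (pushed, residual) = split(vec![Expr::Column("a"), Expr::Computed("a + b")]);
    println!("{:?} {:?}", pushed, residual); // ["a"] ["a + b"]
}
```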
fn with_schema_adapter_factory(
    &self,
    _factory: Arc<dyn SchemaAdapterFactory>,
) -> Result<Arc<dyn FileSource>>
👎 Deprecated since 53.0.0: SchemaAdapterFactory has been removed. Use PhysicalExprAdapterFactory instead. See upgrading.md for more details.
Set an optional schema adapter factory.
fn schema_adapter_factory(&self) -> Option<Arc<dyn SchemaAdapterFactory>>
👎 Deprecated since 53.0.0: SchemaAdapterFactory has been removed. Use PhysicalExprAdapterFactory instead. See upgrading.md for more details.
Returns the current schema adapter factory, if set.