pub trait FileSource: Send + Sync {
    // Required methods
    fn create_file_opener(
        &self,
        object_store: Arc<dyn ObjectStore>,
        base_config: &FileScanConfig,
        partition: usize,
    ) -> Result<Arc<dyn FileOpener>>;
    fn as_any(&self) -> &dyn Any;
    fn table_schema(&self) -> &TableSchema;
    fn with_batch_size(&self, batch_size: usize) -> Arc<dyn FileSource>;
    fn metrics(&self) -> &ExecutionPlanMetricsSet;
    fn file_type(&self) -> &str;

    // Provided methods
    fn filter(&self) -> Option<Arc<dyn PhysicalExpr>> { ... }
    fn projection(&self) -> Option<&ProjectionExprs> { ... }
    fn fmt_extra(&self, _t: DisplayFormatType, _f: &mut Formatter<'_>) -> Result { ... }
    fn supports_repartitioning(&self) -> bool { ... }
    fn repartitioned(
        &self,
        target_partitions: usize,
        repartition_file_min_size: usize,
        output_ordering: Option<LexOrdering>,
        config: &FileScanConfig,
    ) -> Result<Option<FileScanConfig>> { ... }
    fn try_pushdown_filters(
        &self,
        filters: Vec<Arc<dyn PhysicalExpr>>,
        _config: &ConfigOptions,
    ) -> Result<FilterPushdownPropagation<Arc<dyn FileSource>>> { ... }
    fn try_reverse_output(
        &self,
        _order: &[PhysicalSortExpr],
        _eq_properties: &EquivalenceProperties,
    ) -> Result<SortOrderPushdownResult<Arc<dyn FileSource>>> { ... }
    fn try_pushdown_projection(
        &self,
        _projection: &ProjectionExprs,
    ) -> Result<Option<Arc<dyn FileSource>>> { ... }
    fn with_schema_adapter_factory(
        &self,
        _factory: Arc<dyn SchemaAdapterFactory>,
    ) -> Result<Arc<dyn FileSource>> { ... }
    fn schema_adapter_factory(&self) -> Option<Arc<dyn SchemaAdapterFactory>> { ... }
}
File format specific behaviors for elements in DataSource.
See more details on specific implementations.
Required Methods
fn create_file_opener(
    &self,
    object_store: Arc<dyn ObjectStore>,
    base_config: &FileScanConfig,
    partition: usize,
) -> Result<Arc<dyn FileOpener>>
Creates a dyn FileOpener based on the given parameters.
fn table_schema(&self) -> &TableSchema
Returns the table schema for this file source.
This always returns the unprojected schema (the full schema of the data).
fn with_batch_size(&self, batch_size: usize) -> Arc<dyn FileSource>
Initializes a new instance with the given batch size configuration.
fn metrics(&self) -> &ExecutionPlanMetricsSet
Returns the execution plan metrics.
Provided Methods
fn filter(&self) -> Option<Arc<dyn PhysicalExpr>>
Returns the filter expression that will be applied during the file scan.
fn projection(&self) -> Option<&ProjectionExprs>
Return the projection that will be applied to the output stream on top of the table schema.
fn fmt_extra(&self, _t: DisplayFormatType, _f: &mut Formatter<'_>) -> Result
Formats FileType-specific information.
fn supports_repartitioning(&self) -> bool
Returns whether this file source supports repartitioning files by byte ranges.
When this returns true, files can be split into multiple partitions
based on byte offsets for parallel reading.
When this returns false, files cannot be repartitioned (e.g., CSV files
with newlines_in_values enabled cannot be split because record boundaries
cannot be determined by byte offset alone).
The default implementation returns true. File sources that cannot support
repartitioning should override this method.
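The byte-range splitting this flag gates can be sketched with a small, self-contained helper. This is illustrative only: `split_byte_ranges`, its parameters, and its splitting policy are assumptions for the sketch, not DataFusion code.

```rust
/// Hypothetical helper: split a file of `file_size` bytes into up to
/// `target_partitions` contiguous byte ranges, skipping the split when the
/// file is smaller than `min_size` (mirroring the role of
/// `repartition_file_min_size`).
fn split_byte_ranges(file_size: u64, target_partitions: u64, min_size: u64) -> Vec<(u64, u64)> {
    if file_size < min_size || target_partitions <= 1 {
        return vec![(0, file_size)];
    }
    // Round up so the last range is never longer than the others.
    let chunk = file_size.div_ceil(target_partitions);
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < file_size {
        let end = (start + chunk).min(file_size);
        ranges.push((start, end));
        start = end;
    }
    ranges
}

fn main() {
    // A 100-byte file split across 4 partitions for parallel reading.
    println!("{:?}", split_byte_ranges(100, 4, 10));
}
```

A format that cannot honor such ranges (like CSV with newlines_in_values) would simply decline by returning false from supports_repartitioning.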
fn repartitioned(
    &self,
    target_partitions: usize,
    repartition_file_min_size: usize,
    output_ordering: Option<LexOrdering>,
    config: &FileScanConfig,
) -> Result<Option<FileScanConfig>>
If supported by the FileSource, redistribute files across partitions
according to their size. Allows custom file formats to implement their
own repartitioning logic.
The default implementation uses FileGroupPartitioner. See that
struct for more details.
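A size-balanced redistribution of this kind might look like the following sketch. It is a stand-in for illustration only: the real logic lives in FileGroupPartitioner and operates on file groups, not bare sizes.

```rust
/// Illustrative sketch (not DataFusion's FileGroupPartitioner): greedily
/// assign each file to the partition with the smallest running total,
/// roughly balancing bytes across `target_partitions`.
fn redistribute_by_size(file_sizes: &[u64], target_partitions: usize) -> Vec<Vec<u64>> {
    let mut groups: Vec<(u64, Vec<u64>)> = vec![(0, Vec::new()); target_partitions];
    // Placing the largest files first improves balance for this greedy scheme.
    let mut sorted = file_sizes.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    for size in sorted {
        // Pick the currently lightest partition.
        let (total, files) = groups.iter_mut().min_by_key(|(t, _)| *t).unwrap();
        *total += size;
        files.push(size);
    }
    groups.into_iter().map(|(_, files)| files).collect()
}

fn main() {
    let groups = redistribute_by_size(&[10, 20, 30, 40], 2);
    let sums: Vec<u64> = groups.iter().map(|g| g.iter().sum()).collect();
    println!("{groups:?} -> per-partition bytes {sums:?}");
}
```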
fn try_pushdown_filters(
    &self,
    filters: Vec<Arc<dyn PhysicalExpr>>,
    _config: &ConfigOptions,
) -> Result<FilterPushdownPropagation<Arc<dyn FileSource>>>
Try to push down filters into this FileSource.
See ExecutionPlan::handle_child_pushdown_result for more details.
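Conceptually, an implementation splits the incoming filters into those it can absorb during the scan and those it must hand back to the parent plan. A toy sketch with stand-in types (the real trait works with Arc&lt;dyn PhysicalExpr&gt; and reports results via FilterPushdownPropagation; `Filter` and `classify_filters` here are hypothetical):

```rust
/// Stand-in filter type for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Filter {
    /// A simple `column op literal` comparison this hypothetical source
    /// can evaluate while scanning.
    SimpleComparison { column: String, op: String, literal: i64 },
    /// Anything else must be kept by the parent plan.
    Other(String),
}

/// Split filters into (absorbed by the source, reported back as unsupported).
fn classify_filters(filters: Vec<Filter>) -> (Vec<Filter>, Vec<Filter>) {
    filters
        .into_iter()
        .partition(|f| matches!(f, Filter::SimpleComparison { .. }))
}

fn main() {
    let filters = vec![
        Filter::SimpleComparison { column: "a".into(), op: "=".into(), literal: 1 },
        Filter::Other("a + b > 2".into()),
    ];
    let (supported, unsupported) = classify_filters(filters);
    println!("supported: {supported:?}\nunsupported: {unsupported:?}");
}
```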
fn try_reverse_output(
    &self,
    _order: &[PhysicalSortExpr],
    _eq_properties: &EquivalenceProperties,
) -> Result<SortOrderPushdownResult<Arc<dyn FileSource>>>
Try to create a new FileSource that can produce data in the specified sort order.
This method attempts to optimize data retrieval to match the requested ordering. It receives both the requested ordering and equivalence properties that describe the output data from this file source.
Parameters
- order - The requested sort ordering from the query
- eq_properties - Equivalence properties of the data that will be produced by this file source. These properties describe the ordering, constant columns, and other relationships in the output data, allowing the implementation to determine if optimizations like reversed scanning can help satisfy the requested ordering. This includes information about:
  - The file's natural ordering (from output_ordering in FileScanConfig)
  - Constant columns (e.g., from filters like ticker = 'AAPL')
  - Monotonic functions (e.g., extract_year_month(timestamp))
  - Other equivalence relationships
Examples
Example 1: Simple reverse
File ordering: [a ASC, b DESC]
Requested: [a DESC]
Reversed file: [a DESC, b ASC]
Result: Satisfies request (prefix match) → Inexact
Example 2: Monotonic function
File ordering: [extract_year_month(ts) ASC, ts ASC]
Requested: [ts DESC]
Reversed file: [extract_year_month(ts) DESC, ts DESC]
Result: Through monotonicity, satisfies [ts DESC] → Inexact
Returns
- Exact - Created a source that guarantees perfect ordering
- Inexact - Created a source optimized for ordering (e.g., reversed row groups) but not perfectly sorted
- Unsupported - Cannot optimize for this ordering
Default implementation returns Unsupported.
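The prefix-match reasoning from Example 1 can be sketched with stand-in types. `SortKey` is a hypothetical simplification of PhysicalSortExpr, and this sketch ignores the equivalence-properties machinery (constants, monotonic functions) that real implementations consult.

```rust
/// Stand-in for PhysicalSortExpr: a column name plus direction.
#[derive(Clone, Copy, PartialEq, Debug)]
struct SortKey {
    column: &'static str,
    descending: bool,
}

/// Flip every direction in a file's natural ordering (ASC <-> DESC), as a
/// reversed scan would.
fn reverse(ordering: &[SortKey]) -> Vec<SortKey> {
    ordering
        .iter()
        .map(|k| SortKey { descending: !k.descending, ..*k })
        .collect()
}

/// A requested ordering is satisfied when it is a prefix of the (possibly
/// reversed) file ordering.
fn satisfies(file_ordering: &[SortKey], requested: &[SortKey]) -> bool {
    requested.len() <= file_ordering.len()
        && file_ordering.iter().zip(requested).all(|(f, r)| f == r)
}

fn main() {
    // File ordering [a ASC, b DESC], requested [a DESC].
    let file = [
        SortKey { column: "a", descending: false },
        SortKey { column: "b", descending: true },
    ];
    let requested = [SortKey { column: "a", descending: true }];
    println!("as stored: {}", satisfies(&file, &requested)); // false
    println!("reversed:  {}", satisfies(&reverse(&file), &requested)); // true
}
```

When the reversed check succeeds, a real implementation would return an Inexact result wrapping a source configured to scan in reverse.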
fn try_pushdown_projection(
    &self,
    _projection: &ProjectionExprs,
) -> Result<Option<Arc<dyn FileSource>>>
Try to push down a projection into this FileSource.
FileSource implementations that support projection pushdown should
override this method and return a new FileSource instance with the
projection incorporated.
If a FileSource accepts a projection, it is expected to handle
the projection in its entirety, including partition columns.
For example, the FileSource may translate the projection into a
file format specific projection (e.g. Parquet can push down struct field access,
while some other file formats, such as Vortex, can push down computed expressions
into un-decoded data). It must also handle partition column projection, generally
done by replacing partition column references with literal values derived from
each file's partition values.
Not all FileSources can handle complex expression pushdowns. For example,
a CSV file source may only support simple column selections. In such cases,
the FileSource can use SplitProjection and ProjectionOpener
to split the projection into a pushdownable part and a non-pushdownable part.
These helpers also handle partition column projection.
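The split described above might look like this toy sketch. The types are stand-ins: the actual helpers are SplitProjection and ProjectionOpener operating on ProjectionExprs, and `ProjExpr`/`split_projection` here are hypothetical.

```rust
/// Stand-in projection expression for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum ProjExpr {
    /// A plain column reference, which even a simple format like CSV can
    /// push down as a column selection.
    Column(String),
    /// A computed expression that must be evaluated above the scan.
    Computed(String),
}

/// Split into (pushed down to the file source, evaluated in a projection
/// above it).
fn split_projection(exprs: Vec<ProjExpr>) -> (Vec<ProjExpr>, Vec<ProjExpr>) {
    exprs.into_iter().partition(|e| matches!(e, ProjExpr::Column(_)))
}

fn main() {
    let exprs = vec![
        ProjExpr::Column("ticker".into()),
        ProjExpr::Computed("price * volume".into()),
    ];
    let (pushed, remaining) = split_projection(exprs);
    println!("pushed down: {pushed:?}\nremaining: {remaining:?}");
}
```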
fn with_schema_adapter_factory(
    &self,
    _factory: Arc<dyn SchemaAdapterFactory>,
) -> Result<Arc<dyn FileSource>>
Deprecated since 52.0.0: SchemaAdapterFactory has been removed. Use PhysicalExprAdapterFactory instead. See upgrading.md for more details.
Sets an optional schema adapter factory.
fn schema_adapter_factory(&self) -> Option<Arc<dyn SchemaAdapterFactory>>
Deprecated since 52.0.0: SchemaAdapterFactory has been removed. Use PhysicalExprAdapterFactory instead. See upgrading.md for more details.
Returns the current schema adapter factory, if set.