Trait FileSource 

pub trait FileSource: Send + Sync {
    // Required methods
    fn create_file_opener(
        &self,
        object_store: Arc<dyn ObjectStore>,
        base_config: &FileScanConfig,
        partition: usize,
    ) -> Result<Arc<dyn FileOpener>>;
    fn as_any(&self) -> &dyn Any;
    fn table_schema(&self) -> &TableSchema;
    fn with_batch_size(&self, batch_size: usize) -> Arc<dyn FileSource>;
    fn metrics(&self) -> &ExecutionPlanMetricsSet;
    fn file_type(&self) -> &str;

    // Provided methods
    fn filter(&self) -> Option<Arc<dyn PhysicalExpr>> { ... }
    fn projection(&self) -> Option<&ProjectionExprs> { ... }
    fn fmt_extra(&self, _t: DisplayFormatType, _f: &mut Formatter<'_>) -> Result { ... }
    fn supports_repartitioning(&self) -> bool { ... }
    fn repartitioned(
        &self,
        target_partitions: usize,
        repartition_file_min_size: usize,
        output_ordering: Option<LexOrdering>,
        config: &FileScanConfig,
    ) -> Result<Option<FileScanConfig>> { ... }
    fn try_pushdown_filters(
        &self,
        filters: Vec<Arc<dyn PhysicalExpr>>,
        _config: &ConfigOptions,
    ) -> Result<FilterPushdownPropagation<Arc<dyn FileSource>>> { ... }
    fn try_reverse_output(
        &self,
        _order: &[PhysicalSortExpr],
        _eq_properties: &EquivalenceProperties,
    ) -> Result<SortOrderPushdownResult<Arc<dyn FileSource>>> { ... }
    fn try_pushdown_projection(
        &self,
        _projection: &ProjectionExprs,
    ) -> Result<Option<Arc<dyn FileSource>>> { ... }
    fn with_schema_adapter_factory(
        &self,
        _factory: Arc<dyn SchemaAdapterFactory>,
    ) -> Result<Arc<dyn FileSource>> { ... }
    fn schema_adapter_factory(&self) -> Option<Arc<dyn SchemaAdapterFactory>> { ... }
}

Defines file-format-specific behaviors for elements in a DataSource.

See the specific implementations for more details.

Required Methods§


fn create_file_opener( &self, object_store: Arc<dyn ObjectStore>, base_config: &FileScanConfig, partition: usize, ) -> Result<Arc<dyn FileOpener>>

Creates a dyn FileOpener based on the given parameters.


fn as_any(&self) -> &dyn Any

Returns self as a dyn Any so callers can downcast to the concrete FileSource type.
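The `as_any` method follows the standard Rust downcasting pattern. A minimal self-contained sketch of that pattern (the `Source`/`CsvLike` names here are toy stand-ins, not DataFusion types):

```rust
use std::any::Any;

// Toy stand-in for the FileSource trait, showing only the as_any pattern.
trait Source {
    fn as_any(&self) -> &dyn Any;
}

struct CsvLike {
    delimiter: u8,
}

impl Source for CsvLike {
    // Each implementor returns itself; callers recover the concrete
    // type via downcast_ref.
    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn main() {
    let source: Box<dyn Source> = Box::new(CsvLike { delimiter: b',' });
    // Downcast the trait object back to the concrete type.
    if let Some(csv) = source.as_any().downcast_ref::<CsvLike>() {
        println!("delimiter = {}", csv.delimiter as char);
    }
}
```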


fn table_schema(&self) -> &TableSchema

Returns the table schema for this file source.

This always returns the unprojected schema (the full schema of the data).


fn with_batch_size(&self, batch_size: usize) -> Arc<dyn FileSource>

Returns a new FileSource configured with the given batch size.


fn metrics(&self) -> &ExecutionPlanMetricsSet

Return execution plan metrics


fn file_type(&self) -> &str

String representation of file source such as “csv”, “json”, “parquet”

Provided Methods§


fn filter(&self) -> Option<Arc<dyn PhysicalExpr>>

Returns the filter expression that will be applied during the file scan.


fn projection(&self) -> Option<&ProjectionExprs>

Return the projection that will be applied to the output stream on top of the table schema.


fn fmt_extra(&self, _t: DisplayFormatType, _f: &mut Formatter<'_>) -> Result

Format FileType specific information


fn supports_repartitioning(&self) -> bool

Returns whether this file source supports repartitioning files by byte ranges.

When this returns true, files can be split into multiple partitions based on byte offsets for parallel reading.

When this returns false, files cannot be repartitioned (e.g., CSV files with newlines_in_values enabled cannot be split because record boundaries cannot be determined by byte offset alone).

The default implementation returns true. File sources that cannot support repartitioning should override this method.
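When repartitioning is supported, a single large file can be split into byte ranges that are read in parallel. A simplified, self-contained sketch of that splitting idea (this is an illustration, not DataFusion's actual FileGroupPartitioner, which also balances files across groups):

```rust
/// A half-open byte range [start, end) within one file.
#[derive(Debug, PartialEq)]
struct ByteRange {
    start: u64,
    end: u64,
}

/// Split a file of `file_size` bytes into at most `target_partitions`
/// ranges. Files smaller than `min_size` stay whole, mirroring the
/// role of repartition_file_min_size.
fn split_file(file_size: u64, target_partitions: u64, min_size: u64) -> Vec<ByteRange> {
    if file_size < min_size || target_partitions <= 1 {
        return vec![ByteRange { start: 0, end: file_size }];
    }
    let chunk = file_size.div_ceil(target_partitions);
    (0..target_partitions)
        .map(|i| ByteRange {
            start: i * chunk,
            end: ((i + 1) * chunk).min(file_size),
        })
        .filter(|r| r.start < r.end)
        .collect()
}

fn main() {
    // A 100-byte file split across 4 partitions: [0,25), [25,50), ...
    let ranges = split_file(100, 4, 16);
    assert_eq!(ranges.len(), 4);
    assert_eq!(ranges[0], ByteRange { start: 0, end: 25 });
    // A file below the minimum size is never split.
    assert_eq!(split_file(10, 4, 16).len(), 1);
}
```

A format whose record boundaries cannot be found from a byte offset alone (the newlines_in_values CSV case above) would override `supports_repartitioning` to return false and skip this splitting entirely.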


fn repartitioned( &self, target_partitions: usize, repartition_file_min_size: usize, output_ordering: Option<LexOrdering>, config: &FileScanConfig, ) -> Result<Option<FileScanConfig>>

If supported by the FileSource, redistribute files across partitions according to their size. Allows custom file formats to implement their own repartitioning logic.

The default implementation uses FileGroupPartitioner. See that struct for more details.


fn try_pushdown_filters( &self, filters: Vec<Arc<dyn PhysicalExpr>>, _config: &ConfigOptions, ) -> Result<FilterPushdownPropagation<Arc<dyn FileSource>>>

Try to push down filters into this FileSource. See ExecutionPlan::handle_child_pushdown_result for more details.


fn try_reverse_output( &self, _order: &[PhysicalSortExpr], _eq_properties: &EquivalenceProperties, ) -> Result<SortOrderPushdownResult<Arc<dyn FileSource>>>

Try to create a new FileSource that can produce data in the specified sort order.

This method attempts to optimize data retrieval to match the requested ordering. It receives both the requested ordering and equivalence properties that describe the output data from this file source.

§Parameters
  • order - The requested sort ordering from the query
  • eq_properties - Equivalence properties of the data that will be produced by this file source. These properties describe the ordering, constant columns, and other relationships in the output data, allowing the implementation to determine if optimizations like reversed scanning can help satisfy the requested ordering. This includes information about:
    • The file’s natural ordering (from output_ordering in FileScanConfig)
    • Constant columns (e.g., from filters like ticker = 'AAPL')
    • Monotonic functions (e.g., extract_year_month(timestamp))
    • Other equivalence relationships
§Examples
§Example 1: Simple reverse
File ordering: [a ASC, b DESC]
Requested:     [a DESC]
Reversed file: [a DESC, b ASC]
Result: Satisfies request (prefix match) → Inexact
§Example 2: Monotonic function
File ordering: [extract_year_month(ts) ASC, ts ASC]
Requested:     [ts DESC]
Reversed file: [extract_year_month(ts) DESC, ts DESC]
Result: Through monotonicity, satisfies [ts DESC] → Inexact
§Returns
  • Exact - Created a source that guarantees perfect ordering
  • Inexact - Created a source optimized for ordering (e.g., reversed row groups) but not perfectly sorted
  • Unsupported - Cannot optimize for this ordering

Default implementation returns Unsupported.
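The reversal check in Example 1 can be modeled with plain data: flip every direction in the file's ordering, then test whether the requested ordering is a prefix of the result. A self-contained sketch (column names as strings; the real API works with PhysicalSortExpr and EquivalenceProperties, which also cover the monotonic-function case in Example 2):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Dir {
    Asc,
    Desc,
}

/// One element of a lexicographic sort order: (column, direction).
type SortKey = (&'static str, Dir);

/// Reverse a file's sort order by flipping every direction.
fn reverse(order: &[SortKey]) -> Vec<SortKey> {
    order
        .iter()
        .map(|&(col, dir)| {
            (col, match dir {
                Dir::Asc => Dir::Desc,
                Dir::Desc => Dir::Asc,
            })
        })
        .collect()
}

/// A request is satisfied if it is a prefix of the provided ordering.
fn satisfies(provided: &[SortKey], requested: &[SortKey]) -> bool {
    requested.len() <= provided.len() && provided[..requested.len()] == *requested
}

fn main() {
    // File ordering: [a ASC, b DESC]; requested: [a DESC].
    let file = [("a", Dir::Asc), ("b", Dir::Desc)];
    let requested = [("a", Dir::Desc)];

    // The file's natural order does not satisfy the request...
    assert!(!satisfies(&file, &requested));
    // ...but the reversed ordering [a DESC, b ASC] does (prefix match).
    assert!(satisfies(&reverse(&file), &requested));
}
```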


fn try_pushdown_projection( &self, _projection: &ProjectionExprs, ) -> Result<Option<Arc<dyn FileSource>>>

Try to push down a projection into this FileSource.

FileSource implementations that support projection pushdown should override this method and return a new FileSource instance with the projection incorporated.

If a FileSource does accept a projection it is expected to handle the projection in its entirety, including partition columns. For example, the FileSource may translate the projection into a file-format-specific projection (e.g. Parquet can push down struct field access; some other file formats, like Vortex, can push down computed expressions into un-decoded data) and must also handle partition column projection (generally done by replacing partition column references with literal values derived from each file's partition values).

Not all FileSources can handle complex expression pushdowns. For example, a CSV file source may only support simple column selections. In such cases, the FileSource can use SplitProjection and ProjectionOpener to split the projection into a pushdownable part and a non-pushdownable part. These helpers also handle partition column projection.
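The split described above amounts to partitioning the projection's expressions into those a simple format can push down (bare column references) and those that must be evaluated on top of the scan. A simplified sketch with a toy expression type (illustrative only; not DataFusion's ProjectionExprs or SplitProjection):

```rust
/// Toy projection expression: either a plain column reference, which a
/// simple format like CSV can push down, or a computed expression,
/// which must be evaluated after the scan.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Computed(String), // e.g. "price * qty", shown only as a display string
}

/// Split a projection into a pushdownable part and a remainder,
/// mirroring the idea behind SplitProjection.
fn split_projection(exprs: &[Expr]) -> (Vec<Expr>, Vec<Expr>) {
    exprs
        .iter()
        .cloned()
        .partition(|e| matches!(e, Expr::Column(_)))
}

fn main() {
    let projection = [
        Expr::Column("ticker".into()),
        Expr::Computed("price * qty".into()),
        Expr::Column("ts".into()),
    ];
    let (pushdown, remainder) = split_projection(&projection);
    assert_eq!(pushdown.len(), 2); // ticker, ts go to the file scan
    assert_eq!(remainder.len(), 1); // price * qty evaluated afterwards
}
```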


fn with_schema_adapter_factory( &self, _factory: Arc<dyn SchemaAdapterFactory>, ) -> Result<Arc<dyn FileSource>>

👎Deprecated since 52.0.0: SchemaAdapterFactory has been removed. Use PhysicalExprAdapterFactory instead. See upgrading.md for more details.

Deprecated: Set optional schema adapter factory.



fn schema_adapter_factory(&self) -> Option<Arc<dyn SchemaAdapterFactory>>

👎Deprecated since 52.0.0: SchemaAdapterFactory has been removed. Use PhysicalExprAdapterFactory instead. See upgrading.md for more details.

Deprecated: Returns the current schema adapter factory if set.


Implementors§