pub trait TableProvider: Sync + Send {
    // Required methods
    fn as_any(&self) -> &dyn Any;
    fn schema(&self) -> SchemaRef;
    fn table_type(&self) -> TableType;
    fn scan<'life0, 'life1, 'life2, 'life3, 'async_trait>(
        &'life0 self,
        state: &'life1 SessionState,
        projection: Option<&'life2 Vec<usize>>,
        filters: &'life3 [Expr],
        limit: Option<usize>
    ) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
       where Self: 'async_trait,
             'life0: 'async_trait,
             'life1: 'async_trait,
             'life2: 'async_trait,
             'life3: 'async_trait;

    // Provided methods
    fn constraints(&self) -> Option<&Constraints> { ... }
    fn get_table_definition(&self) -> Option<&str> { ... }
    fn get_logical_plan(&self) -> Option<&LogicalPlan> { ... }
    fn get_column_default(&self, _column: &str) -> Option<&Expr> { ... }
    fn supports_filter_pushdown(
        &self,
        _filter: &Expr
    ) -> Result<TableProviderFilterPushDown> { ... }
    fn supports_filters_pushdown(
        &self,
        filters: &[&Expr]
    ) -> Result<Vec<TableProviderFilterPushDown>> { ... }
    fn statistics(&self) -> Option<Statistics> { ... }
    fn insert_into<'life0, 'life1, 'async_trait>(
        &'life0 self,
        _state: &'life1 SessionState,
        _input: Arc<dyn ExecutionPlan>,
        _overwrite: bool
    ) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
       where Self: 'async_trait,
             'life0: 'async_trait,
             'life1: 'async_trait { ... }
}
Source table

Required Methods§

fn as_any(&self) -> &dyn Any

Returns the table provider as Any so that it can be downcast to a specific implementation.

fn schema(&self) -> SchemaRef

Get a reference to the schema for this table

fn table_type(&self) -> TableType

Get the type of this table for metadata/catalog purposes.

fn scan<'life0, 'life1, 'life2, 'life3, 'async_trait>( &'life0 self, state: &'life1 SessionState, projection: Option<&'life2 Vec<usize>>, filters: &'life3 [Expr], limit: Option<usize> ) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait, 'life2: 'async_trait, 'life3: 'async_trait,

Create an ExecutionPlan for scanning the table with optionally specified projection, filter and limit, described below.

The ExecutionPlan is responsible for scanning the datasource’s partitions in a streaming, parallelized fashion.

§Projection

If specified, only a subset of columns should be returned, in the order specified. The projection is an ordered list of indexes of the fields in Self::schema.

DataFusion provides the projection to scan only the columns actually used in the query to improve performance, an optimization called “Projection Pushdown”. Some datasources, such as Parquet, can use this information to go significantly faster when only a subset of columns is required.
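
As a rough illustration of the projection semantics described above, the sketch below applies an optional list of column indexes to a schema. It uses only the standard library: the schema is modelled as a plain slice of column names, and `project_schema` is a hypothetical helper for illustration, not part of the DataFusion API.

```rust
/// Hypothetical stand-in for a table schema: an ordered list of column names.
/// Returns the columns a scan should output for the given projection.
fn project_schema(schema: &[&str], projection: Option<&Vec<usize>>) -> Vec<String> {
    match projection {
        // No projection: every column is returned, in schema order.
        None => schema.iter().map(|name| name.to_string()).collect(),
        // Projection: pick columns by index, preserving the requested order.
        // An out-of-range index panics here; a real provider would return an error.
        Some(indexes) => indexes.iter().map(|&i| schema[i].to_string()).collect(),
    }
}
```

Under these assumptions, a projection of `[2, 0]` over the columns `a, b, c` yields `c, a`: the output follows the order of the projection, not the order of the schema.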

§Filters

A list of boolean filter Exprs to evaluate during the scan, in the manner specified by Self::supports_filters_pushdown. The scan must return only rows for which all of the Exprs evaluate to true (that is, the expressions are ANDed together).

DataFusion pushes filtering into the scans whenever possible (“Filter Pushdown”), and depending on the format and the implementation of the format, evaluating the predicate during the scan can increase performance significantly.
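
The AND semantics of the filter list can be sketched with plain Rust. In this simplified model, which does not use DataFusion types, a row is a pair of integers and each filter is a function pointer; `Row`, `Filter`, and `scan_with_filters` are hypothetical names chosen for illustration.

```rust
/// A row of a hypothetical two-column table: (a, b).
type Row = (i64, i64);

/// Hypothetical stand-in for a boolean filter expression over a row.
type Filter = fn(&Row) -> bool;

/// Keeps only the rows for which every filter evaluates to true.
fn scan_with_filters(rows: &[Row], filters: &[Filter]) -> Vec<Row> {
    rows.iter()
        .copied()
        // `all` ANDs the filters together and short-circuits on the first
        // failure; an empty filter list keeps every row.
        .filter(|row| filters.iter().all(|f| f(row)))
        .collect()
}
```

With the filters `b > 5` and `a < 3`, only rows that satisfy both conditions survive the scan.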

§Note: Some columns may appear only in Filters

In certain cases, a query may only use a certain column in a Filter that has been completely pushed down to the scan. In this case, the projection will not contain all the columns found in the filter expressions.

For example, given the query SELECT t.a FROM t WHERE t.b > 5,

┌────────────────────┐
│  Projection(t.a)   │
└────────────────────┘
           ▲
           │
           │
┌────────────────────┐     Filter     ┌────────────────────┐   Projection    ┌────────────────────┐
│  Filter(t.b > 5)   │────Pushdown──▶ │  Projection(t.a)   │ ───Pushdown───▶ │  Projection(t.a)   │
└────────────────────┘                └────────────────────┘                 └────────────────────┘
           ▲                                     ▲                                      ▲
           │                                     │                                      │
           │                                     │                           ┌────────────────────┐
┌────────────────────┐                ┌────────────────────┐                 │        Scan        │
│        Scan        │                │        Scan        │                 │  filter=(t.b > 5)  │
└────────────────────┘                │  filter=(t.b > 5)  │                 │  projection=(t.a)  │
                                      └────────────────────┘                 └────────────────────┘

Initial Plan                  If `TableProviderFilterPushDown`           Projection pushdown notes that
                              returns true, filter pushdown              the scan only needs t.a
                              pushes the filter into the scan
                                                                         BUT internally evaluating the
                                                                         predicate still requires t.b
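
One way a scan might handle this case is to compute the set of columns it has to read internally as the projected columns plus any columns that only the pushed-down filters reference. The sketch below assumes columns are identified by their index in the schema; `columns_to_read` is a hypothetical helper, not a DataFusion function.

```rust
/// Returns the column indexes a scan must read internally: the projected
/// columns first (in the requested order), followed by any extra columns
/// that only the pushed-down filters reference.
fn columns_to_read(projection: &[usize], filter_columns: &[usize]) -> Vec<usize> {
    let mut columns = projection.to_vec();
    for &col in filter_columns {
        // Avoid reading a column twice when it is both projected and filtered.
        if !columns.contains(&col) {
            columns.push(col);
        }
    }
    columns
}
```

For the query above, with `t.a` at index 0 and `t.b` at index 1, the scan reads columns `[0, 1]`, evaluates `t.b > 5`, and then outputs only the first `projection.len()` columns, so the result still matches the requested projection.
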
§Limit

If limit is specified, the scan only needs to produce at least this many rows (it may return more). Like Projection Pushdown and Filter Pushdown, DataFusion pushes LIMITs as far down in the plan as possible (“Limit Pushdown”), because some sources can use this information to improve their performance. Note that if any Inexact filters are pushed down, the LIMIT cannot be pushed down. This is because inexact filters do not guarantee that every filtered row is removed, so applying the limit could lead to too few rows being available to return as a final result.
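
The interaction between limits and inexact filters can be modelled as follows. This is a simplified sketch of the rule stated above, not DataFusion's optimizer code; `effective_scan_limit` and `scan_rows` are hypothetical names, and rows are modelled as plain integers.

```rust
/// Decides which limit, if any, may be handed to the scan.
/// If any pushed-down filter is inexact, the scan may emit rows that are
/// removed later, so the limit must not be applied at the scan.
fn effective_scan_limit(limit: Option<usize>, has_inexact_filter: bool) -> Option<usize> {
    if has_inexact_filter {
        None
    } else {
        limit
    }
}

/// A scan that stops early once it has produced `limit` rows.
fn scan_rows(rows: &[i64], limit: Option<usize>) -> Vec<i64> {
    match limit {
        Some(n) => rows.iter().copied().take(n).collect(),
        None => rows.to_vec(),
    }
}
```

If a limit of 2 were applied at the scan while an inexact filter later discarded one of those two rows, the query would return a single row even though more matching rows might exist in the table. Withholding the limit in that case avoids the problem.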

Provided Methods§

fn constraints(&self) -> Option<&Constraints>

Get a reference to the constraints of the table. Returns:

  • None for tables that do not support constraints.
  • Some(&Constraints) for tables supporting constraints. Therefore, a Some(&Constraints::empty()) return value indicates that this table supports constraints, but there are no constraints.

fn get_table_definition(&self) -> Option<&str>

Get the create statement used to create this table, if available.

fn get_logical_plan(&self) -> Option<&LogicalPlan>

Get the LogicalPlan of this table, if available

fn get_column_default(&self, _column: &str) -> Option<&Expr>

Get the default value for a column, if available.

fn supports_filter_pushdown( &self, _filter: &Expr ) -> Result<TableProviderFilterPushDown>

👎Deprecated since 20.0.0: use supports_filters_pushdown instead

Tests whether the table provider can make use of a filter expression to optimise data retrieval.

fn supports_filters_pushdown( &self, filters: &[&Expr] ) -> Result<Vec<TableProviderFilterPushDown>>

Tests whether the table provider can make use of any or all filter expressions to optimise data retrieval.
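
The shape of this method, one answer per filter in the same order as the input, can be sketched with a local enum. The `PushDown` enum below is a stand-in that mirrors the `Unsupported`, `Inexact`, and `Exact` variants of `TableProviderFilterPushDown`. Each filter is reduced to the name of the column it references, and the column names and the rules attached to them are assumptions made purely for illustration.

```rust
/// Local stand-in mirroring the variants of `TableProviderFilterPushDown`.
#[derive(Debug, PartialEq, Clone, Copy)]
enum PushDown {
    /// The source cannot use the filter; it must be applied after the scan.
    Unsupported,
    /// The source uses the filter but may still return non-matching rows,
    /// so the filter must also be applied after the scan.
    Inexact,
    /// The source guarantees that only matching rows are returned.
    Exact,
}

/// Classifies each filter, here represented only by the column it references.
/// Assumed, hypothetical rules: `id` is indexed (exact), `ts` only has
/// coarse min/max statistics (inexact), anything else is unsupported.
fn classify_filters(filter_columns: &[&str]) -> Vec<PushDown> {
    filter_columns
        .iter()
        .map(|&column| match column {
            "id" => PushDown::Exact,
            "ts" => PushDown::Inexact,
            _ => PushDown::Unsupported,
        })
        .collect()
}
```

The returned vector has exactly one entry per input filter, so the caller can pair each filter with its classification by position.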

fn statistics(&self) -> Option<Statistics>

Get statistics for this table, if available

fn insert_into<'life0, 'life1, 'async_trait>( &'life0 self, _state: &'life1 SessionState, _input: Arc<dyn ExecutionPlan>, _overwrite: bool ) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait, 'life1: 'async_trait,

Return an ExecutionPlan to insert data into this table, if supported.

The returned plan should return a single row in a UInt64 column called “count”, such as the following:

+-------+
| count |
+-------+
| 6     |
+-------+

§See Also

See FileSinkExec for the common pattern of inserting a stream of RecordBatches as files to an ObjectStore.

Implementors§