pub trait TableProvider:
Any
+ Debug
+ Sync
+ Send {
Show 14 methods
// Required methods
fn schema(&self) -> SchemaRef;
fn table_type(&self) -> TableType;
fn scan<'life0, 'life1, 'life2, 'life3, 'async_trait>(
&'life0 self,
state: &'life1 dyn Session,
projection: Option<&'life2 Vec<usize>>,
filters: &'life3 [Expr],
limit: Option<usize>,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
'life2: 'async_trait,
'life3: 'async_trait;
// Provided methods
fn constraints(&self) -> Option<&Constraints> { ... }
fn get_table_definition(&self) -> Option<&str> { ... }
fn get_logical_plan(&self) -> Option<Cow<'_, LogicalPlan>> { ... }
fn get_column_default(&self, _column: &str) -> Option<&Expr> { ... }
fn scan_with_args<'a, 'life0, 'life1, 'async_trait>(
&'life0 self,
state: &'life1 dyn Session,
args: ScanArgs<'a>,
) -> Pin<Box<dyn Future<Output = Result<ScanResult>> + Send + 'async_trait>>
where Self: 'async_trait,
'a: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait { ... }
fn supports_filters_pushdown(
&self,
filters: &[&Expr],
) -> Result<Vec<TableProviderFilterPushDown>> { ... }
fn statistics(&self) -> Option<Statistics> { ... }
fn insert_into<'life0, 'life1, 'async_trait>(
&'life0 self,
_state: &'life1 dyn Session,
_input: Arc<dyn ExecutionPlan>,
_insert_op: InsertOp,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait { ... }
fn delete_from<'life0, 'life1, 'async_trait>(
&'life0 self,
_state: &'life1 dyn Session,
_filters: Vec<Expr>,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait { ... }
fn update<'life0, 'life1, 'async_trait>(
&'life0 self,
_state: &'life1 dyn Session,
_assignments: Vec<(String, Expr)>,
_filters: Vec<Expr>,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait { ... }
fn truncate<'life0, 'life1, 'async_trait>(
&'life0 self,
_state: &'life1 dyn Session,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>
where Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait { ... }
}Expand description
A table which can be queried and modified.
Please see CatalogProvider for details of implementing a custom catalog.
TableProvider represents a source of data which can provide data as
Apache Arrow RecordBatches. Implementations of this trait provide
important information for planning such as:
Self::schema: The schema (columns and their types) of the tableSelf::supports_filters_pushdown: Should filters be pushed into this scanSelf::scan: AnExecutionPlanthat can read data
Required Methods§
Sourcefn table_type(&self) -> TableType
fn table_type(&self) -> TableType
Get the type of this table for metadata/catalog purposes.
Sourcefn scan<'life0, 'life1, 'life2, 'life3, 'async_trait>(
&'life0 self,
state: &'life1 dyn Session,
projection: Option<&'life2 Vec<usize>>,
filters: &'life3 [Expr],
limit: Option<usize>,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
'life2: 'async_trait,
'life3: 'async_trait,
fn scan<'life0, 'life1, 'life2, 'life3, 'async_trait>(
&'life0 self,
state: &'life1 dyn Session,
projection: Option<&'life2 Vec<usize>>,
filters: &'life3 [Expr],
limit: Option<usize>,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
'life2: 'async_trait,
'life3: 'async_trait,
Create an ExecutionPlan for scanning the table with optional
projection, filter, and limit, described below.
The returned ExecutionPlan is responsible for scanning the datasource’s
partitions in a streaming, parallelized fashion.
§Projection
If specified, only a subset of columns should be returned, in the order
specified. The projection is a set of indexes of the fields in
Self::schema.
DataFusion provides the projection so the scan reads only the columns actually used in the query, an optimization called “Projection Pushdown”. Some datasources, such as Parquet, can use this information to go significantly faster when only a subset of columns is required.
§Filters
A list of boolean filter Exprs to evaluate during the scan, in the
manner specified by Self::supports_filters_pushdown. Only rows for
which all of the Exprs evaluate to true must be returned (that is,
the expressions are ANDed together).
To enable filter pushdown, override
Self::supports_filters_pushdown. The default implementation does not
push down filters, and filters will be empty.
DataFusion pushes filters into scans whenever possible (“Filter Pushdown”). Depending on the data format and implementation, evaluating predicates during the scan can significantly improve performance.
§Note: Some columns may appear only in Filters
In some cases, a query may use a column only in a filter and the projection will not contain all columns referenced by the filter expressions.
For example, given the query SELECT t.a FROM t WHERE t.b > 5,
┌────────────────────┐
│ Projection(t.a) │
└────────────────────┘
▲
│
│
┌────────────────────┐ Filter ┌────────────────────┐ Projection ┌────────────────────┐
│ Filter(t.b > 5) │────Pushdown──▶ │ Projection(t.a) │ ───Pushdown───▶ │ Projection(t.a) │
└────────────────────┘ └────────────────────┘ └────────────────────┘
▲ ▲ ▲
│ │ │
│ │ ┌────────────────────┐
┌────────────────────┐ ┌────────────────────┐ │ Scan │
│ Scan │ │ Scan │ │ filter=(t.b > 5) │
└────────────────────┘ │ filter=(t.b > 5) │ │ projection=(t.a) │
└────────────────────┘ └────────────────────┘
Initial Plan If `TableProviderFilterPushDown` Projection pushdown notes that
returns true, filter pushdown the scan only needs t.a
pushes the filter into the scan
BUT internally evaluating the
predicate still requires t.b§Limit
If limit is specified, the scan must produce at least this many
rows, though it may return more. Like Projection Pushdown and Filter
Pushdown, DataFusion pushes LIMITs as far down in the plan as
possible. This is called “Limit Pushdown”, and some sources can use the
information to improve performance.
Note: If any pushed-down filters are Inexact, the LIMIT cannot be
pushed down. Inexact filters do not guarantee that every filtered row is
removed, so applying the limit could leave too few rows to return in the
final result.
§Evaluation Order
The logical evaluation order is filters, then limit, then
projection.
Note that limit applies to the filtered result, not to the unfiltered
input, and projection affects only which columns are returned, not
which rows qualify.
For example, if a scan receives:
projection = [a]filters = [b > 5]limit = Some(3)
It must logically produce results equivalent to:
PROJECTION a (LIMIT 3 (SCAN WHERE b > 5))As noted above, columns referenced only by pushed-down filters may be
absent from projection.
Provided Methods§
Sourcefn constraints(&self) -> Option<&Constraints>
fn constraints(&self) -> Option<&Constraints>
Get a reference to the constraints of the table. Returns:
Nonefor tables that do not support constraints.Some(&Constraints)for tables supporting constraints. Therefore, aSome(&Constraints::empty())return value indicates that this table supports constraints, but there are no constraints.
Sourcefn get_table_definition(&self) -> Option<&str>
fn get_table_definition(&self) -> Option<&str>
Get the create statement used to create this table, if available.
Sourcefn get_logical_plan(&self) -> Option<Cow<'_, LogicalPlan>>
fn get_logical_plan(&self) -> Option<Cow<'_, LogicalPlan>>
Get the LogicalPlan of this table, if available.
Sourcefn get_column_default(&self, _column: &str) -> Option<&Expr>
fn get_column_default(&self, _column: &str) -> Option<&Expr>
Get the default value for a column, if available.
Sourcefn scan_with_args<'a, 'life0, 'life1, 'async_trait>(
&'life0 self,
state: &'life1 dyn Session,
args: ScanArgs<'a>,
) -> Pin<Box<dyn Future<Output = Result<ScanResult>> + Send + 'async_trait>>where
Self: 'async_trait,
'a: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
fn scan_with_args<'a, 'life0, 'life1, 'async_trait>(
&'life0 self,
state: &'life1 dyn Session,
args: ScanArgs<'a>,
) -> Pin<Box<dyn Future<Output = Result<ScanResult>> + Send + 'async_trait>>where
Self: 'async_trait,
'a: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
Create an ExecutionPlan for scanning the table using structured arguments.
This method uses ScanArgs to pass scan parameters in a structured way
and returns a ScanResult containing the execution plan.
Table providers can override this method to take advantage of additional
parameters like the upcoming preferred_ordering that may not be available through
other scan methods.
§Arguments
state- The session state containing configuration and contextargs- Structured scan arguments including projection, filters, limit, and ordering preferences
§Returns
A ScanResult containing the ExecutionPlan for scanning the table
See Self::scan for detailed documentation about projection, filters, and limits.
Sourcefn supports_filters_pushdown(
&self,
filters: &[&Expr],
) -> Result<Vec<TableProviderFilterPushDown>>
fn supports_filters_pushdown( &self, filters: &[&Expr], ) -> Result<Vec<TableProviderFilterPushDown>>
Specify if DataFusion should provide filter expressions to the TableProvider to apply during the scan.
Some TableProviders can evaluate filters more efficiently than the
Filter operator in DataFusion, for example by using an index.
§Parameters and Return Value
The return Vec must have one element for each element of the filters
argument. The value of each element indicates if the TableProvider can
apply the corresponding filter during the scan. The position in the return
value corresponds to the expression in the filters parameter.
If the length of the resulting Vec does not match the filters input
an error will be thrown.
Each element in the resulting Vec is one of the following:
ExactorInexact: The TableProvider can apply the filter during scanUnsupported: The TableProvider cannot apply the filter during scan
By default, this function returns Unsupported for all filters,
meaning no filters will be provided to Self::scan.
§Example
// Define a struct that implements the TableProvider trait
#[derive(Debug)]
struct TestDataSource {}
#[async_trait]
impl TableProvider for TestDataSource {
todo!()
// Override the supports_filters_pushdown to evaluate which expressions
// to accept as pushdown predicates.
fn supports_filters_pushdown(&self, filters: &[&Expr]) -> Result<Vec<TableProviderFilterPushDown>> {
// Process each filter
let support: Vec<_> = filters.iter().map(|expr| {
match expr {
// This example only supports a between expr with a single column named "c1".
Expr::Between(between_expr) => {
between_expr.expr
.try_as_col()
.map(|column| {
if column.name == "c1" {
TableProviderFilterPushDown::Exact
} else {
TableProviderFilterPushDown::Unsupported
}
})
// If there is no column in the expr set the filter to unsupported.
.unwrap_or(TableProviderFilterPushDown::Unsupported)
}
_ => {
// For all other cases return Unsupported.
TableProviderFilterPushDown::Unsupported
}
}
}).collect();
Ok(support)
}
}Sourcefn statistics(&self) -> Option<Statistics>
fn statistics(&self) -> Option<Statistics>
Get statistics for this table, if available Although not presently used in mainline DataFusion, this allows implementation specific behavior for downstream repositories, in conjunction with specialized optimizer rules to perform operations such as re-ordering of joins.
Sourcefn insert_into<'life0, 'life1, 'async_trait>(
&'life0 self,
_state: &'life1 dyn Session,
_input: Arc<dyn ExecutionPlan>,
_insert_op: InsertOp,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
fn insert_into<'life0, 'life1, 'async_trait>(
&'life0 self,
_state: &'life1 dyn Session,
_input: Arc<dyn ExecutionPlan>,
_insert_op: InsertOp,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
Return an ExecutionPlan to insert data into this table, if
supported.
The returned plan should return a single row in a UInt64 column called “count” such as the following
+-------+,
| count |,
+-------+,
| 6 |,
+-------+,§See Also
See DataSinkExec for the common pattern of inserting a
streams of RecordBatches as files to an ObjectStore.
Sourcefn delete_from<'life0, 'life1, 'async_trait>(
&'life0 self,
_state: &'life1 dyn Session,
_filters: Vec<Expr>,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
fn delete_from<'life0, 'life1, 'async_trait>(
&'life0 self,
_state: &'life1 dyn Session,
_filters: Vec<Expr>,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
Delete rows matching the filter predicates.
Returns an ExecutionPlan producing a single row with count (UInt64).
Empty filters deletes all rows.
Sourcefn update<'life0, 'life1, 'async_trait>(
&'life0 self,
_state: &'life1 dyn Session,
_assignments: Vec<(String, Expr)>,
_filters: Vec<Expr>,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
fn update<'life0, 'life1, 'async_trait>(
&'life0 self,
_state: &'life1 dyn Session,
_assignments: Vec<(String, Expr)>,
_filters: Vec<Expr>,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
Update rows matching the filter predicates.
Returns an ExecutionPlan producing a single row with count (UInt64).
Empty filters updates all rows.
Sourcefn truncate<'life0, 'life1, 'async_trait>(
&'life0 self,
_state: &'life1 dyn Session,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
fn truncate<'life0, 'life1, 'async_trait>(
&'life0 self,
_state: &'life1 dyn Session,
) -> Pin<Box<dyn Future<Output = Result<Arc<dyn ExecutionPlan>>> + Send + 'async_trait>>where
Self: 'async_trait,
'life0: 'async_trait,
'life1: 'async_trait,
Remove all rows from the table.
Should return an ExecutionPlan producing a single row with count (UInt64), representing the number of rows removed.
Implementations§
Source§impl dyn TableProvider
impl dyn TableProvider
Sourcepub fn is<T: TableProvider>(&self) -> bool
pub fn is<T: TableProvider>(&self) -> bool
Returns true if the table provider is of type T.
Prefer this over downcast_ref::<T>().is_some(). Works correctly when
called on Arc<dyn TableProvider> via auto-deref.
Sourcepub fn downcast_ref<T: TableProvider>(&self) -> Option<&T>
pub fn downcast_ref<T: TableProvider>(&self) -> Option<&T>
Attempts to downcast this table provider to a concrete type T,
returning None if the provider is not of that type.
Works correctly when called on Arc<dyn TableProvider> via auto-deref,
unlike (&arc as &dyn Any).downcast_ref::<T>() which would attempt to
downcast the Arc itself.
Dyn Compatibility§
This trait is dyn compatible.
In older versions of Rust, dyn compatibility was called "object safety".