Module datafusion::physical_plan
source · Expand description
Traits for physical query plan, supporting parallel execution for partitioned relations.
Re-exports
pub use self::metrics::Metric;
pub use display::DefaultDisplay;
pub use display::DisplayAs;
pub use display::DisplayFormatType;
pub use display::VerboseDisplay;
Modules
- Aggregates functionalities
- Defines the ANALYZE operator
- CoalesceBatchesExec combines small batches into larger batches for more efficient use of vectorized processing by upstream operators.
- Defines the merge plan for executing partitions in parallel and then merging the results into a single partition
- Defines common code used in execution plans
- Implementation of physical plan display. See
crate::physical_plan::displayable
for examples of how to format - EmptyRelation execution plan
- Defines the EXPLAIN operator
- Defines physical expressions that can evaluated at runtime during query execution
- FilterExec evaluates a boolean predicate against all input batches to determine which rows to include in its output batches.
- Declaration of built-in (scalar) functions. This module contains built-in functions’ enumeration and metadata.
- Functionality used both on logical and physical plans
- Execution plan for writing data to
DataSink
s - DataFusion Join implementations
- Defines the LIMIT plan
- Execution plan for reading in-memory batches of data
- Metrics for recording information about execution
- Defines the projection execution plan. A projection determines which columns or expressions are returned from a query. The SQL statement
SELECT a, b, a+b FROM t1
is an example of a projection on tablet1
where the expressionsa
,b
, anda+b
are the projection expressions.SELECT
withoutFROM
will only evaluate expressions. - The repartition operator maps N input partitions to M output partitions based on a partitioning scheme (according to flag
preserve_order
ordering can be preserved during repartitioning if its input is ordered). - Sort functionalities
- Stream wrappers for physical operators
- Execution plan for streaming
PartitionStream
- This module provides common traits for visiting or rewriting tree nodes easily.
- This module contains functions and structs supporting user-defined aggregate functions.
- UDF support
- The Union operator combines multiple inputs with the same schema
- Defines the unnest column plan for unnesting values in a column that contains a list type, conceptually is like joining each row with all the values in the list column.
- Values execution plan
- Physical expressions for window functions
Structs
- Statistics for a column within a relation
- EmptyRecordBatchStream can be used to create a RecordBatchStream that will produce no results
- Statistics for a relation Fields are optional and can be inexact because the sources sometimes provide approximate estimates for performance reasons and the transformations output are not always predictable.
Enums
- Represents the result of evaluating an expression: either a single
ScalarValue
or anArrayRef
. - Distribution schemes
- Partitioning schemes supported by operators.
Traits
- Describes an aggregate functions’s state.
- An aggregate expression that:
ExecutionPlan
represent nodes in the DataFusion Physical Plan.- Trait that implements the Visitor pattern for a depth first walk of
ExecutionPlan
nodes.pre_visit
is called before any children are visited, and thenpost_visit
is called after all children have been visited. To use, define a struct that implements this trait and then invoke [‘accept’]. - Expression that can be evaluated against a RecordBatch A Physical expression knows its type, nullability and how to evaluate itself.
- Trait for types that stream arrow::record_batch::RecordBatch
- Common trait for window function implementations
Functions
- Visit all children of this plan, according to the order defined on
ExecutionPlanVisitor
. - Execute the ExecutionPlan and collect the results in memory
- Execute the ExecutionPlan and collect the results in memory
- Return a wrapper around an
ExecutionPlan
which can be displayed in various easier to understand ways. - Execute the ExecutionPlan and return a single stream of results
- Execute the ExecutionPlan and return a vec with one stream per output partition
- Indicate whether a data exchange is needed for the input of
plan
, which will be very helpful especially for the distributed engine to judge whether need to deal with shuffling. Currently there are 3 kinds of execution plan which needs data exchange 1. RepartitionExec for changing the partition number between two operators 2. CoalescePartitionsExec for collapsing all of the partitions into one without ordering guarantee 3. SortPreservingMergeExec for collapsing all of the sorted partitions into one with ordering guarantee - Retrieves the ordering equivalence properties for a given schema and output ordering.
- Applies an optional projection to a
SchemaRef
, returning the projected schema - Recursively calls
pre_visit
andpost_visit
for this node and all of its children, as described onExecutionPlanVisitor
- Returns a copy of this plan if we change any child according to the pointer comparison. The size of
children
must be equal to the size ofExecutionPlan::children()
.
Type Definitions
- Trait for a [
Stream
] ofRecordBatch
es