Trait PhysicalExpr

Source

pub trait PhysicalExpr:
    Any
    + Send
    + Sync
    + Display
    + Debug
    + DynEq
    + DynHash {
Show 17 methods    // Required methods
    fn as_any(&self) -> &(dyn Any + 'static);
    fn evaluate(
        &self,
        batch: &RecordBatch,
    ) -> Result<ColumnarValue, DataFusionError>;
    fn children(&self) -> Vec<&Arc<dyn PhysicalExpr>>;
    fn with_new_children(
        self: Arc<Self>,
        children: Vec<Arc<dyn PhysicalExpr>>,
    ) -> Result<Arc<dyn PhysicalExpr>, DataFusionError>;
    fn fmt_sql(&self, f: &mut Formatter<'_>) -> Result<(), Error>;

    // Provided methods
    fn data_type(
        &self,
        input_schema: &Schema,
    ) -> Result<DataType, DataFusionError> { ... }
    fn nullable(&self, input_schema: &Schema) -> Result<bool, DataFusionError> { ... }
    fn return_field(
        &self,
        input_schema: &Schema,
    ) -> Result<Arc<Field>, DataFusionError> { ... }
    fn evaluate_selection(
        &self,
        batch: &RecordBatch,
        selection: &BooleanArray,
    ) -> Result<ColumnarValue, DataFusionError> { ... }
    fn evaluate_bounds(
        &self,
        _children: &[&Interval],
    ) -> Result<Interval, DataFusionError> { ... }
    fn propagate_constraints(
        &self,
        _interval: &Interval,
        _children: &[&Interval],
    ) -> Result<Option<Vec<Interval>>, DataFusionError> { ... }
    fn evaluate_statistics(
        &self,
        children: &[&Distribution],
    ) -> Result<Distribution, DataFusionError> { ... }
    fn propagate_statistics(
        &self,
        parent: &Distribution,
        children: &[&Distribution],
    ) -> Result<Option<Vec<Distribution>>, DataFusionError> { ... }
    fn get_properties(
        &self,
        _children: &[ExprProperties],
    ) -> Result<ExprProperties, DataFusionError> { ... }
    fn snapshot(&self) -> Result<Option<Arc<dyn PhysicalExpr>>, DataFusionError> { ... }
    fn snapshot_generation(&self) -> u64 { ... }
    fn is_volatile_node(&self) -> bool { ... }
}

Expand description

PhysicalExprs represent expressions such as A + 1 or CAST(c1 AS int).

PhysicalExpr knows its type, nullability and can be evaluated directly on a RecordBatch (see Self::evaluate).

PhysicalExpr are the physical counterpart to Expr used in logical planning. They are typically created from Expr by a PhysicalPlanner invoked from a higher level API

Some important examples of PhysicalExpr are:

Column: Represents a column at a given index in a RecordBatch

To create PhysicalExpr from Expr, see

SessionContext::create_physical_expr: A high level API
create_physical_expr: A low level API

§Formatting `PhysicalExpr` as strings

There are three ways to format PhysicalExpr as a string:

Debug: Standard Rust debugging format (e.g. Constant { value: ... })
Display: Detailed SQL-like format that shows expression structure (e.g. (Utf8 ("foobar")). This is often used for debugging and tests
Self::fmt_sql: SQL-like human readable format (e.g. (‘foobar’)), See also [sql_fmt`]

Required Methods§

Source

fn as_any(&self) -> &(dyn Any + 'static)

Returns the physical expression as Any so that it can be downcast to a specific implementation.

Source

fn evaluate( &self, batch: &RecordBatch, ) -> Result<ColumnarValue, DataFusionError>

Evaluate an expression against a RecordBatch

Source

fn children(&self) -> Vec<&Arc<dyn PhysicalExpr>>

Get a list of child PhysicalExpr that provide the input for this expr.

Source

fn with_new_children( self: Arc<Self>, children: Vec<Arc<dyn PhysicalExpr>>, ) -> Result<Arc<dyn PhysicalExpr>, DataFusionError>

Returns a new PhysicalExpr where all children were replaced by new exprs.

Source

fn fmt_sql(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Format this PhysicalExpr in nice human readable “SQL” format

Specifically, this format is designed to be readable by humans, at the expense of details. Use Display or Debug for more detailed representation.

See the fmt_sql function for an example of printing PhysicalExprs as SQL.

Provided Methods§

Source

fn data_type(&self, input_schema: &Schema) -> Result<DataType, DataFusionError>

Get the data type of this expression, given the schema of the input

Source

fn nullable(&self, input_schema: &Schema) -> Result<bool, DataFusionError>

Determine whether this expression is nullable, given the schema of the input

Source

fn return_field( &self, input_schema: &Schema, ) -> Result<Arc<Field>, DataFusionError>

The output field associated with this expression

Source

fn evaluate_selection( &self, batch: &RecordBatch, selection: &BooleanArray, ) -> Result<ColumnarValue, DataFusionError>

Evaluate an expression against a RecordBatch after first applying a validity array

Source

fn evaluate_bounds( &self, _children: &[&Interval], ) -> Result<Interval, DataFusionError>

Computes the output interval for the expression, given the input intervals.

§Parameters

children are the intervals for the children (inputs) of this expression.

§Returns

A Result containing the output interval for the expression in case of success, or an error object in case of failure.

§Example

If the expression is a + b, and the input intervals are a: [1, 2] and b: [3, 4], then the output interval would be [4, 6].

Source

fn propagate_constraints( &self, _interval: &Interval, _children: &[&Interval], ) -> Result<Option<Vec<Interval>>, DataFusionError>

Updates bounds for child expressions, given a known interval for this expression.

This is used to propagate constraints down through an expression tree.

§Parameters

interval is the currently known interval for this expression.
children are the current intervals for the children of this expression.

§Returns

A Result containing a Vec of new intervals for the children (in order) in case of success, or an error object in case of failure.

If constraint propagation reveals an infeasibility for any child, returns None. If none of the children intervals change as a result of propagation, may return an empty vector instead of cloning children. This is the default (and conservative) return value.

§Example

If the expression is a + b, the current interval is [4, 5] and the inputs a and b are respectively given as [0, 2] and [-∞, 4], then propagation would return [0, 2] and [2, 4] as b must be at least 2 to make the output at least 4.

Source

fn evaluate_statistics( &self, children: &[&Distribution], ) -> Result<Distribution, DataFusionError>

Computes the output statistics for the expression, given the input statistics.

§Parameters

children are the statistics for the children (inputs) of this expression.

§Returns

A Result containing the output statistics for the expression in case of success, or an error object in case of failure.

Expressions (should) implement this function and utilize the independence assumption, match on children distribution types and compute the output statistics accordingly. The default implementation simply creates an unknown output distribution by combining input ranges. This logic loses distribution information, but is a safe default.

Source

fn propagate_statistics( &self, parent: &Distribution, children: &[&Distribution], ) -> Result<Option<Vec<Distribution>>, DataFusionError>

Updates children statistics using the given parent statistic for this expression.

This is used to propagate statistics down through an expression tree.

§Parameters

parent is the currently known statistics for this expression.
children are the current statistics for the children of this expression.

§Returns

A Result containing a Vec of new statistics for the children (in order) in case of success, or an error object in case of failure.

If statistics propagation reveals an infeasibility for any child, returns None. If none of the children statistics change as a result of propagation, may return an empty vector instead of cloning children. This is the default (and conservative) return value.

Expressions (should) implement this function and apply Bayes rule to reconcile and update parent/children statistics. This involves utilizing the independence assumption, and matching on distribution types. The default implementation simply creates an unknown distribution if it can narrow the range by propagating ranges. This logic loses distribution information, but is a safe default.

Source

fn get_properties( &self, _children: &[ExprProperties], ) -> Result<ExprProperties, DataFusionError>

Calculates the properties of this PhysicalExpr based on its children’s properties (i.e. order and range), recursively aggregating the information from its children. In cases where the PhysicalExpr has no children (e.g., Literal or Column), these properties should be specified externally, as the function defaults to unknown properties.

Source

fn snapshot(&self) -> Result<Option<Arc<dyn PhysicalExpr>>, DataFusionError>

Take a snapshot of this PhysicalExpr, if it is dynamic.

“Dynamic” in this case means containing references to structures that may change during plan execution, such as hash tables.

This method is used to capture the current state of PhysicalExprs that may contain dynamic references to other operators in order to serialize it over the wire or treat it via downcast matching.

You should not call this method directly as it does not handle recursion. Instead use snapshot_physical_expr to handle recursion and capture the full state of the PhysicalExpr.

This is expected to return “simple” expressions that do not have mutable state and are composed of DataFusion’s built-in PhysicalExpr implementations. Callers however should not assume anything about the returned expressions since callers and implementers may not agree on what “simple” or “built-in” means. In other words, if you need to serialize a PhysicalExpr across the wire you should call this method and then try to serialize the result, but you should handle unknown or unexpected PhysicalExpr implementations gracefully just as if you had not called this method at all.

In particular, consider:

A PhysicalExpr that references the current state of a datafusion::physical_plan::TopK that is involved in a query with SELECT * FROM t1 ORDER BY a LIMIT 10. This function may return something like a >= 12.
A PhysicalExpr that references the current state of a datafusion::physical_plan::joins::HashJoinExec from a query such as SELECT * FROM t1 JOIN t2 ON t1.a = t2.b. This function may return something like t2.b IN (1, 5, 7).

A system or function that can only deal with a hardcoded set of PhysicalExpr implementations or needs to serialize this state to bytes may not be able to handle these dynamic references. In such cases, we should return a simplified version of the PhysicalExpr that does not contain these dynamic references.

Systems that implement remote execution of plans, e.g. serialize a portion of the query plan and send it across the wire to a remote executor may want to call this method after every batch on the source side and broadcast / update the current snapshot to the remote executor.

Note for implementers: this method should not handle recursion. Recursion is handled in snapshot_physical_expr.

Source

fn snapshot_generation(&self) -> u64

Returns the generation of this PhysicalExpr for snapshotting purposes. The generation is an arbitrary u64 that can be used to track changes in the state of the PhysicalExpr over time without having to do an exhaustive comparison. This is useful to avoid unnecessary computation or serialization if there are no changes to the expression. In particular, dynamic expressions that may change over time; this allows cheap checks for changes. Static expressions that do not change over time should return 0, as does the default implementation. You should not call this method directly as it does not handle recursion. Instead use snapshot_generation to handle recursion and capture the full state of the PhysicalExpr.

Source

fn is_volatile_node(&self) -> bool

Returns true if the expression node is volatile, i.e. whether it can return different results when evaluated multiple times with the same input.

Note: unlike is_volatile, this function does not consider inputs:

random() returns true,
a + random() returns false (because the operation + itself is not volatile.)

The default to this function was set to false when it was created to avoid imposing API churn on implementers, but this is not a safe default in general. It is highly recommended that volatile expressions implement this method and return true. This default may be removed in the future if it causes problems or we decide to eat the cost of the breaking change and require all implementers to make a choice.