Trait datafusion::physical_plan::PhysicalExpr

source ·
pub trait PhysicalExpr: Send + Sync + Display + Debug + PartialEq<dyn Any> {
    // Required methods
    fn as_any(&self) -> &(dyn Any + 'static);
    fn data_type(
        &self,
        input_schema: &Schema
    ) -> Result<DataType, DataFusionError>;
    fn nullable(&self, input_schema: &Schema) -> Result<bool, DataFusionError>;
    fn evaluate(
        &self,
        batch: &RecordBatch
    ) -> Result<ColumnarValue, DataFusionError>;
    fn children(&self) -> Vec<Arc<dyn PhysicalExpr>>;
    fn with_new_children(
        self: Arc<Self>,
        children: Vec<Arc<dyn PhysicalExpr>>
    ) -> Result<Arc<dyn PhysicalExpr>, DataFusionError>;
    fn dyn_hash(&self, _state: &mut dyn Hasher);

    // Provided methods
    fn evaluate_selection(
        &self,
        batch: &RecordBatch,
        selection: &BooleanArray
    ) -> Result<ColumnarValue, DataFusionError> { ... }
    fn evaluate_bounds(
        &self,
        _children: &[&Interval]
    ) -> Result<Interval, DataFusionError> { ... }
    fn propagate_constraints(
        &self,
        _interval: &Interval,
        _children: &[&Interval]
    ) -> Result<Option<Vec<Interval>>, DataFusionError> { ... }
    fn get_ordering(&self, _children: &[SortProperties]) -> SortProperties { ... }
}
Expand description

PhysicalExpr evaluate DataFusion expressions such as A + 1, or CAST(c1 AS int).

PhysicalExpr are the physical counterpart to Expr used in logical planning, and can be evaluated directly on a RecordBatch. They are normally created from Expr by a PhysicalPlanner and can be created directly using create_physical_expr.

A Physical expression knows its type, nullability and how to evaluate itself.

§Example: Create PhysicalExpr from Expr

// For a logical expression `a = 1`, we can create a physical expression
let expr = col("a").eq(lit(1));
// To create a PhysicalExpr we need 1. a schema
let schema = Schema::new(vec![Field::new("a", DataType::Int32, true)]);
let df_schema = DFSchema::try_from(schema).unwrap();
// 2. ExecutionProps
let props = ExecutionProps::new();
// We can now create a PhysicalExpr:
let physical_expr = create_physical_expr(&expr, &df_schema, &props).unwrap();

§Example: Executing a PhysicalExpr to obtain ColumnarValue

// Given a PhysicalExpr, for `a = 1` we can evaluate it against a RecordBatch like this:
let physical_expr = create_physical_expr(&expr, &df_schema, &props).unwrap();
// Input of [1,2,3]
let input_batch = RecordBatch::try_from_iter(vec![
  ("a", Arc::new(Int32Array::from(vec![1, 2, 3])) as _)
]).unwrap();
// The result is a ColumnarValue (either an Array or a Scalar)
let result = physical_expr.evaluate(&input_batch).unwrap();
// In this case, a BooleanArray with the result of the comparison
let ColumnarValue::Array(arr) = result else {
 panic!("Expected an array")
};
assert_eq!(arr.as_boolean(), &BooleanArray::from(vec![true, false, false]));

Required Methods§

source

fn as_any(&self) -> &(dyn Any + 'static)

Returns the physical expression as Any so that it can be downcast to a specific implementation.

source

fn data_type(&self, input_schema: &Schema) -> Result<DataType, DataFusionError>

Get the data type of this expression, given the schema of the input

source

fn nullable(&self, input_schema: &Schema) -> Result<bool, DataFusionError>

Determine whether this expression is nullable, given the schema of the input

source

fn evaluate( &self, batch: &RecordBatch ) -> Result<ColumnarValue, DataFusionError>

Evaluate an expression against a RecordBatch

source

fn children(&self) -> Vec<Arc<dyn PhysicalExpr>>

Get a list of child PhysicalExpr that provide the input for this expr.

source

fn with_new_children( self: Arc<Self>, children: Vec<Arc<dyn PhysicalExpr>> ) -> Result<Arc<dyn PhysicalExpr>, DataFusionError>

Returns a new PhysicalExpr where all children were replaced by new exprs.

source

fn dyn_hash(&self, _state: &mut dyn Hasher)

Update the hash state with this expression requirements from Hash.

This method is required to support hashing PhysicalExprs. To implement it, typically the type implementing PhysicalExpr implements Hash and then the following boiler plate is used:

§Example:
// User defined expression that derives Hash
#[derive(Hash, Debug, PartialEq, Eq)]
struct MyExpr {
  val: u64
}

// impl PhysicalExpr {
// ...
  // Boiler plate to call the derived Hash impl
  fn dyn_hash(&self, state: &mut dyn std::hash::Hasher) {
    use std::hash::Hash;
    let mut s = state;
    self.hash(&mut s);
  }
// }

Note: PhysicalExpr is not constrained by Hash directly because it must remain object safe.

Provided Methods§

source

fn evaluate_selection( &self, batch: &RecordBatch, selection: &BooleanArray ) -> Result<ColumnarValue, DataFusionError>

Evaluate an expression against a RecordBatch after first applying a validity array

source

fn evaluate_bounds( &self, _children: &[&Interval] ) -> Result<Interval, DataFusionError>

Computes the output interval for the expression, given the input intervals.

§Arguments
  • children are the intervals for the children (inputs) of this expression.
§Example

If the expression is a + b, and the input intervals are a: [1, 2] and b: [3, 4], then the output interval would be [4, 6].

source

fn propagate_constraints( &self, _interval: &Interval, _children: &[&Interval] ) -> Result<Option<Vec<Interval>>, DataFusionError>

Updates bounds for child expressions, given a known interval for this expression.

This is used to propagate constraints down through an expression tree.

§Arguments
  • interval is the currently known interval for this expression.
  • children are the current intervals for the children of this expression.
§Returns

A Vec of new intervals for the children, in order.

If constraint propagation reveals an infeasibility for any child, returns None. If none of the children intervals change as a result of propagation, may return an empty vector instead of cloning children. This is the default (and conservative) return value.

§Example

If the expression is a + b, the current interval is [4, 5] and the inputs a and b are respectively given as [0, 2] and [-∞, 4], then propagation would would return [0, 2] and [2, 4] as b must be at least 2 to make the output at least 4.

source

fn get_ordering(&self, _children: &[SortProperties]) -> SortProperties

The order information of a PhysicalExpr can be estimated from its children. This is especially helpful for projection expressions. If we can ensure that the order of a PhysicalExpr to project matches with the order of SortExec, we can eliminate that SortExecs.

By recursively calling this function, we can obtain the overall order information of the PhysicalExpr. Since SortOptions cannot fully handle the propagation of unordered columns and literals, the SortProperties struct is used.

Trait Implementations§

source§

impl DynTreeNode for dyn PhysicalExpr

source§

fn arc_children(&self) -> Vec<Arc<dyn PhysicalExpr>>

Returns all children of the specified TreeNode.
source§

fn with_new_arc_children( &self, arc_self: Arc<dyn PhysicalExpr>, new_children: Vec<Arc<dyn PhysicalExpr>> ) -> Result<Arc<dyn PhysicalExpr>, DataFusionError>

Constructs a new node with the specified children.
source§

impl Hash for dyn PhysicalExpr

source§

fn hash<H>(&self, state: &mut H)
where H: Hasher,

Feeds this value into the given Hasher. Read more

Implementors§