
DataFusion Join implementations

Modules

  • Join-related functionality used by both logical and physical plans

Structs

  • Executes partitions in parallel and combines them into a set of partitions by combining all values from the left with all values from the right.
  • Join execution plan executes partitions in parallel and combines them into a set of partitions.
  • NestedLoopJoinExec executes partitions in parallel. One input is collected into a single partition, called the inner table. The other input is treated as the outer table, and the output partitioning is taken from it. Given an output partition number x, the execution will be:
  • Join execution plan executes partitions in parallel and combines them into a set of partitions.
  • A symmetric hash join with range conditions hashes both input streams on the join key and uses the resulting hash tables to join them. The join is symmetric because a hash table is built from the join keys of both streams, and rows are matched on the join-key values from both sides. This makes it efficient in a streaming context: matching rows are found by fast hash-table lookups instead of by scanning one or both streams. It also considers only the elements of each stream that fall within a sliding window defined by the range conditions, which makes it more efficient and less likely to retain stale data. As a result, it can operate on unbounded streaming data while keeping memory use bounded.
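The struct that combines all values from the left with all values from the right computes a Cartesian product. The minimal single-threaded sketch below shows that pairing over in-memory slices. `cross_join` is a hypothetical helper, not part of DataFusion's API, and it ignores partitioning and parallel execution.

```rust
/// Pairs every left value with every right value (a Cartesian product).
/// Hypothetical helper for illustration only; not DataFusion's API.
fn cross_join<L: Clone, R: Clone>(left: &[L], right: &[R]) -> Vec<(L, R)> {
    // The output always has exactly left.len() * right.len() rows.
    let mut out = Vec::with_capacity(left.len() * right.len());
    for l in left {
        for r in right {
            out.push((l.clone(), r.clone()));
        }
    }
    out
}

fn main() {
    let rows = cross_join(&[1, 2], &["a", "b"]);
    // Prints [(1, "a"), (1, "b"), (2, "a"), (2, "b")]
    println!("{:?}", rows);
}
```

Because no predicate filters the pairs, the output size grows multiplicatively with the inputs. This is why such a plan is usually reserved for small inputs or for joins without any join condition.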
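The NestedLoopJoinExec description can be sketched as follows. The steps after "the execution will be:" are not reproduced here, so this sketch makes an assumption: output partition x pairs every row of outer partition x with every row of the collected inner table and keeps the pairs that satisfy the join predicate. `nested_loop_join` and its signature are hypothetical, and the partitions run sequentially instead of in parallel.

```rust
/// Hypothetical sketch: one output partition per outer partition, each produced
/// by comparing that partition's rows against the entire collected inner table.
fn nested_loop_join<O: Clone, I: Clone>(
    outer_partitions: &[Vec<O>],
    inner_table: &[I],
    predicate: impl Fn(&O, &I) -> bool,
) -> Vec<Vec<(O, I)>> {
    outer_partitions
        .iter()
        .map(|partition| {
            let mut out = Vec::new();
            // Every outer row is tested against every inner row.
            for o in partition {
                for i in inner_table {
                    if predicate(o, i) {
                        out.push((o.clone(), i.clone()));
                    }
                }
            }
            out
        })
        .collect()
}

fn main() {
    let outer = vec![vec![1, 5], vec![3]];
    let inner = vec![2, 4];
    // A non-equi predicate, which a nested loop join can evaluate directly.
    let partitions = nested_loop_join(&outer, &inner, |o, i| o < i);
    for (x, rows) in partitions.iter().enumerate() {
        println!("output partition {}: {:?}", x, rows);
    }
}
```

The output has one partition per outer partition. This mirrors the statement that the output partitioning is taken from the outer table.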
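The sliding-window behaviour of the symmetric hash join can be illustrated with a toy version. The following are assumptions, not DataFusion's implementation:

- rows are `(key, ts)` pairs;
- each stream arrives in non-decreasing `ts` order;
- the range condition is `|left.ts - right.ts| <= window`.

Each arriving row is probed against the other side's hash table and then buffered in its own side's table. A later row from the same side can only have a larger `ts`, so buffered rows on the other side older than `ts - window` can never match again. Those rows are pruned, which keeps the buffered state bounded. `SymmetricHashJoin`, `Side`, `push`, and `buffered` are hypothetical names.

```rust
use std::collections::{HashMap, VecDeque};

/// Which input stream a row arrives on (hypothetical).
#[derive(Clone, Copy, Debug)]
enum Side {
    Left,
    Right,
}

/// Toy symmetric hash join: one hash table per side, keyed by join key,
/// holding the timestamps of buffered rows in arrival (ascending) order.
struct SymmetricHashJoin {
    window: i64,
    left: HashMap<u32, VecDeque<i64>>,
    right: HashMap<u32, VecDeque<i64>>,
}

impl SymmetricHashJoin {
    fn new(window: i64) -> Self {
        Self { window, left: HashMap::new(), right: HashMap::new() }
    }

    /// Processes one row and returns the `(key, left_ts, right_ts)` matches it produces.
    fn push(&mut self, side: Side, key: u32, ts: i64) -> Vec<(u32, i64, i64)> {
        let window = self.window;
        let (own, other) = match side {
            Side::Left => (&mut self.left, &mut self.right),
            Side::Right => (&mut self.right, &mut self.left),
        };

        // Future rows on this side have ts' >= ts, so buffered rows on the other
        // side with t < ts - window can never satisfy the range condition again.
        let cutoff = ts - window;
        for queue in other.values_mut() {
            while matches!(queue.front(), Some(&t) if t < cutoff) {
                queue.pop_front();
            }
        }
        other.retain(|_, queue| !queue.is_empty());

        // Hash lookup on the join key, then check the range condition.
        let mut out = Vec::new();
        if let Some(queue) = other.get(&key) {
            for &t in queue {
                if (t - ts).abs() <= window {
                    out.push(match side {
                        Side::Left => (key, ts, t),
                        Side::Right => (key, t, ts),
                    });
                }
            }
        }

        // Buffer this row so future rows from the other side can match it.
        own.entry(key).or_default().push_back(ts);
        out
    }

    /// Total number of rows currently buffered on both sides.
    fn buffered(&self) -> usize {
        self.left.values().chain(self.right.values()).map(|q| q.len()).sum()
    }
}

fn main() {
    let mut join = SymmetricHashJoin::new(5);
    let events = [(Side::Left, 1, 0), (Side::Right, 1, 3), (Side::Left, 1, 10), (Side::Right, 1, 12)];
    for (side, key, ts) in events {
        println!("{:?} key={} ts={} -> {:?}", side, key, ts, join.push(side, key, ts));
    }
    println!("buffered rows: {}", join.buffered());
}
```

In this run the left row at `ts = 0` is dropped once the right stream reaches `ts = 12`, because no future right row can fall within 5 of it. Only two rows remain buffered at the end. With an unbounded input, state stays proportional to the window rather than to the stream length.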

Enums