Expand description
Distributed join execution: broadcast and shuffle joins across cluster nodes.
Broadcast join: Control Plane serializes the small side (< 8 MiB), sends it to all relevant nodes via QUIC transport. Each node performs a local hash join with its local large-side data.
Shuffle join: each node scans its local data, hashes on the join key, routes rows to the owning node via QUIC transport. The target node performs a local hash join on the repartitioned data.
Both strategies use Arrow IPC for zero-copy batched data movement.
Structs§
- Broadcast
Join Request - A broadcast join request sent to each participating node.
- Shuffle
Partition - A shuffle join partition assignment.
Enums§
- Join
Side - Join
Strategy - Selected distributed join strategy.
Constants§
- DEFAULT_
BROADCAST_ THRESHOLD_ BYTES - Default maximum payload size for broadcast join strategy selection.
Functions§
- estimate_
collection_ bytes - Estimate the serialized size of a collection’s data.
- partition_
for_ key - Compute which node owns a given partition (based on join key hash).
- plan_
shuffle_ partitions - Plan the node assignments for a shuffle join.
- select_
strategy - Determine the join strategy based on estimated data sizes.