1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
//! Plan containers ([`Plan`], [`PlanNode`]) and the [`RefId`] value handle.
pub use Operator;
// ============================================================================
// RefIds and plan nodes
// ============================================================================
/// Plan-scoped opaque identifier for a node's output value.
///
/// Engines must treat RefIds as opaque keys: equality and hashing are the only meaningful
/// operations.
;
/// One node in a plan: an [`Operator`], its input RefIds, and its output RefId.
///
/// `inputs` order is interpreted per [`Operator`] (e.g. for `Operator::SemiJoin` the
/// convention is `[probe, build]`). `Operator::UnionAll` emits the rows of all inputs
/// regardless of input order.
// ============================================================================
// Plans
// ============================================================================
/// A plan: an ordered sequence of [`PlanNode`]s forming a dataflow DAG.
///
/// A [`RefId`] is an opaque value handle naming the output of one plan node. Each
/// [`PlanNode`] is a triple `(op, inputs, output)`:
///
/// - `op` ([`Operator`]) is the operator: a source like `ScanParquet` or a transform like
/// `Project`.
/// - `inputs` is a `Vec<RefId>` naming the upstream values the operator reads from.
/// - `output` is the RefId the operator produces.
///
/// A node depends on another when one of its `inputs` matches that node's `output`.
/// Each RefId is bound exactly once. These input/output cross references encode the
/// dataflow DAG, and `nodes` is stored in topological order: every node appears after
/// the nodes whose outputs it consumes, so an engine can evaluate `nodes` in slice
/// order; each node's inputs are guaranteed bound by the time the node is reached.
///
/// A well-formed `Plan` has at least one node. The **terminal node** is always the
/// last entry in `nodes`: it is the only node whose `output` no other node lists in
/// `inputs`, and `Plan::result()` returns that node's `output` RefId, the value the
/// engine streams to the caller.
///
/// # Optimization
///
/// For the best performance, connectors are encouraged to run kernel-produced
/// plans through their query optimizer before execution (e.g. to fold adjacent
/// filters, merge scans over the same files, or choose physical join and scan
/// strategies).
///
/// # Example
///
/// A five-node plan: two independent scans, each filtered, then unioned. The `nodes`
/// `Vec<PlanNode>`:
///
/// ```text
/// Plan {
/// nodes: vec![
/// PlanNode { op: ScanParquet(..), inputs: vec![], output: RefId(0) },
/// PlanNode { op: ScanParquet(..), inputs: vec![], output: RefId(1) },
/// PlanNode { op: Filter(..), inputs: vec![RefId(0)], output: RefId(2) },
/// PlanNode { op: Filter(..), inputs: vec![RefId(1)], output: RefId(3) },
/// PlanNode { op: UnionAll(..), inputs: vec![RefId(2), RefId(3)], output: RefId(4) },
/// ],
/// }
/// ```
///
/// The dataflow DAG this encodes:
///
/// ```text
/// ScanParquet [RefId(0)] ScanParquet [RefId(1)]
/// | |
/// v v
/// Filter [RefId(2)] Filter [RefId(3)]
/// | |
/// +------------+-------------+
/// v
/// UnionAll [RefId(4)] <-- terminal: Plan::result() == Some(RefId(4))
/// ```
///
/// The engine evaluates the nodes in the order of the `nodes` vector: `RefId(0)` and `RefId(1)`
/// first (sources), then `RefId(2)` and `RefId(3)`, then `RefId(4)`. The engine streams the
/// rows produced at the terminal node to the caller.