SamkhyaOptimizerRule — DataFusion integration point for samkhya’s
cardinality corrections.
§Two-trait surface
In DataFusion 46.0 the rule implements both OptimizerRule
(logical) and PhysicalOptimizerRule (physical). Cardinality
information cannot be injected from a logical rule alone: DataFusion’s
mainline planner does not call TableProvider::statistics() when it
constructs the leaf ExecutionPlan for a scan, and the
LogicalPlan::TableScan node has no slot to attach a Statistics
value. The only reliable hook is the physical layer, where we wrap
scan execs with SamkhyaStatsExec so their statistics() returns
samkhya-corrected values that propagate up through filters,
projections, and joins.
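The wrapping idea can be illustrated with a self-contained toy model. This is a sketch under stated assumptions, not samkhya's or DataFusion's code: `PlanNode`, `Stats`, `ScanNode`, `StatsOverride`, and `FilterNode` are hypothetical stand-ins for `ExecutionPlan`, `Statistics`, a scan exec, `SamkhyaStatsExec`, and a filter exec, and the fixed-selectivity filter is a simplification.

```rust
use std::sync::Arc;

/// Simplified stand-in for plan statistics (row count only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    num_rows: Option<usize>,
}

/// Toy stand-in for a physical plan node; NOT DataFusion's `ExecutionPlan`.
trait PlanNode {
    fn statistics(&self) -> Stats;
}

/// A leaf scan that reports whatever row count its source claims.
struct ScanNode {
    reported_rows: Option<usize>,
}

impl PlanNode for ScanNode {
    fn statistics(&self) -> Stats {
        Stats { num_rows: self.reported_rows }
    }
}

/// Hypothetical wrapper in the spirit of `SamkhyaStatsExec`: it returns the
/// corrected row count when one exists and otherwise defers to the inner node.
struct StatsOverride {
    inner: Arc<dyn PlanNode>,
    corrected_rows: Option<usize>,
}

impl PlanNode for StatsOverride {
    fn statistics(&self) -> Stats {
        match self.corrected_rows {
            Some(rows) => Stats { num_rows: Some(rows) },
            None => self.inner.statistics(),
        }
    }
}

/// A filter that scales its child's estimate by a fixed selectivity, so a
/// corrected leaf estimate propagates upward through the parent.
struct FilterNode {
    child: Arc<dyn PlanNode>,
    selectivity: f64,
}

impl PlanNode for FilterNode {
    fn statistics(&self) -> Stats {
        let child = self.child.statistics();
        Stats {
            num_rows: child
                .num_rows
                .map(|n| (n as f64 * self.selectivity).round() as usize),
        }
    }
}
```

In this model, wrapping a scan that reports 1000 rows with a correction of 200 changes what a parent filter sees, while a wrapper with no correction passes the original estimate through unchanged.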
§Logical pass: observe-only
The logical-side rewrite traverses the plan and counts TableScan
nodes. It does not mutate the plan (returns Transformed::no); the
traversal is retained as telemetry — it exercises the corrected-stats
helper end-to-end and gives downstream code a stable hook into the
optimizer pass without changing the logical tree.
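A minimal sketch of an observe-only pass, using a toy plan type rather than DataFusion's `LogicalPlan`. The names `Plan`, `Rewritten`, `count_scans`, and `observe_only` are hypothetical; `Rewritten { transformed: false }` only mimics the role that `Transformed::no` plays in the real rule.

```rust
/// Toy logical plan; NOT DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan { table: String },
    Filter { input: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan> },
}

/// Minimal analogue of a rewrite result that carries a "was it changed" flag.
#[derive(Debug, PartialEq)]
struct Rewritten {
    plan: Plan,
    transformed: bool,
}

/// Recursively counts the table scans in the plan without modifying it.
fn count_scans(plan: &Plan) -> usize {
    match plan {
        Plan::TableScan { .. } => 1,
        Plan::Filter { input } => count_scans(input),
        Plan::Join { left, right } => count_scans(left) + count_scans(right),
    }
}

/// Observe-only rewrite: gathers telemetry (the scan count) and hands the
/// plan back untouched, flagged as not transformed.
fn observe_only(plan: Plan) -> (Rewritten, usize) {
    let scans = count_scans(&plan);
    (Rewritten { plan, transformed: false }, scans)
}
```

The caller gets back the same tree it passed in, plus a count it can log or export, which is the property the logical pass relies on.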
§Physical pass: the actual injection
The physical-side optimize walks the ExecutionPlan tree and
records how many SamkhyaStatsExec leaves it observes. Those
wrappers were installed by SamkhyaTableProvider::scan when the
planner asked each table provider for its scan exec. The rule does
not need to add wrappers itself, because by the time optimize runs
they are already in place, but it does validate the wiring and expose
a name() so SessionStateBuilder::with_physical_optimizer_rule
has something to register.
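One way the wiring check could look, sketched on a toy tree. This is an illustration under assumptions, not the rule's actual implementation: `PhysNode`, `WiringReport`, `inspect`, and `validate_wiring` are hypothetical names, and the sketch assumes a wrapper sits directly on top of the scan it corrects.

```rust
/// Toy physical plan; NOT DataFusion's `ExecutionPlan`.
enum PhysNode {
    Scan,
    /// Hypothetical stand-in for a `SamkhyaStatsExec` wrapping its child.
    StatsWrapper(Box<PhysNode>),
    Filter(Box<PhysNode>),
    HashJoin(Box<PhysNode>, Box<PhysNode>),
}

/// Telemetry gathered by the walk: scans with and without a wrapper.
#[derive(Debug, Default, PartialEq)]
struct WiringReport {
    wrapped_scans: usize,
    bare_scans: usize,
}

/// Walks the tree read-only. A scan counts as wrapped only when its direct
/// parent is a `StatsWrapper`; any other parent resets the flag.
fn inspect(node: &PhysNode, under_wrapper: bool, report: &mut WiringReport) {
    match node {
        PhysNode::Scan => {
            if under_wrapper {
                report.wrapped_scans += 1;
            } else {
                report.bare_scans += 1;
            }
        }
        PhysNode::StatsWrapper(inner) => inspect(inner, true, report),
        PhysNode::Filter(input) => inspect(input, false, report),
        PhysNode::HashJoin(left, right) => {
            inspect(left, false, report);
            inspect(right, false, report);
        }
    }
}

/// Entry point: returns the counts without mutating the plan.
fn validate_wiring(root: &PhysNode) -> WiringReport {
    let mut report = WiringReport::default();
    inspect(root, false, &mut report);
    report
}
```

A non-zero `bare_scans` count in this model would indicate a table that was not routed through the wrapping provider, which is the kind of condition a validation pass can surface without rewriting anything.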
§Why the rule registers as a physical rule even though it doesn’t mutate
Registering the rule against the session is the integration ceremony.
It is the explicit, named contract that samkhya is in the loop,
visible in EXPLAIN VERBOSE output and in the
SessionState::physical_optimizers() slice. The rule’s name() is
samkhya_cardinality_correction for that telemetry. Whether or not
the rule physically rewrites the plan on any given query, its
presence in the optimizer chain is what an operator audits to confirm
samkhya is wired in.
This non-mutating registration is the cold-start-safe posture required by samkhya’s design: the rule cannot make plans worse, only equal-or-better.
Structs§
- SamkhyaOptimizerRule - DataFusion adapter rule that bridges samkhya’s corrected statistics into the optimizer.
Functions§
- compute_corrected_stats - Placeholder for the Puffin-backed cardinality correction lookup.