Module optimizer_rule

SamkhyaOptimizerRule — DataFusion integration point for samkhya’s cardinality corrections.

§Two-trait surface

In DataFusion 46.0 the rule implements both OptimizerRule (logical) and PhysicalOptimizerRule (physical). Cardinality information cannot be injected from a logical rule alone: DataFusion’s mainline planner does not call TableProvider::statistics() when it constructs the leaf ExecutionPlan for a scan, and the LogicalPlan::TableScan node has no slot to attach a Statistics value. The only reliable hook is the physical layer, where we wrap scan execs with SamkhyaStatsExec so their statistics() returns samkhya-corrected values that propagate up through filters, projections, and joins.
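The propagation argument can be illustrated with a dependency-free model. Everything in it is hypothetical: PhysicalNode, ScanExec, StatsWrapper, FilterExec, the Rows enum (loosely patterned on DataFusion's Precision) and the fixed 0.2 selectivity are stand-ins, not DataFusion's ExecutionPlan API or samkhya's code. It shows why a wrapper at the leaf is enough. A filter above an unwrapped scan has nothing to scale, while the same filter above a wrapped scan derives its estimate from the corrected count.

```rust
/// Row-count estimate, loosely modeled on DataFusion's Precision<usize>.
/// Hypothetical stand-in, not the real type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Rows {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Hypothetical stand-in for a physical plan node that can report statistics.
trait PhysicalNode {
    fn num_rows(&self) -> Rows;
}

/// A raw scan: the planner never asked the provider for statistics,
/// so the scan has nothing to report.
struct ScanExec;

impl PhysicalNode for ScanExec {
    fn num_rows(&self) -> Rows {
        Rows::Absent
    }
}

/// Wrapper in the spirit of SamkhyaStatsExec: it would delegate execution to
/// the inner scan (omitted here) but reports a corrected row count.
struct StatsWrapper<N: PhysicalNode> {
    _inner: N,
    corrected_rows: usize,
}

impl<N: PhysicalNode> PhysicalNode for StatsWrapper<N> {
    fn num_rows(&self) -> Rows {
        Rows::Inexact(self.corrected_rows)
    }
}

/// A filter derives its estimate from its child's; with no child estimate
/// it can only report Absent.
struct FilterExec<N: PhysicalNode> {
    input: N,
    selectivity: f64, // hypothetical fixed selectivity
}

impl<N: PhysicalNode> PhysicalNode for FilterExec<N> {
    fn num_rows(&self) -> Rows {
        match self.input.num_rows() {
            Rows::Exact(n) | Rows::Inexact(n) => {
                Rows::Inexact((n as f64 * self.selectivity).round() as usize)
            }
            Rows::Absent => Rows::Absent,
        }
    }
}
```

With a corrected count of 1,000 rows under the wrapper, the filter reports an inexact estimate of 200; without the wrapper it reports Absent.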

§Logical pass: observe-only

The logical-side rewrite traverses the plan and counts TableScan nodes. It does not mutate the plan (returns Transformed::no); the traversal is retained as telemetry — it exercises the corrected-stats helper end-to-end and gives downstream code a stable hook into the optimizer pass without changing the logical tree.
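A minimal sketch of the observe-only traversal, assuming a toy LogicalPlan enum and a simplified Transformed wrapper (DataFusion's real Transformed carries more state than the data and a flag). The pass counts scans and hands its input back untouched with transformed set to false.

```rust
/// Hypothetical, heavily reduced logical plan; not DataFusion's LogicalPlan.
#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    TableScan { table: String },
    Filter { input: Box<LogicalPlan> },
    Projection { input: Box<LogicalPlan> },
    Join { left: Box<LogicalPlan>, right: Box<LogicalPlan> },
}

/// Simplified stand-in for a "did this pass change anything?" wrapper.
struct Transformed<T> {
    data: T,
    transformed: bool,
}

impl<T> Transformed<T> {
    /// Wrap a value that the pass left unchanged.
    fn no(data: T) -> Self {
        Transformed { data, transformed: false }
    }
}

/// Count TableScan nodes anywhere in the tree.
fn count_scans(plan: &LogicalPlan) -> usize {
    match plan {
        LogicalPlan::TableScan { .. } => 1,
        LogicalPlan::Filter { input } | LogicalPlan::Projection { input } => count_scans(input),
        LogicalPlan::Join { left, right } => count_scans(left) + count_scans(right),
    }
}

/// Observe-only rewrite: gather telemetry, return the plan as-is.
fn rewrite(plan: LogicalPlan) -> (Transformed<LogicalPlan>, usize) {
    let scans = count_scans(&plan);
    (Transformed::no(plan), scans)
}
```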

§Physical pass: the actual injection

The physical-side optimize walks the ExecutionPlan tree and records how many SamkhyaStatsExec leaves it observes. Those wrappers were installed by SamkhyaTableProvider::scan when the planner asked each table provider for its scan exec. The rule does not need to add wrappers itself — by the time optimize runs they are already in place — but it does validate the wiring and surfaces a name() so SessionStateBuilder::with_physical_optimizer_rule has something to register.
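The counting walk can be sketched with a toy tree of trait objects. ExecNode, ParquetScan, StatsWrapperExec and HashJoin are hypothetical stand-ins; the as_any downcast mirrors the pattern DataFusion's ExecutionPlan exposes, but the signatures here are simplified. The pass returns the same Arc it received, which is what "validates without rewriting" means in practice.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical, simplified stand-in for a physical plan node.
trait ExecNode {
    fn as_any(&self) -> &dyn Any;
    fn children(&self) -> Vec<Arc<dyn ExecNode>>;
}

/// Leaf scan with no children.
struct ParquetScan;

/// Wrapper installed at scan time (modeled on SamkhyaStatsExec).
struct StatsWrapperExec {
    inner: Arc<dyn ExecNode>,
}

/// Binary join over two inputs.
struct HashJoin {
    left: Arc<dyn ExecNode>,
    right: Arc<dyn ExecNode>,
}

impl ExecNode for ParquetScan {
    fn as_any(&self) -> &dyn Any { self }
    fn children(&self) -> Vec<Arc<dyn ExecNode>> { Vec::new() }
}

impl ExecNode for StatsWrapperExec {
    fn as_any(&self) -> &dyn Any { self }
    fn children(&self) -> Vec<Arc<dyn ExecNode>> { vec![Arc::clone(&self.inner)] }
}

impl ExecNode for HashJoin {
    fn as_any(&self) -> &dyn Any { self }
    fn children(&self) -> Vec<Arc<dyn ExecNode>> {
        vec![Arc::clone(&self.left), Arc::clone(&self.right)]
    }
}

/// Recursively count wrapper nodes anywhere in the tree.
fn count_wrappers(node: &Arc<dyn ExecNode>) -> usize {
    let here = usize::from(node.as_any().is::<StatsWrapperExec>());
    here + node.children().iter().map(count_wrappers).sum::<usize>()
}

/// Observe-and-validate pass: returns the plan it was given, unchanged,
/// together with the number of wrappers it saw.
fn optimize(plan: Arc<dyn ExecNode>) -> (Arc<dyn ExecNode>, usize) {
    let wrapped = count_wrappers(&plan);
    (plan, wrapped)
}
```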

§Why the rule registers as a physical rule even though it doesn’t mutate

Registering the rule against the session is the integration ceremony. It is the explicit, named contract that samkhya is in the loop, visible in EXPLAIN VERBOSE output and in the SessionState::physical_optimizers() slice. The rule’s name() returns samkhya_cardinality_correction, the identifier that appears in that telemetry. Whether or not the rule actually rewrites the plan on any given query, its presence in the optimizer chain is what an operator audits to confirm samkhya is wired in.
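How an operator might audit the chain, as a sketch: a toy builder collects named rules, and a check scans the registered slice for samkhya_cardinality_correction. NamedRule, StateBuilder, State, OtherRule and samkhya_is_wired are hypothetical; only the rule name comes from this page, and the real registration goes through SessionStateBuilder::with_physical_optimizer_rule.

```rust
use std::sync::Arc;

/// Hypothetical minimal rule interface: just a name.
trait NamedRule {
    fn name(&self) -> &str;
}

/// Model of the samkhya rule; only its name matters for the audit.
struct SamkhyaRuleModel;
impl NamedRule for SamkhyaRuleModel {
    fn name(&self) -> &str {
        "samkhya_cardinality_correction"
    }
}

/// Any other rule in the chain (name is illustrative).
struct OtherRule(&'static str);
impl NamedRule for OtherRule {
    fn name(&self) -> &str {
        self.0
    }
}

/// Toy builder that accumulates physical rules in registration order.
#[derive(Default)]
struct StateBuilder {
    rules: Vec<Arc<dyn NamedRule>>,
}

impl StateBuilder {
    fn with_rule(mut self, rule: Arc<dyn NamedRule>) -> Self {
        self.rules.push(rule);
        self
    }
    fn build(self) -> State {
        State { rules: self.rules }
    }
}

/// Toy session state exposing the registered rules as a slice.
struct State {
    rules: Vec<Arc<dyn NamedRule>>,
}

impl State {
    fn physical_optimizers(&self) -> &[Arc<dyn NamedRule>] {
        &self.rules
    }
}

/// The audit: is the samkhya rule present in the chain?
fn samkhya_is_wired(state: &State) -> bool {
    state
        .physical_optimizers()
        .iter()
        .any(|r| r.name() == "samkhya_cardinality_correction")
}
```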

This is the cold-start-safe posture required by samkhya’s design: the rule cannot make plans worse, only equal-or-better.
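One way to read the cold-start guarantee, sketched with hypothetical types (Estimate, apply_correction): a missing correction acts as the identity, so before any statistics exist the optimizer sees exactly the estimate it would have seen without samkhya. This illustrates the fallback only; it does not by itself show that applied corrections improve plans.

```rust
/// Hypothetical row-count estimate attached to a scan.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Estimate {
    rows: usize,
    exact: bool,
}

/// Cold-start-safe merge: with no correction available, the base estimate
/// passes through unchanged; a correction is always treated as inexact.
fn apply_correction(base: Estimate, correction: Option<usize>) -> Estimate {
    match correction {
        Some(rows) => Estimate { rows, exact: false },
        None => base,
    }
}
```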

Structs

SamkhyaOptimizerRule
DataFusion adapter rule that bridges samkhya’s corrected statistics into the optimizer.

Functions

compute_corrected_stats
Placeholder for the Puffin-backed cardinality correction lookup.
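Because compute_corrected_stats is documented only as a placeholder, its signature is not shown here. The sketch below is a hypothetical shape for where a Puffin-backed lookup could slot in: a CorrectionSource trait, a placeholder that always misses, and an in-memory map standing in for the Puffin store (the Puffin file format itself is not modeled). compute_corrected_rows is an invented name.

```rust
use std::collections::HashMap;

/// Hypothetical source of corrected row counts, keyed by table name.
trait CorrectionSource {
    fn lookup(&self, table: &str) -> Option<usize>;
}

/// Placeholder behaviour: no stored corrections yet, so every lookup misses.
struct Placeholder;
impl CorrectionSource for Placeholder {
    fn lookup(&self, _table: &str) -> Option<usize> {
        None
    }
}

/// In-memory stand-in for a Puffin-backed store.
struct InMemory(HashMap<String, usize>);
impl CorrectionSource for InMemory {
    fn lookup(&self, table: &str) -> Option<usize> {
        self.0.get(table).copied()
    }
}

/// Hypothetical shape for the lookup: prefer a stored correction,
/// otherwise fall back to the provider's own estimate (if any).
fn compute_corrected_rows(
    source: &dyn CorrectionSource,
    table: &str,
    provider_rows: Option<usize>,
) -> Option<usize> {
    source.lookup(table).or(provider_rows)
}
```

Swapping Placeholder for a real store changes only the CorrectionSource implementation; callers keep the same fallback behaviour.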