Skip to main content

Module table_provider

Module table_provider 

Source
Expand description

SamkhyaTableProvider — the primary integration point for injecting samkhya-corrected column statistics into DataFusion’s query planning.

§Wrapping point: TableProvider::statistics()

DataFusion attaches statistics to table providers, not to logical-plan nodes. The TableProvider trait exposes a statistics() hook (returning Option<Statistics>) that the planner consults when reasoning about cardinality, join order, and filter selectivity. Rewriting a LogicalPlan to “inject” stats is the wrong layer — that is observe-only plumbing. The right layer is a TableProvider shim that delegates every method to an inner provider except statistics(), where it folds in samkhya’s feedback-driven corrections.

We considered three wrapping points and chose the first:

  1. TableProvider::statistics() (this module). Clean, stable surface in DataFusion 46. The planner calls it during analysis. Every adapter (Parquet, CSV, MemTable, Iceberg) flows through the same hook, so the shim is provider-agnostic.
  2. ExecutionPlan::statistics(). Lower in the stack — would require wrapping the scan-side ExecutionPlan returned from scan(). Useful when the inner provider’s logical stats are absent but its physical plan has them; not our situation today.
  3. OptimizerRule rewriting TableScan::source. The original scaffold direction. The rewrite must construct a new TableSource (the logical counterpart of TableProvider) — duplicate state, version-fragile, and never propagates into the physical layer where the planner actually consults stats. Kept around as observe-only telemetry (crate::SamkhyaOptimizerRule).

§LpBound posture

Every value translated into DataFusion’s Precision<T> is wrapped as Precision::Inexact. samkhya’s corrections are feedback-driven estimates clamped by the LpBound pessimistic ceiling; they are never exact catalog counts. Inexact is the precision DataFusion’s cost-based optimizer treats as “use this, but do not assume zero error”.

Structs§

SamkhyaTableProvider
A TableProvider wrapper that overrides statistics() with samkhya-corrected column statistics while delegating every other method to the inner provider.