//! samkhya-datafusion: DataFusion adapter for samkhya-core.
//!
//! # Integration model
//!
//! DataFusion 46.0 exposes statistics on two surfaces, but the mainline
//! planner consults only one of them for cardinality:
//!
//! * [`ExecutionPlan::statistics()`] on each node of the physical plan.
//!   `FilterExec`, `ProjectionExec`, `HashJoinExec`, and similar operators
//!   derive their statistics from their children's `statistics()`, so
//!   corrections placed at the leaf scan propagate to the top of the plan.
//! * [`TableProvider::statistics()`] is **not** consulted by the
//!   mainline physical planner; it is reserved for downstream forks and
//!   custom optimizer rules. We still implement it for completeness, but
//!   we do not rely on it as the injection path.
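/*!
The propagation argument above can be illustrated with a toy plan. This
is a hypothetical, std-only model, not DataFusion's API: the `Node` trait
stands in for `ExecutionPlan::statistics()`, `None` stands in for
`Precision::Absent`, and the `Filter` percentage is an arbitrary
illustrative selectivity rather than DataFusion's estimator. It shows why
one correction at the leaf is enough: every parent recomputes its
estimate from its child.

```rust
// Hypothetical stand-in for `ExecutionPlan::statistics()`: each node
// reports an optional row count (`None` plays the role of `Precision::Absent`).
trait Node {
    fn num_rows(&self) -> Option<u64>;
}

// Leaf scan: the one place a correction needs to be injected.
struct Leaf(Option<u64>);

impl Node for Leaf {
    fn num_rows(&self) -> Option<u64> {
        self.0
    }
}

// Filter that keeps an assumed percentage of its input rows
// (illustrative only; not DataFusion's selectivity estimation).
struct Filter {
    child: Box<dyn Node>,
    keep_percent: u64,
}

impl Node for Filter {
    fn num_rows(&self) -> Option<u64> {
        // Derived from the child, so a corrected leaf flows upward.
        self.child.num_rows().map(|n| n * self.keep_percent / 100)
    }
}

// Projection passes its child's row count through unchanged.
struct Projection {
    child: Box<dyn Node>,
}

impl Node for Projection {
    fn num_rows(&self) -> Option<u64> {
        self.child.num_rows()
    }
}
```

With `Leaf(Some(1_000_000))` under a 20% `Filter` and a `Projection`, the
root reports `Some(200_000)` even though no node above the leaf was
touched.
*/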
//!
//! samkhya therefore wires corrections in at three layers, which together
//! form the integration model:
//!
//! 1. [`SamkhyaTableProvider`]: a [`TableProvider`] wrapper that delegates
//!    every method to an inner provider but overrides `statistics()` to
//!    return samkhya-corrected [`ColumnStatistics`]. Critically, it also
//!    overrides `scan()` to return a [`SamkhyaStatsExec`] wrapping the
//!    execution plan produced by the inner provider's `scan()`. The exec
//!    wrapper is what makes `physical.statistics()?.num_rows` reflect
//!    samkhya's corrections, because the mainline planner reads the exec's
//!    statistics, not the table provider's.
//! 2. [`SamkhyaStatsExec`]: a passthrough [`ExecutionPlan`] that overrides
//!    `statistics()` to return a preset `Statistics` and delegates every
//!    other method to the inner exec. This is the physical-layer hook the
//!    planner actually consults.
//! 3. [`SamkhyaOptimizerRule`]: implements both [`OptimizerRule`]
//!    (logical, observe-only) and [`PhysicalOptimizerRule`] (physical;
//!    validates that the wrappers are in place and reports a diagnostic
//!    count of the `SamkhyaStatsExec` leaves it sees). Registering the
//!    rule is the explicit integration step: operators can inspect the
//!    `SessionState::physical_optimizers()` slice to confirm that samkhya
//!    is wired in.
//!
//! ```ignore
//! use std::sync::Arc;
//! use datafusion::execution::context::SessionContext;
//! use datafusion::execution::session_state::SessionStateBuilder;
//! use datafusion::prelude::SessionConfig;
//! use samkhya_core::stats::ColumnStats;
//! use samkhya_datafusion::{SamkhyaOptimizerRule, SamkhyaTableProvider};
//!
//! // One rule instance serves both the logical and the physical pass.
//! let rule = Arc::new(SamkhyaOptimizerRule::new());
//! let state = SessionStateBuilder::new()
//!     .with_config(SessionConfig::new())
//!     .with_default_features()
//!     .with_optimizer_rule(rule.clone())
//!     .with_physical_optimizer_rule(rule.clone())
//!     .build();
//! let ctx = SessionContext::new_with_state(state);
//!
//! // `inner_provider` is the existing table provider being wrapped (its
//! // construction is elided here). The `?` assumes an enclosing function
//! // that returns a `Result`.
//! let wrapped = SamkhyaTableProvider::new(inner_provider)
//!     .with_column_stats(0, ColumnStats::new().with_row_count(1_000_000));
//! ctx.register_table("t", Arc::new(wrapped))?;
//! ```
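/*!
The passthrough pattern behind [`SamkhyaStatsExec`] can be sketched
without DataFusion. This is a simplified, hypothetical model: the `Exec`
trait has only two methods (the real `ExecutionPlan` has many more), and
`StatsOverride` and `InnerScan` are illustrative names, not types in this
crate. The wrapper overrides the statistics method and forwards the rest
to the inner node.

```rust
use std::sync::Arc;

// Hypothetical, cut-down stand-in for `ExecutionPlan`: one identity
// method plus a row-count estimate (`None` ~ `Precision::Absent`).
trait Exec {
    fn name(&self) -> &str;
    fn num_rows(&self) -> Option<usize>;
}

// An inner scan whose own estimate is stale.
struct InnerScan {
    stale_rows: Option<usize>,
}

impl Exec for InnerScan {
    fn name(&self) -> &str {
        "InnerScan"
    }
    fn num_rows(&self) -> Option<usize> {
        self.stale_rows
    }
}

// Passthrough wrapper (illustrative name): overrides only `num_rows`.
struct StatsOverride {
    inner: Arc<dyn Exec>,
    preset_rows: usize,
}

impl Exec for StatsOverride {
    fn name(&self) -> &str {
        // Delegated unchanged to the wrapped node.
        self.inner.name()
    }
    fn num_rows(&self) -> Option<usize> {
        // The single override: report the preset correction instead.
        Some(self.preset_rows)
    }
}
```

Wrapping a node whose own estimate is stale leaves its identity intact
(`name()` still reports the inner node), while the reported row count
becomes the preset correction. The inner node itself is not modified.
*/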
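//!
//! All values translated into DataFusion's `Precision<T>` are marked
//! [`Precision::Inexact`]: samkhya's corrections are feedback-driven
//! estimates clamped by the LpBound pessimistic ceiling, never exact
//! catalog counts. Reporting them as inexact is the conservative posture
//! samkhya's safety envelope requires, and it keeps DataFusion from
//! treating them as ground truth (for example, it only answers `COUNT(*)`
//! directly from statistics when the row count is exact).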
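/*!
The translation rule can be summarized in a few lines. This is a
hypothetical sketch, not the signature or body of
`to_datafusion_column_statistics`: the local `Precision` enum mirrors
DataFusion's type so the example runs on `std` alone, and the `ceiling`
argument stands in for the LpBound bound, whose computation is not shown.
Whatever the estimate, the result is clamped to the ceiling and never
reported as `Exact`.

```rust
// Hypothetical mirror of DataFusion's `Precision<usize>`; the real type
// lives in `datafusion::common::stats`.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

// Hypothetical translation step: clamp a feedback-driven estimate to a
// pessimistic upper bound (`ceiling`, standing in for LpBound) and
// always report the result as `Inexact`.
fn to_precision(estimate: Option<usize>, ceiling: usize) -> Precision {
    match estimate {
        Some(rows) => Precision::Inexact(rows.min(ceiling)),
        None => Precision::Absent,
    }
}
```

Under these assumptions, a feedback estimate of 5,000 rows against a
ceiling of 1,000 is reported as `Inexact(1_000)`, and a missing estimate
stays `Absent` rather than being invented.
*/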
//!
//! # Compatibility
//!
//! Compiled and tested against **DataFusion 46.0.1** (released March 2025).
//! Version 46 provides the `OptimizerRule` trait surface (`name`,
//! `apply_order`, `supports_rewrite`, `rewrite`), the
//! `PhysicalOptimizerRule` trait, and the `Precision<T>` /
//! `ColumnStatistics` / `Statistics` types this crate depends on for
//! cardinality correction. Newer versions may also work; any signature
//! drift should surface as a failure in the `wrap_provider` integration
//! test or the `stats_propagation_demo` example binary.
//!
//! [`OptimizerRule`]: datafusion::optimizer::OptimizerRule
//! [`PhysicalOptimizerRule`]: datafusion::physical_optimizer::PhysicalOptimizerRule
//! [`TableProvider`]: datafusion::datasource::TableProvider
//! [`TableProvider::statistics()`]: datafusion::datasource::TableProvider::statistics
//! [`ExecutionPlan`]: datafusion::physical_plan::ExecutionPlan
//! [`ExecutionPlan::statistics()`]: datafusion::physical_plan::ExecutionPlan::statistics
//! [`ColumnStatistics`]: datafusion::common::ColumnStatistics
//! [`Precision::Inexact`]: datafusion::common::stats::Precision::Inexact
84//! [`TableProvider::statistics()`]: datafusion::datasource::TableProvider::statistics
85//! [`ExecutionPlan`]: datafusion::physical_plan::ExecutionPlan
86//! [`ExecutionPlan::statistics()`]: datafusion::physical_plan::ExecutionPlan::statistics
87//! [`ColumnStatistics`]: datafusion::common::ColumnStatistics
88//! [`Precision::Inexact`]: datafusion::common::stats::Precision::Inexact
89#![deny(rustdoc::broken_intra_doc_links)]
90
91pub mod optimizer_rule;
92pub mod physical_plan;
93pub mod stats_provider;
94pub mod table_provider;
95
96pub use optimizer_rule::SamkhyaOptimizerRule;
97pub use physical_plan::SamkhyaStatsExec;
98pub use stats_provider::to_datafusion_column_statistics;
99pub use table_provider::SamkhyaTableProvider;