Module selectivity

Expand description

v6.2.2 — selectivity estimation over per-column statistics.

Each selectivity function returns a fraction in [0.0, 1.0] — the planner multiplies these against row_count to get estimated input cardinality for each operator. v6.2.3 JOIN reorder consumes these estimates; v6.2.4 EXPLAIN ANALYZE surfaces them alongside the actual-rows count.

Defaults follow PG’s “no-stats” guesses so a freshly-loaded table without a prior ANALYZE still gets a plausible plan:

DEFAULT_EQ = 0.005 — PG’s DEFAULT_EQ_SEL
DEFAULT_RANGE = 0.333 — PG’s DEFAULT_INEQ_SEL
DEFAULT_BETWEEN = 0.005 — narrower than range; matches PG for BETWEEN x AND y without stats
DEFAULT_LIKE = 0.005 — PG’s DEFAULT_MATCH_SEL

Histogram walks use a binary-search-based “fraction ≤ value” primitive (fraction_le_value), giving us range estimation in O(log n_buckets) per call. Equality keys off n_distinct when the value lands inside the histogram range; out-of-range values get an extrapolation cap so OUT-OF-RANGE predicates don’t collapse to zero (which would make the planner pick degenerate plans like cross-products).

Constants§

DEFAULT_BETWEEN: PG’s default for col BETWEEN a AND b without stats.
DEFAULT_EQ: PG’s default selectivity for col = constant when no histogram is available. v6.2.x can re-tune.
DEFAULT_LIKE: PG’s default for col LIKE 'prefix%' without stats.
DEFAULT_RANGE: PG’s default for col <= / < / >= / > constant without stats.

Functions§

between: col BETWEEN low AND high — convenience for the inclusive double-bounded shape. Equivalent to range with both bounds set and inclusive.
equal: col = value. With stats, returns (1 / n_distinct) × (1 - null_frac) when value lies in the histogram range, else scales down by an order of magnitude for out-of-range extrapolation. Without stats, returns DEFAULT_EQ.
in_list: col IN (v1, v2, …). Sums per-value equality selectivities, clamped at 1.0. Without stats, returns DEFAULT_EQ × len(values) (also clamped) — the same shape PG would produce.
like_prefix: col LIKE 'prefix%' (or any single-prefix anchored pattern). With stats, estimates as range(prefix, prefix + "\u{FFFF}") on the assumption the column’s natural ordering is a prefix order (Text lex). Without stats, DEFAULT_LIKE.
range: col >= low AND col <= high (with both bounds optional). When low is None the lower side is open at −∞; same for high and +∞. lo_incl / hi_incl control whether the boundary itself is included (currently a near-no-op since selectivity estimation is approximate at the boundary, but kept in the signature so the planner can pass the parser’s intent through).