Module statistics

v6.2.0 — per-column statistics for the cost-based optimizer.

Each analysed (table, column) pair gets a ColumnStats row holding three things: null_frac ∈ [0.0, 1.0], an n_distinct count, and a 100-bucket equi-depth histogram stored as a Vec<String> of 101 bounds (v0 .. v100). Skew is captured by the bucket widths: every bucket holds roughly the same number of rows, so frequent values produce narrow buckets and no separate most-common-values (MCV) sidecar is kept (see V6_2_DESIGN.md deliberation #1).

Storage shape mirrors crate::publications::Publications and crate::subscriptions::Subscriptions:

  • BTreeMap<(String, String), ColumnStats> iterates in sorted key order (table, then column), so the serialised snapshot is byte-stable regardless of insertion sequence.
  • BTreeMap<String, u64> tracks the per-table modified-row count that v6.2.1 auto-analyze compares against its 10 % threshold.
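
The two maps above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual definition: ColumnStatsSketch, StatisticsSketch, their field types, and every method name are hypothetical, and the exact form of the 10 % comparison (strictly greater than) is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `ColumnStats`; field names follow the module
/// docs, but the concrete types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnStatsSketch {
    pub null_frac: f64,         // fraction of NULLs, in [0.0, 1.0]
    pub n_distinct: u64,        // distinct non-NULL values
    pub histogram: Vec<String>, // 101 bounds, or empty
}

/// Hypothetical stand-in for `Statistics`.
#[derive(Debug, Default)]
pub struct StatisticsSketch {
    columns: BTreeMap<(String, String), ColumnStatsSketch>,
    modified_rows: BTreeMap<String, u64>,
}

impl StatisticsSketch {
    /// Insert or overwrite the stats for one (table, column) pair.
    pub fn set(&mut self, table: &str, column: &str, stats: ColumnStatsSketch) {
        self.columns
            .insert((table.to_string(), column.to_string()), stats);
    }

    /// Add `rows` to the table's modified-row counter.
    pub fn record_modified(&mut self, table: &str, rows: u64) {
        *self.modified_rows.entry(table.to_string()).or_insert(0) += rows;
    }

    /// Assumed trigger: modified rows strictly exceed 10 % of `total_rows`.
    /// Integer arithmetic (modified * 10 > total) avoids float rounding.
    pub fn needs_analyze(&self, table: &str, total_rows: u64) -> bool {
        let modified = self.modified_rows.get(table).copied().unwrap_or(0);
        modified.saturating_mul(10) > total_rows
    }

    /// Keys in BTreeMap iteration order: sorted by table, then by column.
    pub fn keys(&self) -> Vec<(String, String)> {
        self.columns.keys().cloned().collect()
    }
}
```

Because BTreeMap orders its keys, two StatisticsSketch values populated with the same pairs in different orders return identical keys() output, which is the property the snapshot relies on for byte stability.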

Persistence rides the snapshot envelope’s v5 trailer block (see crate::lib::build_envelope). v1/v2/v3/v4 envelopes deserialise to empty statistics; v5 writers always emit the trailer.

Structs§

ColumnStats
Per-column statistics computed by ANALYZE. See module docs.
Statistics

Enums§

StatisticsError

Constants§

NUM_BUCKETS
v6.2.0 — 100-bucket equi-depth histogram bound count (101 boundary values). v6.2.x can re-tune.

Functions§

build_histogram
Build an equi-depth histogram over a (sorted) sample of textual column values. Returns the 101 boundary strings, or an empty vec when the input has no non-NULL values.
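
One way to pick equi-depth bounds from a sorted sample is to read values at evenly spaced ranks. The sketch below assumes that approach; the name equi_depth_bounds, its signature, and the rank formula are hypothetical and may differ from what build_histogram actually does.

```rust
/// Hypothetical sketch of equi-depth bound selection.
///
/// `sorted` must already be sorted and contain only non-NULL values.
/// Returns `buckets + 1` bounds, or an empty vec when there is no input.
pub fn equi_depth_bounds(sorted: &[String], buckets: usize) -> Vec<String> {
    if sorted.is_empty() || buckets == 0 {
        return Vec::new();
    }
    let last = sorted.len() - 1;
    // Bound i is the value at rank i/buckets of the way through the sample,
    // so bound 0 is the minimum and bound `buckets` is the maximum.
    (0..=buckets)
        .map(|i| sorted[i * last / buckets].clone())
        .collect()
}
```

With buckets = 100 this yields 101 bounds. Values are compared as strings, so the caller's sort order determines what "between two bounds" means; a heavily repeated value shows up as several consecutive identical bounds.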
estimate_n_distinct
v6.2.0 — n_distinct estimator. Linear-counting sketch over the already-sorted-and-deduped sample. Returns the exact distinct count on a complete sample; on a reservoir sample, returns the observed count (which v6.2.x can swap for HyperLogLog if needed).
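
As an illustration of the exact-count case, the sketch below counts distinct values in a single pass over a sorted sample by counting the positions where the value changes. The name observed_distinct and its signature are hypothetical. This is a plain adjacent-comparison count, not a probabilistic estimator, and it tolerates input that has not been deduplicated.

```rust
/// Hypothetical sketch: distinct count over a sorted sample.
///
/// Equal values are adjacent in sorted input, so the distinct count is
/// one plus the number of adjacent pairs that differ.
pub fn observed_distinct(sorted: &[String]) -> u64 {
    if sorted.is_empty() {
        return 0;
    }
    1 + sorted.windows(2).filter(|w| w[0] != w[1]).count() as u64
}
```

On a complete sample this count is exact. On a reservoir sample it only reflects the values that happened to be drawn, so it can never exceed the column's true distinct count.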