v6.2.0 — per-column statistics for the cost-based optimizer.
Each analysed (table, column) pair gets a ColumnStats
row: null_frac ∈ [0.0, 1.0], n_distinct count, and a
100-bucket equi-depth histogram (Vec<String> of 101 bounds —
v0 .. v100). Skewed distributions live in the bucket widths,
not in a separate MCV sidecar (see V6_2_DESIGN.md deliberation
#1).
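As an illustration of how skew can end up in the bucket widths, here is a minimal sketch. It assumes boundaries are taken at evenly spaced ranks of the sorted sample. `equi_depth_bounds` and `skewed_sample` are hypothetical helpers written for this example, not the crate's own `build_histogram`, whose exact rank selection may differ.

```rust
/// Hypothetical helper (not the crate's `build_histogram`): returns
/// `buckets + 1` boundary values taken at evenly spaced ranks of a
/// sorted sample of textual values.
fn equi_depth_bounds(sorted: &[String], buckets: usize) -> Vec<String> {
    // No non-NULL values (or no buckets requested): no histogram.
    if sorted.is_empty() || buckets == 0 {
        return Vec::new();
    }
    let last = sorted.len() - 1;
    (0..=buckets)
        // Rank i * last / buckets runs from 0 (minimum) to `last` (maximum).
        .map(|i| sorted[i * last / buckets].clone())
        .collect()
}

/// Builds a skewed, sorted sample: 90 copies of "hot" plus ten distinct
/// tail values ("tail0" .. "tail9").
fn skewed_sample() -> Vec<String> {
    let mut sample: Vec<String> = vec!["hot".to_string(); 90];
    sample.extend((0..10).map(|i| format!("tail{i}")));
    sample.sort();
    sample
}
```

Calling `equi_depth_bounds(&skewed_sample(), 100)` yields 101 bounds in which the heavy value `"hot"` repeats across most adjacent boundaries. Those zero-width buckets are where the skew becomes visible, without a separate most-common-values list.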
Storage shape mirrors crate::publications::Publications and
crate::subscriptions::Subscriptions:
- BTreeMap<(String, String), ColumnStats> keeps iteration in deterministic alphabetical order (snapshot byte-stable regardless of insertion sequence).
- BTreeMap<String, u64> tracks per-table modified-row count for v6.2.1 auto-analyze’s 10% threshold trigger.
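The ordering guarantee can be sketched as follows. The `ColumnStats` struct here is a hypothetical stand-in with only two of the documented fields, and `ordered_keys` is a helper invented for this example. Note that `String` keys compare byte-wise, so "alphabetical" holds exactly for same-case ASCII names.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the real row type (fields abridged).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct ColumnStats {
    null_frac: f64,
    n_distinct: u64,
}

/// Inserts the given (table, column) pairs in the order supplied and
/// returns the keys in the map's iteration order.
fn ordered_keys(inserts: &[(&str, &str)]) -> Vec<(String, String)> {
    let mut map: BTreeMap<(String, String), ColumnStats> = BTreeMap::new();
    for (table, column) in inserts {
        map.insert(
            (table.to_string(), column.to_string()),
            ColumnStats { null_frac: 0.0, n_distinct: 0 },
        );
    }
    // BTreeMap iterates in key order: by table name, then by column name.
    map.keys().cloned().collect()
}
```

Two calls with the same pairs in different insertion orders return identical key sequences. That is the property that keeps a serialised snapshot byte-stable.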
Persistence rides the snapshot envelope’s v5 trailer block (see
crate::lib::build_envelope). v1/v2/v3/v4 envelopes deserialise
to empty statistics; v5 writers always emit the trailer.
Structs§
- ColumnStats - Per-column statistics computed by ANALYZE. See module docs.
- Statistics
Enums§
Constants§
- NUM_BUCKETS - v6.2.0: 100-bucket equi-depth histogram bound count (101 boundary values). v6.2.x can re-tune.
Functions§
- build_histogram - Build an equi-depth histogram over a (sorted) sample of textual column values. Returns the 101 boundary strings, or an empty vec when the input has no non-NULL values.
- estimate_n_distinct - v6.2.0: n_distinct estimator. Linear-counting sketch over the already-sorted-and-deduped sample. Returns the exact distinct count on a complete sample; on a reservoir sample, returns the observed count (which v6.2.x can swap for HyperLogLog if needed).
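A minimal sketch of counting distinct values in one pass over a sorted sample is shown below. `count_distinct_sorted` is a hypothetical name and is not the crate's `estimate_n_distinct`. It assumes only that the input is sorted, so it gives the same answer whether or not duplicates were already removed.

```rust
/// Hypothetical helper: counts distinct values in a sorted slice by
/// counting the positions where a value differs from its predecessor.
fn count_distinct_sorted(sorted: &[String]) -> u64 {
    if sorted.is_empty() {
        return 0;
    }
    // The first value is always new; each later change adds one more.
    1 + sorted.windows(2).filter(|w| w[0] != w[1]).count() as u64
}
```

On a complete sample this is the exact distinct count. On a reservoir sample it is only the number of distinct values observed, which can undercount the true figure for the full column.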