Expand description
PostgreSQL pg_stat_statements adapter — real engine bridge.
This is the only adapter in the crate that targets a production engine
rather than a public benchmark dataset. It is intentionally minimal: it
reads a CSV exported from pg_stat_statements snapshots and emits the
two residual classes that view alone supports — plan-regression and
workload-phase. It does not emit cardinality, contention, or
cache-I/O residuals; those require additional views (EXPLAIN ANALYZE,
pg_stat_activity, pg_stat_io) and are deferred to per-view adapters
we have not yet shipped. The honest deployability matrix in §11 of the
paper records these gaps.
§Expected CSV schema (PostgreSQL 14+)
Export with:
\copy (
SELECT
extract(epoch from now())::float8 AS snapshot_t,
md5(queryid::text) AS query_id,
calls AS calls,
total_exec_time AS total_exec_time_ms
FROM pg_stat_statements
) TO '~/pgss_snapshot.csv' WITH CSV HEADERat a regular interval (e.g. every 60 seconds), appending each snapshot
to the same file. The adapter expects exactly these columns; older
PostgreSQL releases (≤ 13) named the column total_time rather than
total_exec_time — pre-process those exports with s/total_time/total_exec_time/.
query_id is hashed with md5() so the export contains no query text.
§What the adapter computes
For each query_id, snapshots are sorted by snapshot_t and consecutive
pairs produce one mean-time-per-call sample:
Δexec = total_exec_time_ms[t] − total_exec_time_ms[t-1]
Δcalls = calls[t] − calls[t-1]
mean = Δexec / max(Δcalls, 1)A per-query_id baseline is the mean of the first BASELINE_WINDOW
intervals; once the baseline is set, subsequent intervals push a
plan-regression residual via crate::residual::plan_regression::push_latency.
Workload-phase residuals are pushed once per snapshot timestamp: the
Shannon entropy of the per-snapshot call-share distribution across
query_ids, normalised to [0, 1] by dividing by log(n_active_qids).
A drop in entropy (the workload concentrates on fewer query classes) is
the workload-phase signal documented in fact #16 of the concordance.
Functions§
- load_
pg_ stat_ statements - Load a
pg_stat_statementssnapshot CSV and produce a residual stream containing plan-regression + workload-phase samples. Errors if the file is missing, the schema does not match, or fewer than two snapshots are present (the adapter cannot compute a delta from a single snapshot).