Skip to main content

Module postgres

Module postgres 

Source
Expand description

PostgreSQL pg_stat_statements adapter — real engine bridge.

This is the only adapter in the crate that targets a production engine rather than a public benchmark dataset. It is intentionally minimal: it reads a CSV exported from pg_stat_statements snapshots and emits the two residual classes that view alone supports — plan-regression and workload-phase. It does not emit cardinality, contention, or cache-I/O residuals; those require additional views (EXPLAIN ANALYZE, pg_stat_activity, pg_stat_io) and are deferred to per-view adapters we have not yet shipped. The honest deployability matrix in §11 of the paper records these gaps.

§Expected CSV schema (PostgreSQL 14+)

Export with:

\copy (
  SELECT
    extract(epoch from now())::float8 AS snapshot_t,
    md5(queryid::text)                AS query_id,
    calls                             AS calls,
    total_exec_time                   AS total_exec_time_ms
  FROM pg_stat_statements
) TO '~/pgss_snapshot.csv' WITH CSV HEADER

at a regular interval (e.g. every 60 seconds), appending each snapshot to the same file. The adapter expects exactly these columns; older PostgreSQL releases (≤ 13) named the column total_time rather than total_exec_time — pre-process those exports with s/total_time/total_exec_time/. query_id is hashed with md5() so the export contains no query text.

§What the adapter computes

For each query_id, snapshots are sorted by snapshot_t and consecutive pairs produce one mean-time-per-call sample:

Δexec  = total_exec_time_ms[t] − total_exec_time_ms[t-1]
Δcalls = calls[t] − calls[t-1]
mean   = Δexec / max(Δcalls, 1)

A per-query_id baseline is the mean of the first BASELINE_WINDOW intervals; once the baseline is set, subsequent intervals push a plan-regression residual via crate::residual::plan_regression::push_latency.

Workload-phase residuals are pushed once per snapshot timestamp: the Shannon entropy of the per-snapshot call-share distribution across query_ids, normalised to [0, 1] by dividing by log(n_active_qids). A drop in entropy (the workload concentrates on fewer query classes) is the workload-phase signal documented in fact #16 of the concordance.

Functions§

load_pg_stat_statements
Load a pg_stat_statements snapshot CSV and produce a residual stream containing plan-regression + workload-phase samples. Errors if the file is missing, the schema does not match, or fewer than two snapshots are present (the adapter cannot compute a delta from a single snapshot).