Module snowset


Snowset adapter (Vuppalapati et al., NSDI 2020).

Real-data subset: the CSV distributed at github.com/resource-disaggregation/snowset (mirror: http://www.cs.cornell.edu/~midhul/snowset/snowset-main.csv.gz). The schema was verified (2026-04) for the columns this adapter touches:

queryId, warehouseId
createdTime: ISO-8601 UTC string with microsecond precision (date and time separated by a space), e.g. 2018-03-02 14:44:02.768000+00:00
execTime: query execution time in microseconds
persistentReadBytesCache: bytes served from the persistent cache (the counterpart of the previously documented bytesScannedFromCache)
persistentReadBytesS3: bytes read from S3 (the counterpart of bytesScannedFromStorage)
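
createdTime is a string while execTime is already in microseconds, so putting both on one time axis requires a conversion step. The sketch below shows one std-only way to do it, turning createdTime into Unix-epoch microseconds. It assumes every value carries the `+00:00` offset shown in the example above and uses Howard Hinnant's `days_from_civil` algorithm; `parse_created_time` is a hypothetical helper, not this module's API.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's days_from_civil algorithm).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, 0..=399
    let mp = (month + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + day - 1; // day of the March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Hypothetical helper: parses a createdTime value such as
/// "2018-03-02 14:44:02.768000+00:00" into microseconds since the Unix epoch.
/// Only the "+00:00" offset is accepted.
fn parse_created_time(s: &str) -> Option<i64> {
    let datetime = s.strip_suffix("+00:00")?;
    let (date, time) = datetime.split_once(' ')?;

    let mut dp = date.split('-');
    let year: i64 = dp.next()?.parse().ok()?;
    let month: i64 = dp.next()?.parse().ok()?;
    let day: i64 = dp.next()?.parse().ok()?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }

    // Split off the optional fractional-seconds part.
    let (hms, frac) = time.split_once('.').unwrap_or((time, ""));
    let mut tp = hms.split(':');
    let hour: i64 = tp.next()?.parse().ok()?;
    let minute: i64 = tp.next()?.parse().ok()?;
    let second: i64 = tp.next()?.parse().ok()?;

    // Right-pad the fraction to six digits so ".768" and ".768000" agree.
    if frac.len() > 6 || !frac.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let micros: i64 = if frac.is_empty() {
        0
    } else {
        format!("{:0<6}", frac).parse().ok()?
    };

    let secs = days_from_civil(year, month, day) * 86_400 + hour * 3_600 + minute * 60 + second;
    Some(secs * 1_000_000 + micros)
}
```

With this conversion, the 5-minute bucket of a query is simply `created_us.div_euclid(300_000_000)`.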

What we extract:

PlanRegression: execTime − rolling_baseline(execTime) per (warehouseId, queryId) pair. The pair serves as a proxy for query class because Snowset anonymises SQL text (per fact #16).
WorkloadPhase: Jensen-Shannon (JS) divergence of the per-warehouse query-class histogram across 5-minute buckets.
CacheIo: drift in the byte-level cache-miss rate persistentReadBytesS3 / (persistentReadBytesCache + persistentReadBytesS3), reported as a residual.
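
The docs above do not define rolling_baseline. The sketch below shows the PlanRegression residual under one plausible reading: the baseline is the mean of the previous `window` execTime values observed for the same (warehouseId, queryId) key. The `RollingBaseline` type and the trailing-mean choice are hypothetical, and the adapter's actual baseline may differ, for example by using a median or EWMA.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical rolling baseline: mean of the previous `window` execTime values
/// seen for the same (warehouseId, queryId) key.
struct RollingBaseline {
    window: usize,
    history: HashMap<(String, String), VecDeque<u64>>,
}

impl RollingBaseline {
    fn new(window: usize) -> Self {
        assert!(window > 0, "window must hold at least one sample");
        Self { window, history: HashMap::new() }
    }

    /// Returns execTime minus the baseline (microseconds), or None for the first
    /// observation of a key, which has no baseline yet. The current sample is
    /// added to the history only after the residual is computed.
    fn residual(&mut self, warehouse_id: &str, query_id: &str, exec_time_us: u64) -> Option<f64> {
        let hist = self
            .history
            .entry((warehouse_id.to_string(), query_id.to_string()))
            .or_default();
        let out = if hist.is_empty() {
            None
        } else {
            let mean = hist.iter().sum::<u64>() as f64 / hist.len() as f64;
            Some(exec_time_us as f64 - mean)
        };
        hist.push_back(exec_time_us);
        if hist.len() > self.window {
            hist.pop_front(); // keep only the most recent `window` samples
        }
        out
    }
}
```

Excluding the current sample from its own baseline keeps a sudden slowdown from partly cancelling itself out in the residual.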
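
WorkloadPhase compares query-class histograms with JS divergence, but the docs do not say which two buckets are compared. A natural assumption is that consecutive 5-minute buckets of the same warehouse are compared. The sketch below shows the bucketing and a base-2 JS divergence, which is bounded in [0, 1]. `bucket_of`, `js_divergence`, and string class keys are illustrative assumptions, not this module's API.

```rust
use std::collections::HashMap;

/// Length of one WorkloadPhase bucket: 5 minutes in microseconds.
const BUCKET_US: i64 = 5 * 60 * 1_000_000;

/// Maps a createdTime (Unix microseconds) to its 5-minute bucket index.
fn bucket_of(created_us: i64) -> i64 {
    created_us.div_euclid(BUCKET_US)
}

/// Jensen-Shannon divergence with base-2 logarithms, so the result lies in [0, 1].
/// Inputs are raw per-class counts; returns None if either histogram is empty.
fn js_divergence(p: &HashMap<String, u64>, q: &HashMap<String, u64>) -> Option<f64> {
    let p_total = p.values().sum::<u64>() as f64;
    let q_total = q.values().sum::<u64>() as f64;
    if p_total == 0.0 || q_total == 0.0 {
        return None;
    }
    // Union of class keys from both histograms.
    let mut keys: Vec<&String> = p.keys().chain(q.keys()).collect();
    keys.sort();
    keys.dedup();

    let mut js = 0.0;
    for k in keys {
        let pi = p.get(k).copied().unwrap_or(0) as f64 / p_total;
        let qi = q.get(k).copied().unwrap_or(0) as f64 / q_total;
        let m = 0.5 * (pi + qi); // mixture distribution
        // 0 * log(0 / m) is taken as 0, so zero-probability terms are skipped.
        if pi > 0.0 {
            js += 0.5 * pi * (pi / m).log2();
        }
        if qi > 0.0 {
            js += 0.5 * qi * (qi / m).log2();
        }
    }
    Some(js)
}
```

Because the result is bounded, a single threshold can flag phase changes across warehouses of very different sizes.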
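
The CacheIo ratio can be computed directly from the two byte columns. The docs call the output a "drift" and a "residual" but do not name a reference. The sketch below uses an assumed exponentially weighted moving average (EWMA) as that reference. `cache_miss_rate`, `MissRateDrift`, and the smoothing factor `alpha` are hypothetical names and parameters.

```rust
/// Byte-weighted cache-miss rate: persistentReadBytesS3 over all persistent
/// bytes read. Returns None when the query read no persistent bytes at all.
fn cache_miss_rate(read_bytes_cache: u64, read_bytes_s3: u64) -> Option<f64> {
    let total = read_bytes_cache as f64 + read_bytes_s3 as f64;
    if total == 0.0 {
        None
    } else {
        Some(read_bytes_s3 as f64 / total)
    }
}

/// Hypothetical drift tracker: residual of each miss rate against an EWMA
/// with smoothing factor `alpha`.
struct MissRateDrift {
    alpha: f64,
    ewma: Option<f64>,
}

impl MissRateDrift {
    fn new(alpha: f64) -> Self {
        assert!(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]");
        Self { alpha, ewma: None }
    }

    /// Returns rate minus the EWMA from before this update; None for the first sample.
    fn residual(&mut self, rate: f64) -> Option<f64> {
        let out = self.ewma.map(|m| rate - m);
        self.ewma = Some(match self.ewma {
            Some(m) => self.alpha * rate + (1.0 - self.alpha) * m,
            None => rate,
        });
        out
    }
}
```

Queries with zero persistent reads yield no rate, so they are skipped rather than counted as perfect cache hits.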

What we cannot extract (the paper states this explicitly):

Cardinality: Snowset does not publish est_rows/actual_rows.
Contention: Snowset has no lock-wait stream.

Structs

Snowset
