Module snowset

Snowset adapter (Vuppalapati et al., NSDI 2020).

Real subset: the CSV distributed at github.com/resource-disaggregation/snowset (mirror: http://www.cs.cornell.edu/~midhul/snowset/snowset-main.csv.gz). Schema verified (2026-04) for the columns this adapter reads:

  • queryId, warehouseId
  • createdTime — ISO-8601 UTC string with microsecond precision, e.g. 2018-03-02 14:44:02.768000+00:00
  • execTime — query execution time in microseconds
  • persistentReadBytesCache — bytes served from the persistent cache (the analogue of the earlier-documented bytesScannedFromCache)
  • persistentReadBytesS3 — bytes read from S3 (the analogue of bytesScannedFromStorage)
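A minimal sketch of reading these columns, assuming plain comma-separated fields with no quoted commas and column lookup by header name, so the physical column order does not matter. `SnowsetRow`, `parse_row`, `field` and `parse_created_time` are hypothetical names for illustration, not this module's API. The date arithmetic is Howard Hinnant's `days_from_civil`, and only the `+00:00` offset from the example above is accepted.

```rust
use std::collections::HashMap;

/// Hypothetical row type holding only the columns listed above.
#[derive(Debug, Clone, PartialEq)]
pub struct SnowsetRow {
    pub query_id: String,
    pub warehouse_id: String,
    pub created_us: i64,   // createdTime as microseconds since the Unix epoch (UTC)
    pub exec_time_us: u64, // execTime, already in microseconds
    pub read_cache: u64,   // persistentReadBytesCache
    pub read_s3: u64,      // persistentReadBytesS3
}

/// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;
    let doy = (153 * ((m + 9) % 12) + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Parses e.g. "2018-03-02 14:44:02.768000+00:00" into epoch microseconds.
fn parse_created_time(s: &str) -> Option<i64> {
    // Strip the UTC offset; a string carrying any other offset fails to parse below.
    let s = s.strip_suffix("+00:00").unwrap_or(s);
    let (date, time) = s.split_once(|c: char| c == ' ' || c == 'T')?;
    let mut dp = date.split('-').map(|p| p.parse::<i64>().ok());
    let (y, mo, d) = (dp.next()??, dp.next()??, dp.next()??);
    let (hms, frac) = time.split_once('.').unwrap_or((time, "0"));
    let mut tp = hms.split(':').map(|p| p.parse::<i64>().ok());
    let (h, mi, sec) = (tp.next()??, tp.next()??, tp.next()??);
    // The fraction must be digits only (this also rejects a leftover non-UTC offset).
    if !frac.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Right-pad (or truncate) the fraction to exactly 6 digits of microseconds.
    let frac6: String = frac.chars().chain(std::iter::repeat('0')).take(6).collect();
    let micros: i64 = frac6.parse().ok()?;
    let secs = days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + sec;
    Some(secs * 1_000_000 + micros)
}

/// Looks a field up by header name and trims surrounding whitespace.
fn field<'a>(idx: &HashMap<&str, usize>, fields: &[&'a str], name: &str) -> Option<&'a str> {
    idx.get(name).and_then(|&i| fields.get(i)).map(|&f| f.trim())
}

/// Builds a row from one CSV record; returns None if a column is missing or malformed.
pub fn parse_row(header: &str, line: &str) -> Option<SnowsetRow> {
    let idx: HashMap<&str, usize> =
        header.split(',').enumerate().map(|(i, h)| (h.trim(), i)).collect();
    let fields: Vec<&str> = line.split(',').collect();
    let get = |name: &str| field(&idx, &fields, name);
    Some(SnowsetRow {
        query_id: get("queryId")?.to_string(),
        warehouse_id: get("warehouseId")?.to_string(),
        created_us: parse_created_time(get("createdTime")?)?,
        exec_time_us: get("execTime")?.parse().ok()?,
        read_cache: get("persistentReadBytesCache")?.parse().ok()?,
        read_s3: get("persistentReadBytesS3")?.parse().ok()?,
    })
}
```

Converting `createdTime` to an integer once makes the 5-minute bucketing used below a plain integer division.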

What we extract:

  • PlanRegression: execTime − rolling_baseline(execTime) per (warehouseId, queryId) pair (a proxy for query class, since Snowset anonymises SQL text per fact #16).
  • WorkloadPhase: Jensen-Shannon (JS) divergence over the per-warehouse query-class histogram in 5-minute buckets.
  • CacheIo: drift in persistentReadBytesS3 / (persistentReadBytesCache + persistentReadBytesS3), i.e. the cache-miss-rate residual.
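The three signals above can be sketched as follows. This is an illustration under stated assumptions, not this adapter's implementation: `rolling_baseline` is assumed to be a mean over the previous `window` observations of the same key, WorkloadPhase is assumed to compare two histograms (for example adjacent buckets of one warehouse) using a base-2 logarithm so the result lies in [0, 1], and all names (`bucket_5min`, `RollingBaseline`, `js_divergence`, `cache_miss_rate`) are hypothetical.

```rust
use std::collections::{HashMap, VecDeque};

/// Index of the 5-minute bucket containing a timestamp given in epoch microseconds.
pub fn bucket_5min(created_us: i64) -> i64 {
    created_us.div_euclid(5 * 60 * 1_000_000)
}

/// PlanRegression sketch: execTime minus a rolling mean of the previous `window`
/// execTimes for the same (warehouseId, queryId) key. Mean and window are assumptions.
pub struct RollingBaseline {
    window: usize,
    history: HashMap<(String, String), VecDeque<f64>>,
}

impl RollingBaseline {
    pub fn new(window: usize) -> Self {
        Self { window, history: HashMap::new() }
    }

    /// Returns None for the first observation of a key (no baseline yet).
    pub fn residual(&mut self, warehouse_id: &str, query_id: &str, exec_time_us: f64) -> Option<f64> {
        let hist = self
            .history
            .entry((warehouse_id.to_string(), query_id.to_string()))
            .or_default();
        let baseline = if hist.is_empty() {
            None
        } else {
            Some(hist.iter().sum::<f64>() / hist.len() as f64)
        };
        // Record this observation, then keep only the most recent `window` values.
        hist.push_back(exec_time_us);
        if hist.len() > self.window {
            hist.pop_front();
        }
        baseline.map(|b| exec_time_us - b)
    }
}

/// WorkloadPhase sketch: Jensen-Shannon divergence (log base 2, so in [0, 1]) between
/// two query-class count histograms. Returns None if either histogram is empty.
pub fn js_divergence(p: &HashMap<String, u64>, q: &HashMap<String, u64>) -> Option<f64> {
    let np = p.values().sum::<u64>() as f64;
    let nq = q.values().sum::<u64>() as f64;
    if np == 0.0 || nq == 0.0 {
        return None;
    }
    let mut keys: Vec<&String> = p.keys().chain(q.keys()).collect();
    keys.sort();
    keys.dedup();
    let mut js = 0.0;
    for k in keys {
        let pi = p.get(k).copied().unwrap_or(0) as f64 / np;
        let qi = q.get(k).copied().unwrap_or(0) as f64 / nq;
        let m = 0.5 * (pi + qi);
        // Zero-mass terms contribute 0 (x log x -> 0 as x -> 0).
        if pi > 0.0 {
            js += 0.5 * pi * (pi / m).log2();
        }
        if qi > 0.0 {
            js += 0.5 * qi * (qi / m).log2();
        }
    }
    Some(js)
}

/// CacheIo sketch: share of persistent-read bytes that came from S3 (a cache-miss rate).
/// Returns None when the query read no persistent bytes at all.
pub fn cache_miss_rate(read_cache: u64, read_s3: u64) -> Option<f64> {
    let total = read_cache as f64 + read_s3 as f64;
    (total > 0.0).then(|| read_s3 as f64 / total)
}
```

A CacheIo residual could then be formed the same way as PlanRegression, by subtracting a rolling baseline of the miss rate; keying that baseline per warehouse would be a further assumption.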

What we cannot extract (paper says so explicitly):

  • Cardinality — Snowset does not publish est_rows/actual_rows.
  • Contention — no lock-wait stream.

Structs

Snowset