Snowset adapter (Vuppalapati et al., NSDI 2020).
Real subset: the CSV distributed at
github.com/resource-disaggregation/snowset
(mirror: http://www.cs.cornell.edu/~midhul/snowset/snowset-main.csv.gz).
Schema verified (2026-04) for the columns this adapter touches:
- queryId, warehouseId
- createdTime: ISO-8601 UTC string with microsecond precision, e.g. 2018-03-02 14:44:02.768000+00:00
- execTime: query execution time in microseconds
- persistentReadBytesCache: bytes served from the persistent cache (the analogue of the earlier-documented bytesScannedFromCache)
- persistentReadBytesS3: bytes read from S3 (the analogue of bytesScannedFromStorage)
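Because createdTime arrives as a string, rows need a numeric timestamp before they can be bucketed. The following is a minimal std-only sketch, not the adapter's code: parse_created_time_us, split_nums, and days_from_civil are hypothetical names, every value is assumed to end in +00:00 (other offsets are rejected), and a value with no fractional part (e.g. 1970-01-01 00:00:00+00:00) is assumed to mean zero microseconds. The calendar arithmetic is Howard Hinnant's days_from_civil algorithm.

```rust
/// Hypothetical helper: converts a Snowset createdTime string such as
/// "2018-03-02 14:44:02.768000+00:00" into microseconds since the Unix epoch.
/// ASSUMPTIONS: the offset is always "+00:00" (other offsets are rejected) and a
/// value without a fractional part means zero microseconds. No leap seconds.
fn parse_created_time_us(s: &str) -> Option<i64> {
    let s = s.strip_suffix("+00:00")?;
    let (date, time) = s.split_once(' ')?;
    let (hms_str, frac) = time.split_once('.').unwrap_or((time, ""));

    let ymd = split_nums(date, '-', 3)?;
    let hms = split_nums(hms_str, ':', 3)?;
    let (y, mo, d) = (ymd[0], ymd[1], ymd[2]);
    let (h, mi, sec) = (hms[0], hms[1], hms[2]);
    // Coarse range checks only (e.g. February 30 is not rejected).
    if y > 9999 || !(1..=12).contains(&mo) || !(1..=31).contains(&d) || h > 23 || mi > 59 || sec > 59 {
        return None;
    }

    // Fractional seconds: 1 to 6 digits, right-padded to microseconds.
    let micros = if frac.is_empty() {
        0
    } else if frac.len() <= 6 && frac.bytes().all(|b| b.is_ascii_digit()) {
        frac.parse::<i64>().ok()? * 10_i64.pow(6 - frac.len() as u32)
    } else {
        return None;
    };

    let secs = days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + sec;
    Some(secs * 1_000_000 + micros)
}

/// Splits `s` on `sep` into exactly `n` unsigned decimal integers.
fn split_nums(s: &str, sep: char, n: usize) -> Option<Vec<i64>> {
    let parts: Vec<&str> = s.split(sep).collect();
    let all_digits = |p: &&str| !p.is_empty() && p.bytes().all(|b| b.is_ascii_digit());
    if parts.len() != n || !parts.iter().all(all_digits) {
        return None;
    }
    parts.iter().map(|p| p.parse().ok()).collect()
}

/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's days_from_civil algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // years start in March
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, [0, 399]
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era, [0, 146096]
    era * 146_097 + doe - 719_468
}
```

With microseconds in hand, a 5-minute bucket index is simply ts_us.div_euclid(300_000_000) (5 × 60 × 10^6 µs). Rejecting non-UTC offsets instead of converting them keeps the sketch small; an adapter that already depends on a date-time crate such as chrono could delegate the parsing to it.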
What we extract:
- PlanRegression: execTime − rolling_baseline(execTime) per (warehouseId, queryId) pair, used as a proxy for query class (Snowset anonymises SQL text, per fact #16).
- WorkloadPhase: Jensen-Shannon (JS) divergence over the per-warehouse query-class histogram in 5-minute buckets.
- CacheIo: drift of persistentReadBytesS3 / (persistentReadBytesCache + persistentReadBytesS3), i.e. the cache-miss-rate residual.
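The three extracted signals can be sketched as small, independent helpers. This is an illustration under stated assumptions, not the adapter's implementation, and all type and function names are hypothetical: rolling_baseline is not specified above, so RollingBaseline assumes a plain trailing mean over the previous `window` observations per key; js_divergence uses log base 2, so its result lies in [0, 1], and returns None when either bucket is empty; cache_miss_rate returns None for a query that read no bytes.

```rust
use std::collections::{HashMap, HashSet, VecDeque};
use std::hash::Hash;

/// PlanRegression (sketch): residual of a value against the trailing mean of the
/// previous `window` observations for the same key, e.g. (warehouseId, queryId).
/// ASSUMPTION: rolling_baseline is a plain trailing mean; the real one may differ.
struct RollingBaseline<K> {
    window: usize,
    history: HashMap<K, VecDeque<f64>>,
}

impl<K: Hash + Eq> RollingBaseline<K> {
    fn new(window: usize) -> Self {
        Self { window, history: HashMap::new() }
    }

    /// Returns value - baseline, or None while the key has no history yet.
    fn residual(&mut self, key: K, value: f64) -> Option<f64> {
        let hist = self.history.entry(key).or_default();
        let baseline = if hist.is_empty() {
            None
        } else {
            Some(hist.iter().sum::<f64>() / hist.len() as f64)
        };
        // Record the new observation, keeping at most `window` of them.
        hist.push_back(value);
        if hist.len() > self.window {
            hist.pop_front();
        }
        baseline.map(|b| value - b)
    }
}

/// WorkloadPhase (sketch): Jensen-Shannon divergence (log base 2, so in [0, 1])
/// between two query-class count histograms, e.g. consecutive 5-minute buckets.
fn js_divergence<K: Hash + Eq>(p: &HashMap<K, u64>, q: &HashMap<K, u64>) -> Option<f64> {
    let np = p.values().sum::<u64>() as f64;
    let nq = q.values().sum::<u64>() as f64;
    if np == 0.0 || nq == 0.0 {
        return None; // an empty bucket has no distribution to compare
    }
    let classes: HashSet<&K> = p.keys().chain(q.keys()).collect();
    let mut js = 0.0;
    for k in classes {
        let pk = *p.get(k).unwrap_or(&0) as f64 / np;
        let qk = *q.get(k).unwrap_or(&0) as f64 / nq;
        let m = 0.5 * (pk + qk);
        // 0 * log(0 / m) is taken as 0, so zero-probability terms are skipped.
        if pk > 0.0 {
            js += 0.5 * pk * (pk / m).log2();
        }
        if qk > 0.0 {
            js += 0.5 * qk * (qk / m).log2();
        }
    }
    Some(js)
}

/// CacheIo (sketch): fraction of read bytes that missed the persistent cache
/// and came from S3. None when the query read nothing.
fn cache_miss_rate(persistent_read_bytes_s3: u64, persistent_read_bytes_cache: u64) -> Option<f64> {
    let total = persistent_read_bytes_s3 as f64 + persistent_read_bytes_cache as f64;
    (total > 0.0).then(|| persistent_read_bytes_s3 as f64 / total)
}
```

One way to turn cache_miss_rate into a drift signal is to feed each query's rate into a RollingBaseline keyed by warehouseId and keep the residual, the same shape as PlanRegression; that pairing is likewise an assumption rather than something the description above specifies.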
What we cannot extract (the paper says so explicitly):
- Cardinality: Snowset does not publish est_rows / actual_rows.
- Contention: no lock-wait stream.