SQLShare text-only adapter.
The 2015 SQLShare SIGMOD data release originally shipped a CSV with
per-query runtime and submission-time columns (see QueriesWithPlan.csv /
sdssquerieswithplan.csv in the 2015 reproducibility repository). That
richer release was hosted on the S3 bucket shrquerylogs at
s3-us-west-2.amazonaws.com, which has since been decommissioned; the
bucket no longer exists (verified 2026-04: NoSuchBucket response). The
remaining public artefact is the UW eScience sqlshare_data_release1.zip
bundle, whose top-level queries.txt contains raw SQL query texts
separated by 40-underscore dividers, with no user_id, runtime_seconds,
or submitted_at fields.
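
To make the queries.txt layout concrete, the following is a minimal sketch of how such a file could be split into individual query texts. The function names split_queries and push_if_non_empty are hypothetical, and the divider rule (a line made up only of underscores, at least 40 of them) is an assumption based on the description above, not a confirmed property of the release.

```rust
/// Split the contents of a `queries.txt`-style file into raw query texts.
///
/// Assumption (not confirmed by the release): a divider is a line whose
/// trimmed content consists solely of underscores, at least 40 of them.
pub fn split_queries(text: &str) -> Vec<String> {
    let mut queries = Vec::new();
    let mut current = String::new();

    for line in text.lines() {
        let trimmed = line.trim();
        let is_divider = trimmed.len() >= 40 && trimmed.bytes().all(|b| b == b'_');
        if is_divider {
            // A divider closes the query accumulated so far.
            push_if_non_empty(&mut queries, &current);
            current.clear();
        } else {
            current.push_str(line);
            current.push('\n');
        }
    }
    // The final query may not be followed by a divider.
    push_if_non_empty(&mut queries, &current);
    queries
}

/// Hypothetical helper: keep a query only if it has non-whitespace content.
fn push_if_non_empty(queries: &mut Vec<String>, raw: &str) {
    let query = raw.trim();
    if !query.is_empty() {
        queries.push(query.to_string());
    }
}
```

Each returned string is one raw query with internal line breaks preserved; its position in the vector is the only ordering information the file provides.
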
This adapter takes that remaining artefact at face value: it reads the
queries.txt format, normalises each query into a skeleton (literals and
digits replaced with ?, whitespace collapsed, lower-cased), and emits
only the WorkloadPhase residual class, with Jensen-Shannon divergence
computed over ordinal-position buckets rather than wall-clock buckets.
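
The skeleton normalisation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the adapter's actual code: the function name skeleton is hypothetical, only single-quoted string literals are recognised (with '' as an escaped quote), and every run of ASCII digits becomes one ?, including digits inside identifiers.

```rust
/// Reduce a SQL text to a skeleton: string literals and digit runs become
/// `?`, whitespace runs collapse to one space, everything else is lower-cased.
///
/// Assumptions (illustrative only): literals are single-quoted with `''` as
/// the escape, and digits inside identifiers (e.g. `t1`) are also replaced.
pub fn skeleton(sql: &str) -> String {
    let mut out = String::with_capacity(sql.len());
    let mut chars = sql.chars().peekable();
    let mut pending_space = false;

    while let Some(c) = chars.next() {
        if c.is_whitespace() {
            // Remember the gap; emit a single space before the next token.
            pending_space = true;
            continue;
        }
        if pending_space && !out.is_empty() {
            out.push(' ');
        }
        pending_space = false;

        if c == '\'' {
            // Consume the literal up to its closing quote.
            while let Some(d) = chars.next() {
                if d == '\'' {
                    if chars.peek() == Some(&'\'') {
                        chars.next(); // escaped quote, still inside the literal
                    } else {
                        break;
                    }
                }
            }
            out.push('?');
        } else if c.is_ascii_digit() {
            // Swallow the rest of the digit run.
            while chars.peek().map_or(false, |d| d.is_ascii_digit()) {
                chars.next();
            }
            out.push('?');
        } else {
            out.extend(c.to_lowercase());
        }
    }
    out
}
```

Under these assumptions, queries that differ only in constants, casing, or layout map to the same skeleton, which is what lets a skeleton histogram stand in for workload shape.
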
This is not a temporal analysis. The t axis on the emitted residual
samples is the ordinal bucket index (multiplied by the bucket size for
plot-axis consistency), and every stream this adapter produces is tagged
sqlshare-text@<file> so downstream reports cannot confuse it with a
wall-clock-indexed SQLShare run. The emitted channel id is
ord[START-END], using the ordinal range covered by the bucket.
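
As a rough sketch of how ordinal bucketing, the t axis, and the channel id fit together, the code below chunks a sequence of skeletons by position and scores each bucket against the one before it. Several details are assumptions rather than documented behaviour: the OrdinalSample struct and all function names are hypothetical, START and END are taken to be 0-based inclusive positions, the divergence is measured against the immediately preceding bucket, and the logarithm is base 2.

```rust
use std::collections::BTreeMap;

/// Hypothetical sample record; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct OrdinalSample {
    pub channel: String, // "ord[START-END]" (assumed 0-based, inclusive)
    pub t: f64,          // bucket index multiplied by bucket size
    pub jsd: f64,        // divergence from the previous bucket, in bits
}

/// Relative frequency of each skeleton within one (non-empty) bucket.
fn histogram(bucket: &[String]) -> BTreeMap<String, f64> {
    let weight = 1.0 / bucket.len() as f64;
    let mut hist = BTreeMap::new();
    for s in bucket {
        *hist.entry(s.clone()).or_insert(0.0) += weight;
    }
    hist
}

/// KL(a || m) with m = (a + b) / 2, summed over the support of `a`.
fn kl_to_mixture(a: &BTreeMap<String, f64>, b: &BTreeMap<String, f64>) -> f64 {
    let mut sum = 0.0;
    for (key, pa) in a {
        let pa = *pa;
        if pa > 0.0 {
            let pb = b.get(key).copied().unwrap_or(0.0);
            let m = 0.5 * (pa + pb);
            sum += pa * (pa / m).log2();
        }
    }
    sum
}

/// Jensen-Shannon divergence (base 2, so the result lies in [0, 1]).
pub fn jensen_shannon(p: &BTreeMap<String, f64>, q: &BTreeMap<String, f64>) -> f64 {
    0.5 * kl_to_mixture(p, q) + 0.5 * kl_to_mixture(q, p)
}

/// Emit one sample per bucket after the first, comparing it with its
/// predecessor (an assumed choice of reference distribution).
pub fn ordinal_samples(skeletons: &[String], bucket_size: usize) -> Vec<OrdinalSample> {
    assert!(bucket_size > 0, "bucket_size must be positive");
    let mut samples = Vec::new();
    let mut previous: Option<BTreeMap<String, f64>> = None;

    for (index, bucket) in skeletons.chunks(bucket_size).enumerate() {
        let current = histogram(bucket);
        if let Some(prev) = &previous {
            let start = index * bucket_size;
            let end = start + bucket.len() - 1;
            samples.push(OrdinalSample {
                channel: format!("ord[{}-{}]", start, end),
                t: start as f64,
                jsd: jensen_shannon(prev, &current),
            });
        }
        previous = Some(current);
    }
    samples
}
```

Because t is derived purely from position in the file, two adjacent samples say nothing about how much wall-clock time separated the underlying queries.
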
The PlanRegression, Cardinality, Contention, and CacheIo classes
are all absent: the public release does not carry the fields required to
construct them, and fabricating those fields would be a category error.
That limitation is documented in §6 of the paper (when colocated) and
cited in the README under the Datasets table.