Module sqlshare_text


SQLShare text-only adapter.

The 2015 SQLShare SIGMOD data release originally shipped a CSV with per-query runtime and submission-time columns (see QueriesWithPlan.csv / sdssquerieswithplan.csv in the 2015 reproducibility repository). That richer release was hosted on the S3 bucket shrquerylogs at s3-us-west-2.amazonaws.com, which has since been decommissioned: the bucket no longer exists (verified 2026-04: NoSuchBucket response). The remaining public artefact is the UW eScience sqlshare_data_release1.zip bundle, whose top-level queries.txt contains raw SQL query texts separated by 40-underscore dividers, with no user_id, runtime_seconds, or submitted_at fields.
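The queries.txt layout described above can be split with a few lines of Rust. This is a minimal sketch, not the adapter's actual parser: `split_queries` and `push_if_nonempty` are hypothetical names, and it assumes a divider is a line consisting of exactly 40 underscores (ignoring surrounding whitespace) and that empty segments between dividers should be dropped.

```rust
/// Hypothetical helper: split the contents of `queries.txt` into query texts.
/// Assumption: a divider is a line made of exactly 40 underscores.
pub fn split_queries(contents: &str) -> Vec<String> {
    let divider = "_".repeat(40);
    let mut queries = Vec::new();
    let mut current = String::new();
    for line in contents.lines() {
        if line.trim() == divider {
            // A divider closes the query accumulated so far.
            push_if_nonempty(&mut queries, &current);
            current.clear();
        } else {
            current.push_str(line);
            current.push('\n');
        }
    }
    // The last query may not be followed by a divider.
    push_if_nonempty(&mut queries, &current);
    queries
}

/// Store the trimmed text, skipping segments that are only whitespace.
fn push_if_nonempty(out: &mut Vec<String>, text: &str) {
    let trimmed = text.trim();
    if !trimmed.is_empty() {
        out.push(trimmed.to_string());
    }
}
```

Multi-line queries keep their internal line breaks here; whitespace is only collapsed later, during skeleton normalisation.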

This adapter works with that remaining artefact as it is, without reconstructing the missing fields: it reads the queries.txt format, normalises each query into a skeleton (literals and digits replaced with ?, whitespace collapsed, lower-cased), and emits only the WorkloadPhase residual class, with Jensen-Shannon divergence computed over ordinal-position buckets rather than wall-clock buckets.
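To make the normalisation and divergence steps concrete, here is a hedged sketch. The function names (`skeleton`, `histogram`, `js_divergence`) are illustrative and not taken from the crate. It assumes that a run of digits collapses to a single `?`, that string literals are single-quoted with `''` as the escaped quote, that the divergence uses base-2 logarithms (so values fall in [0, 1]), and that an empty histogram yields 0.0. The adapter may differ on any of these points.

```rust
use std::collections::{HashMap, HashSet};

/// Reduce a SQL text to a skeleton: literals and digit runs become `?`,
/// whitespace runs become one space, everything else is lower-cased.
pub fn skeleton(sql: &str) -> String {
    let mut out = String::with_capacity(sql.len());
    let mut chars = sql.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\'' {
            // Consume a single-quoted literal; `''` is an escaped quote (assumption).
            loop {
                match chars.next() {
                    Some('\'') => {
                        if chars.peek() == Some(&'\'') {
                            chars.next();
                        } else {
                            break;
                        }
                    }
                    Some(_) => {}
                    None => break, // unterminated literal: stop at end of input
                }
            }
            out.push('?');
        } else if c.is_ascii_digit() {
            while chars.peek().map_or(false, |d| d.is_ascii_digit()) {
                chars.next();
            }
            out.push('?');
        } else if c.is_whitespace() {
            while chars.peek().map_or(false, |w| w.is_whitespace()) {
                chars.next();
            }
            out.push(' ');
        } else {
            out.extend(c.to_lowercase());
        }
    }
    out.trim().to_string()
}

/// Count how often each skeleton occurs in one bucket of queries.
pub fn histogram(queries: &[&str]) -> HashMap<String, f64> {
    let mut counts: HashMap<String, f64> = HashMap::new();
    for q in queries {
        *counts.entry(skeleton(q)).or_insert(0.0) += 1.0;
    }
    counts
}

/// Jensen-Shannon divergence (base 2) between two skeleton histograms.
pub fn js_divergence(p: &HashMap<String, f64>, q: &HashMap<String, f64>) -> f64 {
    let p_total: f64 = p.values().sum();
    let q_total: f64 = q.values().sum();
    if p_total == 0.0 || q_total == 0.0 {
        return 0.0; // assumption: an empty bucket contributes no divergence
    }
    let keys: HashSet<&String> = p.keys().chain(q.keys()).collect();
    let mut jsd = 0.0;
    for k in keys {
        let pi = p.get(k).copied().unwrap_or(0.0) / p_total;
        let qi = q.get(k).copied().unwrap_or(0.0) / q_total;
        let mi = 0.5 * (pi + qi); // mixture distribution
        if pi > 0.0 {
            jsd += 0.5 * pi * (pi / mi).log2();
        }
        if qi > 0.0 {
            jsd += 0.5 * qi * (qi / mi).log2();
        }
    }
    jsd
}
```

With base-2 logarithms, two buckets with identical skeleton mixes score 0 and two buckets with no skeleton in common score 1, which keeps the residual on a fixed scale regardless of bucket size.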

This is not a temporal analysis. The t axis on the emitted residual samples is ordinal-bucket-index (multiplied by the bucket size for plot-axis consistency), and every stream this adapter produces is tagged sqlshare-text@<file> so downstream reports cannot confuse it with a wall-clock-indexed SQLShare run. The emitted channel id is ord[START-END], using the ordinal range covered by the bucket.
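The ordinal indexing described above might look like the following sketch. `OrdinalBucket`, `ordinal_buckets`, and `stream_tag` are hypothetical names for illustration only. The sketch assumes 0-based query positions and an inclusive END in the channel id; the source does not specify either convention.

```rust
/// One ordinal-position bucket (hypothetical type, not the crate's own).
pub struct OrdinalBucket {
    pub index: usize, // ordinal bucket index
    pub start: usize, // first query position covered (0-based, assumption)
    pub end: usize,   // last query position covered (inclusive, assumption)
}

impl OrdinalBucket {
    /// `t` value for a residual sample: bucket index times bucket size.
    /// This is an ordinal position, not a timestamp.
    pub fn t(&self, bucket_size: usize) -> f64 {
        (self.index * bucket_size) as f64
    }

    /// Channel id in the `ord[START-END]` form.
    pub fn channel_id(&self) -> String {
        format!("ord[{}-{}]", self.start, self.end)
    }
}

/// Partition `n_queries` positions into consecutive buckets of `bucket_size`.
/// The final bucket may be shorter than the others.
pub fn ordinal_buckets(n_queries: usize, bucket_size: usize) -> Vec<OrdinalBucket> {
    assert!(bucket_size > 0, "bucket_size must be positive");
    (0..n_queries)
        .step_by(bucket_size)
        .enumerate()
        .map(|(index, start)| OrdinalBucket {
            index,
            start,
            end: (start + bucket_size).min(n_queries) - 1,
        })
        .collect()
}

/// Stream tag in the `sqlshare-text@<file>` form.
pub fn stream_tag(file: &str) -> String {
    format!("sqlshare-text@{}", file)
}
```

For 250 queries and a bucket size of 100, this yields three buckets whose `t` values are 0, 100, and 200, so the plot axis lines up with query positions while remaining purely ordinal.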

The PlanRegression, Cardinality, Contention, and CacheIo classes are all absent: the public release does not carry the fields required to construct them, and fabricating those fields would be a category error. That limitation is documented in §6 of the paper (when colocated) and cited in the README under the Datasets table.

Structs§

SqlShareText