
Module adapters


Dataset adapters.

Each adapter exposes a load(path) function that reads a real subset of the corresponding public dataset from disk and returns a typed ResidualStream. The adapter is responsible for:

  • format-specific parsing (CSV / Parquet / pickle / SQL)
  • dropping samples whose required fields are missing or non-finite
  • sorting by time
  • embedding the dataset name + version + subset id in stream.source
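The post-parse responsibilities listed above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `Sample`, `SketchStream`, `finalize`, and the `"{dataset}@{version}:{subset}"` layout of the source string are all hypothetical stand-ins, and format-specific parsing is assumed to have already produced rows of optional fields.

```rust
/// Hypothetical stand-in for one residual sample; the crate's real type may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Sample {
    pub t: f64,
    pub value: f64,
}

/// Hypothetical stand-in for the crate's `ResidualStream`.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchStream {
    pub source: String,
    pub samples: Vec<Sample>,
}

/// Applies the post-parse steps to already-parsed rows.
/// `None` models a required field that was missing in the raw record.
pub fn finalize(
    dataset: &str,
    version: &str,
    subset: &str,
    rows: Vec<(Option<f64>, Option<f64>)>,
) -> SketchStream {
    let mut samples: Vec<Sample> = rows
        .into_iter()
        // Drop rows with a missing required field.
        .filter_map(|(t, value)| Some(Sample { t: t?, value: value? }))
        // Drop rows whose fields are NaN or infinite.
        .filter(|s| s.t.is_finite() && s.value.is_finite())
        .collect();
    // All retained timestamps are finite, so total_cmp gives the usual numeric order.
    samples.sort_by(|a, b| a.t.total_cmp(&b.t));
    SketchStream {
        // Assumed layout; the text only says name, version and subset id are embedded.
        source: format!("{dataset}@{version}:{subset}"),
        samples,
    }
}
```

Filtering before sorting keeps the comparison well-defined, since no NaN timestamp can reach the sort.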

Where a dataset cannot be redistributed inside the build (Snowset is ~10 GB; SQLShare is permission-gated; the IMDB JOB dump is third-party licensed), the adapter additionally provides a synthetic exemplar function that produces a deterministic, seedable residual stream with the same statistical shape as the real corpus. The paper labels every figure that uses an exemplar with [exemplar], and the corresponding fetch script lets the operator regenerate that figure on the real data.
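To illustrate what "deterministic, seedable" means in practice, here is a minimal sketch built on the SplitMix64 generator. The choice of generator and the names `SplitMix64` and `exemplar_values` are assumptions made for this example. The crate's exemplar functions may use a different generator, and this sketch makes no attempt to match the statistical shape of any real corpus.

```rust
/// SplitMix64 pseudo-random generator (swapped in for illustration only).
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advances the state and returns the next 64-bit output.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Returns a value in [0, 1) using the top 53 bits of the next output.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical exemplar generator: the same seed always yields the same values.
pub fn exemplar_values(seed: u64, n: usize) -> Vec<f64> {
    let mut rng = SplitMix64::new(seed);
    (0..n).map(|_| rng.next_f64()).collect()
}
```

Because the generator has no hidden global state, calling `exemplar_values` twice with the same seed and length reproduces the same sequence, which is the property a regenerable figure depends on.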

Design rule (panel-imposed): synthetic exemplars never carry the bare dataset name in stream.source — they always read "{dataset}-exemplar-seed{N}", so a downstream report cannot accidentally label exemplar results as if they were real-data results.
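The naming rule can be sketched as a pair of helpers. The pattern `"{dataset}-exemplar-seed{N}"` comes from the rule above; the function names `exemplar_source` and `is_exemplar_source` are hypothetical and are not claimed to exist in the crate.

```rust
/// Builds the source label for a synthetic exemplar stream.
pub fn exemplar_source(dataset: &str, seed: u64) -> String {
    format!("{dataset}-exemplar-seed{seed}")
}

/// Returns true when a label follows the "{dataset}-exemplar-seed{N}" pattern,
/// i.e. a non-empty dataset name followed by a decimal seed.
pub fn is_exemplar_source(source: &str) -> bool {
    match source.rsplit_once("-exemplar-seed") {
        Some((dataset, seed)) => {
            !dataset.is_empty()
                && !seed.is_empty()
                && seed.bytes().all(|b| b.is_ascii_digit())
        }
        None => false,
    }
}
```

A downstream report could call a check like `is_exemplar_source(&stream.source)` before labeling a result, so that a bare dataset name is only ever treated as real data.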

Modules

ceb
CEB adapter (Cardinality Estimation Benchmark, Negi et al.).
generic_csv
Generic CSV adapter — a single-domain worked example of applying dsfb-database’s motif grammar to a residual stream that was not captured from a SQL engine.
job
JOB adapter (Join Order Benchmark, Leis et al., VLDB 2015).
postgres
PostgreSQL pg_stat_statements adapter — real engine bridge.
snowset
Snowset adapter (Vuppalapati et al., NSDI 2020).
sqlshare
SQLShare adapter (Jain et al., SIGMOD 2016).
sqlshare_text
SQLShare text-only adapter.
tpcds
TPC-DS adapter.

Traits

DatasetAdapter
Trait for the five dataset adapters.