Module adapters

Dataset adapters.

Each adapter exposes a load(path) function that reads a real subset of the corresponding public dataset from disk and returns a typed ResidualStream. The adapter is responsible for:

- format-specific parsing (CSV / Parquet / pickle / SQL)
- dropping samples whose required fields are missing or non-finite
- sorting by time
- embedding the dataset name + version + subset id in stream.source
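
The post-parse responsibilities listed above can be sketched as a single helper. This is a minimal illustration, not the crate's actual code: the `Sample` and `ResidualStream` shapes, the `finalize` function, and the `"{dataset}@{version}#{subset}"` layout of `source` are all assumptions made for the example.

```rust
/// Hypothetical stand-in for one residual sample (fields are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct Sample {
    pub t: f64,        // timestamp
    pub residual: f64, // residual value
}

/// Hypothetical stand-in for the crate's typed stream.
#[derive(Debug, Clone, PartialEq)]
pub struct ResidualStream {
    pub source: String,
    pub samples: Vec<Sample>,
}

/// Applies the adapter duties that follow format-specific parsing:
/// drop incomplete or non-finite samples, sort by time, tag the source.
pub fn finalize(
    raw: Vec<(Option<f64>, Option<f64>)>,
    dataset: &str,
    version: &str,
    subset: &str,
) -> ResidualStream {
    let mut samples: Vec<Sample> = raw
        .into_iter()
        .filter_map(|(t, r)| match (t, r) {
            // Keep a sample only if both fields are present and finite.
            (Some(t), Some(r)) if t.is_finite() && r.is_finite() => {
                Some(Sample { t, residual: r })
            }
            _ => None,
        })
        .collect();

    // total_cmp gives a total order on f64, so the sort cannot panic.
    samples.sort_by(|a, b| a.t.total_cmp(&b.t));

    ResidualStream {
        // Assumed layout; the real crate may join these fields differently.
        source: format!("{dataset}@{version}#{subset}"),
        samples,
    }
}
```

Filtering before sorting keeps NaN timestamps out of the comparison entirely, so the ordering of the surviving samples is well defined.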

Where a dataset cannot be redistributed inside the build (Snowset is ~10 GB; SQLShare is permission-gated; the IMDB JOB dump is third-party licensed), the adapter additionally provides a synthetic exemplar function that produces a deterministic, seedable residual stream with the same statistical shape as the real corpus. The paper labels every figure that uses an exemplar with [exemplar], and the corresponding fetch script lets the operator regenerate the figure on the real data.

Design rule (panel-imposed): synthetic exemplars never carry the bare dataset name in stream.source — they always read "{dataset}-exemplar-seed{N}", so a downstream report cannot accidentally label exemplar results as if they were real-data results.
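
The naming rule can be illustrated with a small seedable generator. This is a sketch under stated assumptions: `exemplar`, `is_exemplar`, and the `ResidualStream` shape are hypothetical names, and the uniform noise from a SplitMix64 generator is only a placeholder that does not attempt to match the statistical shape of any real corpus.

```rust
/// Hypothetical stand-in for the crate's typed stream: (time, residual) pairs.
#[derive(Debug, Clone, PartialEq)]
pub struct ResidualStream {
    pub source: String,
    pub samples: Vec<(f64, f64)>,
}

/// SplitMix64 step: a small deterministic PRNG, used here to avoid
/// depending on an external crate.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Builds a deterministic exemplar stream of `n` samples.
/// The same (dataset, seed, n) always yields the same stream.
pub fn exemplar(dataset: &str, seed: u64, n: usize) -> ResidualStream {
    let mut state = seed;
    let samples = (0..n)
        .map(|i| {
            // Top 53 bits mapped to [0, 1), then centred on zero.
            let u = (splitmix64(&mut state) >> 11) as f64 / (1u64 << 53) as f64;
            (i as f64, u - 0.5)
        })
        .collect();
    ResidualStream {
        // Never the bare dataset name: always "{dataset}-exemplar-seed{N}".
        source: format!("{dataset}-exemplar-seed{seed}"),
        samples,
    }
}

/// A downstream report could use a check like this to refuse to label
/// exemplar results as real-data results.
pub fn is_exemplar(stream: &ResidualStream) -> bool {
    stream.source.contains("-exemplar-seed")
}
```

Because the seed is embedded in `source`, a figure produced from an exemplar records exactly which synthetic stream it used.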

Modules§

ceb: CEB adapter (Cardinality Estimation Benchmark, Negi et al.).
generic_csv: Generic CSV adapter — a single-domain worked example of applying dsfb-database’s motif grammar to a residual stream that was not captured from a SQL engine.
job: JOB adapter (Join Order Benchmark, Leis et al., VLDB 2015).
postgres: PostgreSQL pg_stat_statements adapter — real engine bridge.
snowset: Snowset adapter (Vuppalapati et al., NSDI 2020).
sqlshare: SQLShare adapter (Jain et al., SIGMOD 2016).
sqlshare_text: SQLShare text-only adapter.
tpcds: TPC-DS adapter.

Traits§

DatasetAdapter: Trait for the five dataset adapters.
