//! Dataset adapters.
//!
//! Each adapter exposes a `load(path)` function that reads a real subset of
//! the corresponding public dataset from disk and returns a typed
//! [`ResidualStream`]. The adapter is responsible for:
//! * format-specific parsing (CSV / Parquet / pickle / SQL)
//! * dropping samples whose required fields are missing or non-finite
//! * sorting by time
//! * embedding the dataset name + version + subset id in `stream.source`
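/*!
A minimal sketch of the cleaning steps listed above (drop incomplete or
non-finite samples, then sort by time). `Sample` and `clean` are hypothetical
names used for illustration only; the real `ResidualStream` layout is not
shown here and may differ.

```rust
/// Hypothetical sample type, standing in for one row of a parsed dataset.
#[derive(Debug, Clone, PartialEq)]
pub struct Sample {
    /// Timestamp of the observation.
    pub t: f64,
    /// Residual value; `None` models a missing required field.
    pub residual: Option<f64>,
}

/// Drop samples whose fields are missing or non-finite, then sort by time.
pub fn clean(raw: Vec<Sample>) -> Vec<Sample> {
    let mut kept: Vec<Sample> = raw
        .into_iter()
        .filter(|s| s.t.is_finite() && s.residual.map_or(false, f64::is_finite))
        .collect();
    // `total_cmp` is a total order on f64; every timestamp is finite here.
    kept.sort_by(|a, b| a.t.total_cmp(&b.t));
    kept
}
```

`sort_by` is a stable sort, so samples with equal timestamps keep their
original file order.
*/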
//!
//! Where a dataset cannot be redistributed inside the build (Snowset is
//! ~10 GB; SQLShare is permission-gated; the IMDB JOB dump is third-party
//! licensed) the adapter additionally provides a *synthetic exemplar*
//! function that produces a deterministic, seedable residual stream with the
//! same statistical shape as the real corpus. The paper labels every figure
//! that uses an exemplar with `[exemplar]`, and the corresponding fetch
//! script lets the operator regenerate the figure from the real data.
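/*!
One way to make an exemplar deterministic and seedable is to use a small
fixed-algorithm generator, so that the same seed yields the same values on
every platform. The sketch below uses SplitMix64; that choice is an assumption
made for illustration and is not necessarily the generator the adapters use.
`exemplar_values` is a hypothetical name, and its uniform output does not
attempt to reproduce the statistical shape of any real corpus.

```rust
/// One SplitMix64 step: advances `state` and returns the next 64-bit output.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical exemplar generator: `n` values in [0, 1), fully determined
/// by `seed`.
pub fn exemplar_values(seed: u64, n: usize) -> Vec<f64> {
    let mut state = seed;
    (0..n)
        .map(|_| {
            // Keep the top 53 bits so the integer converts to f64 exactly.
            (splitmix64(&mut state) >> 11) as f64 / (1u64 << 53) as f64
        })
        .collect()
}
```

Because the generator holds no global state, two calls with the same seed and
length return identical vectors.
*/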
//!
//! Design rule (panel-imposed): synthetic exemplars never carry the bare
//! dataset name in `stream.source`. They always read
//! `"{dataset}-exemplar-seed{N}"`, so a downstream report cannot
//! accidentally present exemplar results as real-data results.
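/*!
The naming rule above could be enforced with a pair of small helpers, sketched
here. `exemplar_source` and `is_exemplar` are hypothetical names, and treating
the seed `N` as a `u64` is an assumption; only the label format itself comes
from the rule.

```rust
/// Marker that appears only in exemplar labels (taken from the rule above).
const EXEMPLAR_TAG: &str = "-exemplar-seed";

/// Build the `stream.source` label for a synthetic exemplar.
pub fn exemplar_source(dataset: &str, seed: u64) -> String {
    format!("{}{}{}", dataset, EXEMPLAR_TAG, seed)
}

/// Return true if `source` has the form `"{dataset}-exemplar-seed{N}"`.
pub fn is_exemplar(source: &str) -> bool {
    match source.rsplit_once(EXEMPLAR_TAG) {
        // Require a non-empty dataset name and a numeric seed.
        Some((dataset, seed)) => !dataset.is_empty() && seed.parse::<u64>().is_ok(),
        None => false,
    }
}
```

A report generator could then call `is_exemplar(&stream.source)` to decide
whether a figure needs the `[exemplar]` label.
*/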
use crate::ResidualStream;
use Result;
/// Trait for the five dataset adapters.