pub struct Dataset {
    pub id: String,
    pub name: String,
    pub version: Option<String>,
    pub schema: Vec<(String, String)>,
    pub row_count: Option<u64>,
    pub content_hash: String,
    pub url: Option<String>,
    pub license: Option<String>,
    pub provenance: Provenance,
    pub created: String,
}
v0.33: Dataset as a first-class kernel object.
A Dataset is a versioned, content-addressed reference to data
that anchors empirical claims. Before v0.33, datasets were strings
in Provenance.title or entity-typed mentions in assertions —
a claim could say “we used ADNI” without anchoring which release
of ADNI the analysis ran against, and re-running the same code on
a refreshed cohort silently produced a “different” claim.
vd_<id> is content-addressed over `name + version + content_hash + url`. Two dataset records with the same name but different versions get distinct ids; two records pointing at the same snapshot collapse to the same id. Claims can reference the exact bytes they rest on, not only a dataset name in prose.
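The addressing rule above can be sketched as follows. This is a minimal illustration, not the kernel's implementation: the function name `sketch_content_address` is hypothetical, and the standard library's `DefaultHasher` (a 64-bit hash) stands in for whatever hash `Dataset::content_address` actually uses. It is chosen only because its 64-bit output happens to format as 16 hex digits without external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derives a `vd_<16hex>` style id from the four addressed fields.
///
/// ASSUMPTION: `DefaultHasher` is a stand-in for the real hash. Its
/// algorithm is not guaranteed stable across Rust releases, so it is
/// unsuitable for ids that are persisted or shared.
pub fn sketch_content_address(
    name: &str,
    version: Option<&str>,
    content_hash: &str,
    url: Option<&str>,
) -> String {
    let mut hasher = DefaultHasher::new();
    // Hashing the fields as a tuple keeps field boundaries distinct:
    // each string is hashed with a terminator and each Option with its
    // discriminant, so ("ab", "c") and ("a", "bc") feed different input.
    (name, version, content_hash, url).hash(&mut hasher);
    // 64 bits, zero-padded to 16 lowercase hex digits.
    format!("vd_{:016x}", hasher.finish())
}
```

Under this sketch, calling the function twice with identical arguments yields the same id, while changing only `version` yields a different one, which mirrors the collapse and distinctness behaviour described above.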
Fields
id: String
vd_<16hex>, content-addressed; see Dataset::content_address.
name: String
Human-readable name (e.g. “ADNI”, “TRAILBLAZER-ALZ”, “MIMIC-IV”).
version: Option<String>
Semantic version or release tag (e.g. “ADNI-3”, “v2.2”, “SR0”). Two entries differing only in version are distinct kernel objects.
schema: Vec<(String, String)>
Optional column-level schema as (name, type) pairs. For non-tabular datasets, leave empty.
row_count: Option<u64>
Number of rows / observations / records, when known.
content_hash: String
SHA-256 of the canonical contents, when computable. For large datasets stored remotely, this is the publisher’s declared content hash; integrity verification is the puller’s job (same pattern as vfr_* snapshots).
url: Option<String>
Where the dataset is reachable (https URL, file://, s3://, etc.).
license: Option<String>
License identifier or URL (e.g. “CC-BY-4.0”, a Crossref license).
provenance: Provenance
Provenance of the dataset itself, typically the paper or release that publishes it. Reuses Provenance for shape parity with findings.
created: String
RFC 3339 creation timestamp.
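To make the field layout concrete, here is a small sketch of building a record and running a puller-side check against the declared hash. Several pieces are assumptions: `Provenance` is not shown in this section, so a placeholder with only a `title` field is used; `example_dataset` and `matches_declared_hash` are hypothetical helpers, not part of the documented API; and every field value is invented for illustration. The digest function is supplied by the caller because the Rust standard library does not include SHA-256.

```rust
/// Placeholder for the crate's real `Provenance` type, which this
/// section does not show; only a `title` field is assumed here.
#[derive(Debug, Clone, PartialEq)]
pub struct Provenance {
    pub title: String,
}

/// Mirrors the field layout documented above.
#[derive(Debug, Clone, PartialEq)]
pub struct Dataset {
    pub id: String,
    pub name: String,
    pub version: Option<String>,
    pub schema: Vec<(String, String)>,
    pub row_count: Option<u64>,
    pub content_hash: String,
    pub url: Option<String>,
    pub license: Option<String>,
    pub provenance: Provenance,
    pub created: String,
}

/// Builds an illustrative record. Every value is made up for the example.
pub fn example_dataset() -> Dataset {
    Dataset {
        // Illustrative id in the vd_<16hex> shape, not a real address.
        id: "vd_0123456789abcdef".to_string(),
        name: "example-cohort".to_string(),
        version: Some("v2.2".to_string()),
        // (name, type) pairs; leave the Vec empty for non-tabular data.
        schema: vec![
            ("subject_id".to_string(), "string".to_string()),
            ("age".to_string(), "u32".to_string()),
        ],
        row_count: Some(1_000),
        // Dummy 64-character hex string standing in for a SHA-256 digest.
        content_hash: "0".repeat(64),
        url: None,
        license: Some("CC-BY-4.0".to_string()),
        provenance: Provenance {
            title: "Example cohort release notes".to_string(),
        },
        // RFC 3339 timestamp.
        created: "2024-01-01T00:00:00Z".to_string(),
    }
}

/// Hypothetical puller-side integrity check: hashes the fetched bytes
/// with a caller-supplied hex digest function and compares the result,
/// ignoring hex case, with the publisher's declared `content_hash`.
pub fn matches_declared_hash<F>(dataset: &Dataset, bytes: &[u8], digest_hex: F) -> bool
where
    F: Fn(&[u8]) -> String,
{
    digest_hex(bytes).eq_ignore_ascii_case(&dataset.content_hash)
}
```

In real use the `digest_hex` argument would wrap a SHA-256 implementation. Passing it in keeps the sketch free of external dependencies and makes the comparison logic easy to test in isolation.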