pub struct DatasetConfig {
    pub name: String,
    pub source: SourceConfig,
    pub s3: Option<S3Config>,
    pub index: IndexConfig,
    pub columns: Vec<String>,
    pub dict_encode: bool,
    pub lazy: bool,
}

Fields
name: String
source: SourceConfig
s3: Option<S3Config>
index: IndexConfig
columns: Vec<String>
Optional column projection applied at load time. When non-empty, only the listed columns are read from the parquet or delta source; every other column is skipped entirely (no decoding, no allocation, no resident memory). Empty (the default) means all columns are read. Names are matched case-insensitively against the source schema.
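The projection rule above can be sketched in plain Rust as follows. `project_columns` is a hypothetical helper, not part of this crate's API. It assumes ASCII case folding and silently ignores requested names that match nothing; the real loader may handle either case differently.

```rust
/// Hypothetical helper: map a requested column list onto a source schema.
/// Returns the indices of the schema columns to read, in schema order.
/// An empty request means "read every column".
fn project_columns(schema: &[&str], requested: &[String]) -> Vec<usize> {
    if requested.is_empty() {
        // Default: no projection, keep all columns.
        return (0..schema.len()).collect();
    }
    schema
        .iter()
        .enumerate()
        // Keep a schema column if any requested name matches it, ignoring
        // ASCII case (assumption: the real matcher may fold Unicode case too).
        .filter(|(_, field)| requested.iter().any(|r| r.eq_ignore_ascii_case(field)))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let schema = ["Region", "Amount", "CustomerId"];
    let wanted = vec!["amount".to_string(), "REGION".to_string()];
    // Only "Region" (0) and "Amount" (1) are read; "CustomerId" is skipped.
    println!("{:?}", project_columns(&schema, &wanted));
    // An empty projection reads everything.
    println!("{:?}", project_columns(&schema, &[]));
}
```

Returning indices in schema order (rather than request order) mirrors how a column projection is usually pushed down to a file reader, which decodes columns in their stored order.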
dict_encode: bool
When true (the default), Utf8 columns that are dictionary-encoded in
the source parquet file are read as Arrow Dictionary(Int32, Utf8)
instead of being expanded to plain Utf8. This is far cheaper in RAM
for low-cardinality columns. Set it to false to bypass the override,
for example as a workaround if you observe null-handling oddities on
a particular parquet file.
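To see why the dictionary layout saves memory, here is a toy model of Arrow's Dictionary(Int32, Utf8) layout in plain Rust, with no Arrow dependency. `DictColumn`, `dict_encode` and the two byte-count helpers are hypothetical, and the byte counts are approximations that ignore validity bitmaps and buffer padding.

```rust
use std::collections::HashMap;

/// Toy model of Dictionary(Int32, Utf8): each distinct string is stored once
/// in `values`, and every row holds an i32 key pointing into it.
struct DictColumn {
    keys: Vec<i32>,
    values: Vec<String>,
}

fn dict_encode(rows: &[&str]) -> DictColumn {
    let mut index: HashMap<&str, i32> = HashMap::new();
    let mut keys = Vec::with_capacity(rows.len());
    let mut values = Vec::new();
    for &row in rows {
        // First time we see a string: append it to the dictionary.
        let key = *index.entry(row).or_insert_with(|| {
            values.push(row.to_string());
            (values.len() - 1) as i32
        });
        keys.push(key);
    }
    DictColumn { keys, values }
}

/// Approximate payload of a plain Utf8 array: i32 offsets plus string bytes.
fn plain_utf8_bytes(rows: &[&str]) -> usize {
    (rows.len() + 1) * 4 + rows.iter().map(|s| s.len()).sum::<usize>()
}

/// Approximate payload of the dictionary form: i32 keys plus a small Utf8 array.
fn dict_bytes(col: &DictColumn) -> usize {
    col.keys.len() * 4
        + (col.values.len() + 1) * 4
        + col.values.iter().map(|s| s.len()).sum::<usize>()
}

fn main() {
    // 10,000 rows but only three distinct values: a low-cardinality column.
    let countries = ["Germany", "France", "Netherlands"];
    let rows: Vec<&str> = (0..10_000).map(|i| countries[i % 3]).collect();
    let col = dict_encode(&rows);
    println!(
        "plain: {} bytes, dictionary: {} bytes",
        plain_utf8_bytes(&rows),
        dict_bytes(&col)
    );
}
```

The string bytes are paid once per distinct value instead of once per row, so the saving grows with string length and shrinks as cardinality rises toward the row count.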
lazy: bool
When true, the backend should keep the dataset on disk and stream
it at query time instead of materialising it into RAM at startup.
This gives up the in-memory hot paths (raw Arrow slice, equality index)
in exchange for bounded memory use on large or multi-file sources. It is
honoured by the DataFusion backend (local and S3 parquet) and by the
DuckDB backend, which registers the dataset as a view over the source
scan (local and S3 parquet, and delta) rather than materialising a table.
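For the DuckDB backend, the difference between `lazy = true` and the default can be pictured as the choice between a view and a table over the same scan. The sketch below only builds SQL strings; `duckdb_register_sql` and `SourceKind` are hypothetical names, and the exact statements this crate issues are an assumption. `read_parquet` is a DuckDB built-in, while `delta_scan` requires DuckDB's delta extension.

```rust
/// Hypothetical source kinds, mirroring the parquet / delta kinds in these docs.
#[derive(Clone, Copy)]
enum SourceKind {
    Parquet,
    Delta,
}

/// Build a DuckDB statement that registers a dataset.
/// lazy = true  -> CREATE VIEW: the source is scanned on every query.
/// lazy = false -> CREATE TABLE: rows are copied into DuckDB once, at startup.
fn duckdb_register_sql(name: &str, location: &str, kind: SourceKind, lazy: bool) -> String {
    // Escape quotes so the name and path cannot break out of their literals.
    let loc = location.replace('\'', "''");
    let ident = name.replace('"', "\"\"");
    let scan = match kind {
        SourceKind::Parquet => format!("read_parquet('{loc}')"),
        SourceKind::Delta => format!("delta_scan('{loc}')"),
    };
    let object = if lazy { "VIEW" } else { "TABLE" };
    format!("CREATE {object} \"{ident}\" AS SELECT * FROM {scan}")
}

fn main() {
    // Lazy: a view, so memory stays bounded and the files are read per query.
    println!("{}", duckdb_register_sql("sales", "data/*.parquet", SourceKind::Parquet, true));
    // Eager: a table, materialised once into DuckDB's own storage.
    println!("{}", duckdb_register_sql("events", "data/events_delta", SourceKind::Delta, false));
}
```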
Implementations
impl DatasetConfig
pub fn resolve_local_parquet_files(&self) -> Result<Vec<PathBuf>, AppError>
Expand source.location to a concrete list of local .parquet
files. Only valid for kind = parquet on local paths — S3 and
Delta sources are resolved by the backend itself.
Accepts three location shapes:
- a single *.parquet file
- a directory (lists every *.parquet file directly inside it, non-recursive)
- a glob pattern containing *, ? or […] (e.g. data/year=2024/*.parquet, data/**/*.parquet)
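A minimal sketch of how a location string might be sorted into these three shapes, plus the non-recursive directory listing, using only the standard library. `LocationShape`, `classify` and `list_parquet_in_dir` are hypothetical. The classification order (glob metacharacters first, then a `.parquet` extension, otherwise a directory) is an assumption, and glob expansion itself is left out because it would normally come from a glob-matching crate.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// The three accepted location shapes (hypothetical names).
#[derive(Debug, PartialEq)]
enum LocationShape {
    Glob,
    SingleFile,
    Directory,
}

/// Assumed order: glob metacharacters win, then a `.parquet` extension marks
/// a single file, and anything else is treated as a directory.
fn classify(location: &str) -> LocationShape {
    if location.contains(&['*', '?', '['][..]) {
        LocationShape::Glob
    } else if Path::new(location).extension().map_or(false, |e| e == "parquet") {
        LocationShape::SingleFile
    } else {
        LocationShape::Directory
    }
}

/// List every `*.parquet` file directly inside `dir` (subdirectories are not
/// descended into), sorted so the result is deterministic.
fn list_parquet_in_dir(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_file() && path.extension().map_or(false, |e| e == "parquet") {
            files.push(path);
        }
    }
    files.sort();
    Ok(files)
}

fn main() {
    for loc in ["data/year=2024/*.parquet", "data/sales.parquet", "data/sales"] {
        println!("{loc} -> {:?}", classify(loc));
    }
    if let Ok(files) = list_parquet_in_dir(Path::new(".")) {
        println!("{} parquet file(s) in the current directory", files.len());
    }
}
```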
pub fn env_prefix(&self) -> String
Env-var prefix derived from the dataset name: uppercase with
non-alphanumeric chars replaced by _. E.g. sales.eu-1 →
SALES_EU_1.
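The prefix rule is small enough to sketch directly. This is an illustration, not the crate's own code, and it assumes "alphanumeric" means ASCII letters and digits.

```rust
/// Uppercase the dataset name and replace every non-alphanumeric character
/// with `_` (assumption: ASCII alphanumerics only).
fn env_prefix(name: &str) -> String {
    name.chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_uppercase() } else { '_' })
        .collect()
}

fn main() {
    let prefix = env_prefix("sales.eu-1");
    println!("{prefix}"); // SALES_EU_1
    // Per-dataset variables are then looked up under this prefix,
    // e.g. the region override described for resolved_region.
    println!("{prefix}_AWS_REGION");
}
```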
pub fn resolved_creds(&self) -> ResolvedCreds
Resolve S3 credentials following the precedence chain documented at the top of this module. Returns an empty struct when nothing was found — the caller should then leave credential resolution to the engine’s default provider chain.
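The fallback contract (an empty result hands control to the engine's default provider chain) can be sketched as below. The precedence chain itself is documented elsewhere in the module, so `candidates` stands in for it as an already-ordered list; the `ResolvedCreds` fields shown here, `resolve_creds` and `credential_mode` are all hypothetical.

```rust
/// Hypothetical shape of ResolvedCreds; the real struct's fields are not shown here.
#[derive(Debug, Default, PartialEq)]
struct ResolvedCreds {
    access_key_id: Option<String>,
    secret_access_key: Option<String>,
}

impl ResolvedCreds {
    fn is_empty(&self) -> bool {
        self.access_key_id.is_none() && self.secret_access_key.is_none()
    }
}

/// Take the first candidate key pair, in precedence order; if none is present,
/// return an empty struct rather than an error.
fn resolve_creds(candidates: &[Option<(&str, &str)>]) -> ResolvedCreds {
    candidates
        .iter()
        .flatten()
        .next()
        .map(|(id, secret)| ResolvedCreds {
            access_key_id: Some(id.to_string()),
            secret_access_key: Some(secret.to_string()),
        })
        .unwrap_or_default()
}

/// Caller side: only pass explicit credentials to the engine when some were found.
fn credential_mode(creds: &ResolvedCreds) -> &'static str {
    if creds.is_empty() {
        // Nothing found: leave resolution to the engine's default provider chain.
        "default provider chain"
    } else {
        "explicit credentials"
    }
}

fn main() {
    let found = resolve_creds(&[None, Some(("AKIAEXAMPLE", "example-secret"))]);
    println!("{}", credential_mode(&found));
    let nothing = resolve_creds(&[None, None]);
    println!("{}", credential_mode(&nothing));
}
```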
pub fn resolved_region(&self) -> String
Resolved S3 region: per-dataset env (${PREFIX}_AWS_REGION)
→ inline → AWS_REGION → AWS_DEFAULT_REGION → us-east-1.
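The region precedence maps naturally onto a chain of `Option::or_else` calls. In this sketch `resolve_region` is a hypothetical free function, and the environment lookup is injected as a closure so the chain can be exercised without mutating process environment variables. It does not try to decide whether an empty variable should count as unset.

```rust
use std::collections::HashMap;

/// Per-dataset env -> inline value -> AWS_REGION -> AWS_DEFAULT_REGION -> us-east-1.
/// In production `env` would wrap something like |k| std::env::var(k).ok().
fn resolve_region(
    prefix: &str,
    inline: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
) -> String {
    env(&format!("{prefix}_AWS_REGION"))
        .or_else(|| inline.map(str::to_string))
        .or_else(|| env("AWS_REGION"))
        .or_else(|| env("AWS_DEFAULT_REGION"))
        .unwrap_or_else(|| "us-east-1".to_string())
}

fn main() {
    let vars: HashMap<&str, &str> = HashMap::from([("AWS_REGION", "eu-west-1")]);
    let lookup = |k: &str| vars.get(k).map(|v| v.to_string());
    // No per-dataset variable and no inline value, so AWS_REGION wins.
    println!("{}", resolve_region("SALES_EU_1", None, &lookup));
}
```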
Trait Implementations
impl Clone for DatasetConfig
fn clone(&self) -> DatasetConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.