Struct DatasetConfig 

pub struct DatasetConfig {
    pub name: String,
    pub source: SourceConfig,
    pub s3: Option<S3Config>,
    pub index: IndexConfig,
    pub columns: Vec<String>,
    pub dict_encode: bool,
    pub lazy: bool,
}

Fields§

§name: String

§source: SourceConfig

§s3: Option<S3Config>

§index: IndexConfig

§columns: Vec<String>

Optional column projection applied at load time. When non-empty, only the listed columns are read from the parquet/delta source; every other column is skipped entirely (no decode, no allocation, no resident memory). When empty (the default), all columns are read. Names are matched case-insensitively against the source schema.
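
The projection rule can be illustrated with a small sketch. The helper name `projected_indices` is hypothetical and not part of this crate; the sketch also assumes ASCII case folding and silently ignores requested names that are absent from the schema, neither of which is confirmed by the text above.

```rust
/// Hypothetical helper (not part of this crate): returns the indices of the
/// schema columns that would be read for a given `columns` setting.
fn projected_indices(schema: &[&str], requested: &[&str]) -> Vec<usize> {
    // An empty projection means "read all columns".
    if requested.is_empty() {
        return (0..schema.len()).collect();
    }
    let mut selected = Vec::new();
    for (index, name) in schema.iter().enumerate() {
        // Assumption: matching is ASCII case-insensitive.
        if requested.iter().any(|r| r.eq_ignore_ascii_case(name)) {
            selected.push(index);
        }
    }
    selected
}
```

With a schema of `Id`, `Region`, `Amount`, requesting `region` and `AMOUNT` would select the second and third columns, in schema order.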

§dict_encode: bool

When true (default), Utf8 columns that are dictionary-encoded in the source parquet are read as Arrow Dictionary(Int32, Utf8) instead of being expanded to plain Utf8. This uses far less RAM for low-cardinality columns. Set to false to bypass the override; this is useful as a workaround if you observe null-handling oddities on a particular parquet file.
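
To see why a dictionary layout helps low-cardinality columns, here is a toy encoder in plain Rust. It is only an illustration of the idea and does not use Arrow; the function name `dict_encode` is made up for this sketch.

```rust
use std::collections::HashMap;

/// Toy dictionary encoding: each distinct string is stored once in `values`,
/// and every row holds only an i32 key pointing into that list.
fn dict_encode<'a>(rows: &[&'a str]) -> (Vec<i32>, Vec<String>) {
    let mut values: Vec<String> = Vec::new();
    let mut seen: HashMap<&'a str, i32> = HashMap::new();
    let mut keys = Vec::with_capacity(rows.len());
    for &row in rows {
        // Reuse the existing key, or append a new dictionary entry.
        let key = *seen.entry(row).or_insert_with(|| {
            values.push(row.to_string());
            (values.len() - 1) as i32
        });
        keys.push(key);
    }
    (keys, values)
}
```

A column with millions of rows but only a handful of distinct values then costs one 4-byte key per row plus a single copy of each distinct string.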

§lazy: bool

When true, the backend should keep the dataset on disk and stream it at query time instead of materialising it into RAM at startup. Trades the in-memory hot paths (raw Arrow slice, equality index) for bounded memory use on large / multi-file sources. Honoured by the DataFusion backend (local + S3 parquet) and by the DuckDB backend, which registers the dataset as a view over the source scan (local + S3 parquet, and delta) rather than materialising a table.

Implementations§

impl DatasetConfig

pub fn resolve_local_parquet_files(&self) -> Result<Vec<PathBuf>, AppError>

Expand source.location to a concrete list of local .parquet files. Only valid for kind = parquet on local paths — S3 and Delta sources are resolved by the backend itself.

Accepts three location shapes:

  • a single *.parquet file
  • a directory (lists every *.parquet directly inside, non-recursive)
  • a glob pattern containing *, ? or […] (e.g. data/year=2024/*.parquet, data/**/*.parquet)
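
The three shapes could be told apart roughly as sketched here, using only the standard library. `LocationShape`, `classify` and `list_parquet_in_dir` are hypothetical names, the glob check is purely textual, and sorting the result is an assumption; actual glob expansion is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
enum LocationShape {
    SingleFile,
    Directory,
    Glob,
}

/// Classify a location string: glob metacharacters win, then an existing
/// directory, and anything else is treated as a single file.
fn classify(location: &str) -> LocationShape {
    if location.contains(|c: char| matches!(c, '*' | '?' | '[')) {
        LocationShape::Glob
    } else if Path::new(location).is_dir() {
        LocationShape::Directory
    } else {
        LocationShape::SingleFile
    }
}

/// List the *.parquet files directly inside `dir` (non-recursive).
fn list_parquet_in_dir(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let is_parquet = path.extension().map_or(false, |ext| ext == "parquet");
        if path.is_file() && is_parquet {
            files.push(path);
        }
    }
    // Assumption: a stable, sorted order is convenient for callers.
    files.sort();
    Ok(files)
}
```

Because `read_dir` does not descend into subdirectories, a file such as `nested/c.parquet` is not picked up by the directory shape; a glob like `data/**/*.parquet` would be needed for that.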

pub fn estimate_local_bytes(&self) -> Option<u64>

Estimate the on-disk byte size of this dataset’s local backing files. Returns None for S3 sources (sizing would require a network round-trip) or when nothing can be measured.

  • parquet sums the resolved .parquet files (single file, directory, or glob).
  • delta sums every *.parquet data file under the table root. This slightly over-counts when stale files haven’t been vacuumed, which is fine for a coarse force-lazy threshold.
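
A minimal sketch of the summing step, assuming the file list has already been resolved. The name `sum_file_bytes` is hypothetical, and skipping files whose metadata cannot be read is an assumption about the behaviour, not something the description above confirms.

```rust
use std::fs;
use std::path::PathBuf;

/// Sum the on-disk sizes of `files`. Files whose metadata cannot be read are
/// skipped (assumption). Returns None when no file could be measured.
fn sum_file_bytes(files: &[PathBuf]) -> Option<u64> {
    let mut total = 0u64;
    let mut measured_any = false;
    for path in files {
        if let Ok(meta) = fs::metadata(path) {
            total += meta.len();
            measured_any = true;
        }
    }
    if measured_any {
        Some(total)
    } else {
        None
    }
}
```

An empty list, or a list containing only missing files, yields `None`, which mirrors the "nothing can be measured" case.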

pub fn force_lazy_bytes(&self, server: &ServerConfig) -> Option<u64>

Decide whether this dataset should be forced into lazy mode given the server’s force_lazy_above_mb threshold. Returns Some(bytes) (the measured size) when it should be forced, so the caller can log it. Returns None when the dataset is already lazy, the threshold is disabled, the source is S3, or the measured size is unknown or at or below the threshold.
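
The decision described above can be written as a small pure function. This is a sketch under stated assumptions: the name `should_force_lazy` and its flattened parameters are hypothetical, a disabled threshold is modelled as `None`, and one megabyte is taken to be 1024 * 1024 bytes.

```rust
/// Hypothetical stand-in for the decision logic. Returns Some(bytes) only
/// when the dataset should be forced into lazy mode.
fn should_force_lazy(
    already_lazy: bool,
    is_s3: bool,
    threshold_mb: Option<u64>, // assumption: None means "threshold disabled"
    measured_bytes: Option<u64>,
) -> Option<u64> {
    if already_lazy || is_s3 {
        return None;
    }
    // Assumption: 1 MB = 1024 * 1024 bytes.
    let threshold_bytes = threshold_mb?.saturating_mul(1024 * 1024);
    let bytes = measured_bytes?;
    // Only sizes strictly above the threshold force lazy mode.
    if bytes > threshold_bytes {
        Some(bytes)
    } else {
        None
    }
}
```

Returning the measured size rather than a bare boolean lets the caller log how large the dataset was when it was switched to lazy mode.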

pub fn env_prefix(&self) -> String

Env-var prefix derived from the dataset name: uppercase with non-alphanumeric chars replaced by _. E.g. sales.eu-1 → SALES_EU_1.
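
The transformation is simple enough to sketch as a free function. The name `env_prefix_for` is hypothetical, and treating "alphanumeric" as ASCII-only is an assumption.

```rust
/// Hypothetical free-function version of the prefix rule: uppercase ASCII
/// letters and digits are kept, every other character becomes '_'.
fn env_prefix_for(name: &str) -> String {
    name.chars()
        .map(|c| {
            if c.is_ascii_alphanumeric() {
                c.to_ascii_uppercase()
            } else {
                '_'
            }
        })
        .collect()
}
```

For the dataset name `sales.eu-1` this yields `SALES_EU_1`, so a per-dataset variable would look like `SALES_EU_1_AWS_REGION`.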

pub fn resolved_creds(&self) -> ResolvedCreds

Resolve S3 credentials following the precedence chain documented at the top of this module. Returns an empty struct when nothing was found — the caller should then leave credential resolution to the engine’s default provider chain.

pub fn resolved_region(&self) -> String

Resolved S3 region: per-dataset env (${PREFIX}_AWS_REGION) → inline → AWS_REGION → AWS_DEFAULT_REGION → us-east-1.
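
The fallback chain might look like the following sketch. The function `resolve_region` and its closure-based environment lookup are hypothetical, chosen so the logic can be exercised without touching the real process environment; any value that is set counts as a hit, including an empty string, which is an assumption.

```rust
/// Hypothetical sketch of the region precedence chain. `env` abstracts the
/// environment lookup, e.g. `|k| std::env::var(k).ok()` in real use.
fn resolve_region<F>(prefix: &str, inline: Option<&str>, env: F) -> String
where
    F: Fn(&str) -> Option<String>,
{
    env(&format!("{}_AWS_REGION", prefix)) // 1. per-dataset env var
        .or_else(|| inline.map(str::to_string)) // 2. inline config value
        .or_else(|| env("AWS_REGION")) // 3. global env var
        .or_else(|| env("AWS_DEFAULT_REGION")) // 4. legacy global env var
        .unwrap_or_else(|| "us-east-1".to_string()) // 5. hard-coded default
}
```

Note that the per-dataset variable outranks the inline value, so an operator can override a region from the environment without editing the config file.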

Trait Implementations§

impl Clone for DatasetConfig

fn clone(&self) -> DatasetConfig

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for DatasetConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<'de> Deserialize<'de> for DatasetConfig

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Same for T

type Output = T

Should always be Self.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.