A DataSource that combines multiple LayoutReaderRefs into a single scannable source.
Readers may be pre-opened or deferred via LayoutReaderFactory. Deferred readers are opened
concurrently during scanning using buffer_unordered: up to concurrency file opens run in
parallel as spawned tasks on the session runtime. Once opened, each reader yields a single
partition covering its full row range; internal I/O pipelining and chunking are handled by
ScanBuilder.
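The bounded opening described above can be illustrated with a standard-library sketch. This is not the crate's implementation: it uses OS threads and a channel rather than async tasks and buffer_unordered, and OpenedReader, open_all, and the open callback are hypothetical names introduced only for this example. It shows the same two properties: at most concurrency opens are in flight at once, and results arrive in completion order rather than input order.

```rust
use std::collections::VecDeque;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for an opened reader; it only records its source path.
#[derive(Debug, PartialEq, Eq)]
pub struct OpenedReader {
    pub path: String,
}

/// Opens every path with at most `concurrency` opens running at once.
/// Results are returned in completion order, not input order.
pub fn open_all<F>(
    paths: Vec<String>,
    concurrency: usize,
    open: F,
) -> Vec<Result<OpenedReader, String>>
where
    F: Fn(&str) -> Result<OpenedReader, String> + Send + Sync + 'static,
{
    let total = paths.len();
    let queue = Arc::new(Mutex::new(VecDeque::from(paths)));
    let open = Arc::new(open);
    let (tx, rx) = mpsc::channel();

    // Never start more workers than there are paths, and always at least one.
    let worker_count = concurrency.max(1).min(total.max(1));
    let workers: Vec<_> = (0..worker_count)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let open = Arc::clone(&open);
            let tx = tx.clone();
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so the open itself runs without holding the lock.
                let next = queue.lock().unwrap().pop_front();
                match next {
                    Some(path) => {
                        let _ = tx.send((*open)(path.as_str()));
                    }
                    None => break,
                }
            })
        })
        .collect();

    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);
    let results: Vec<_> = rx.iter().collect();
    for worker in workers {
        worker.join().unwrap();
    }
    results
}
```

Because results are unordered, a caller that needs a stable file order would have to carry an index alongside each result and sort afterwards.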
§Schema Resolution
Currently, all children must share the exact same DType. A dtype
mismatch produces an error.
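A minimal sketch of the exact-match rule follows. The DType enum here is a simplified, hypothetical stand-in (the real type has more variants), and resolve_schema is an illustrative name, not a function from this module. The point is that equality is structural and strict, so even a nullability difference counts as a mismatch.

```rust
/// Simplified, hypothetical dtype used only for this illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DType {
    Int64 { nullable: bool },
    Utf8 { nullable: bool },
    Struct(Vec<(String, DType)>),
}

/// Returns the shared dtype if every child matches the first one exactly,
/// otherwise an error naming the first mismatching child.
pub fn resolve_schema(children: &[DType]) -> Result<DType, String> {
    let (first, rest) = children
        .split_first()
        .ok_or_else(|| "no children to resolve a schema from".to_string())?;

    for (offset, dtype) in rest.iter().enumerate() {
        if dtype != first {
            return Err(format!(
                "dtype mismatch at child {}: expected {:?}, got {:?}",
                offset + 1,
                first,
                dtype
            ));
        }
    }
    Ok(first.clone())
}
```

Under this rule, two sources whose columns differ only in nullability are rejected; relaxing that is what the schema union item under Future Work would address.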
§Future Work
- Schema union: Allow missing columns (filled with nulls) and compatible type upcasts across sources instead of requiring exact dtype matches.
- Hive-style partitioning: Extract partition values from file paths (e.g. year=2024/month=01/) and expose them as virtual columns.
- Virtual columns: filename, file_row_number, file_index.
- Per-file statistics: Merge column statistics across sources for planner hints.
- Error resilience: Skip failed sources instead of aborting the entire scan.
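To make the Hive-style partitioning item concrete, here is a small sketch of pulling key=value pairs out of a path's directory components. Nothing like this exists in the module yet; hive_partition_values is a hypothetical name, and the choice to ignore the final path component (the file name) is an assumption of this example.

```rust
/// Extracts `key=value` pairs from the directory components of `path`,
/// in the order they appear. The final component is treated as a file
/// name and ignored, even if it contains `=`.
pub fn hive_partition_values(path: &str) -> Vec<(String, String)> {
    // Everything before the last '/' is the directory part.
    let dir = path.rsplit_once('/').map_or("", |(dir, _file)| dir);

    dir.split('/')
        .filter_map(|segment| segment.split_once('='))
        .filter(|(key, _)| !key.is_empty())
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}
```

Values are kept as strings here; turning them into typed virtual columns would need a separate decision about how partition types are inferred or declared.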
Structs§
- MultiLayoutDataSource - A DataSource that combines multiple LayoutReaderRefs into a single scannable source.
Traits§
- LayoutReaderFactory - An async factory that produces a LayoutReaderRef.