Module multi

A DataSource that combines multiple LayoutReaderRefs into a single scannable source.

Readers may be pre-opened or deferred via LayoutReaderFactory. Deferred readers are opened concurrently during scanning using buffer_unordered: up to concurrency file opens run in parallel as spawned tasks on the session runtime. Once opened, each reader yields a single partition covering its full row range; internal I/O pipelining and chunking are handled by ScanBuilder.
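
The bounded, completion-ordered opening described above can be illustrated with a standard-library analogue. This is not the crate's implementation and does not use buffer_unordered from the futures crate: it swaps in OS threads and a channel to show the same two properties, namely that at most concurrency opens run at once and that results arrive in completion order rather than input order. The name open_all and the use of plain closures in place of LayoutReaderFactory are assumptions made for illustration.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical helper: runs every `open` closure with at most `concurrency`
/// of them in flight, returning results in the order they finish.
/// Closures stand in for deferred reader factories.
pub fn open_all<T, F>(factories: Vec<F>, concurrency: usize) -> Vec<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let total = factories.len();
    // Shared queue of pending opens; each worker takes one at a time.
    let queue = Arc::new(Mutex::new(factories.into_iter()));
    let (tx, rx) = mpsc::channel();

    // Never start more workers than there are opens to perform.
    let workers = concurrency.max(1).min(total);
    for _ in 0..workers {
        let queue = Arc::clone(&queue);
        let tx = tx.clone();
        thread::spawn(move || loop {
            // The lock guard is dropped at the end of this statement,
            // so the open itself runs without holding the lock.
            let next = queue.lock().unwrap().next();
            match next {
                Some(open) => {
                    if tx.send(open()).is_err() {
                        break; // receiver went away, stop early
                    }
                }
                None => break, // queue drained
            }
        });
    }

    // Drop the original sender so the receiver ends once all workers finish.
    drop(tx);
    rx.into_iter().collect()
}
```

Because results are delivered in completion order, a caller that needs a stable order would have to carry an index alongside each result and sort afterwards.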

§Schema Resolution

Currently, all children must share the exact same DType. A dtype mismatch produces an error.
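
A minimal sketch of the exact-match rule, assuming a simplified stand-in for DType. The enum, the DTypeMismatch error, and the resolve_schema function below are hypothetical names for illustration only; the real types and error reporting in the crate may differ.

```rust
/// Hypothetical stand-in for the crate's `DType`; only equality matters here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DType {
    Bool,
    Int64,
    Utf8,
    Struct(Vec<(String, DType)>),
}

/// Hypothetical error describing which child disagreed with the first one.
#[derive(Debug, PartialEq, Eq)]
pub struct DTypeMismatch {
    pub index: usize,
    pub expected: DType,
    pub found: DType,
}

/// Returns the shared dtype if every child matches the first exactly,
/// `None` if there are no children, or an error at the first mismatch.
pub fn resolve_schema(children: &[DType]) -> Result<Option<DType>, DTypeMismatch> {
    let Some((first, rest)) = children.split_first() else {
        return Ok(None);
    };
    for (offset, dtype) in rest.iter().enumerate() {
        if dtype != first {
            return Err(DTypeMismatch {
                index: offset + 1, // position within the original slice
                expected: first.clone(),
                found: dtype.clone(),
            });
        }
    }
    Ok(Some(first.clone()))
}
```

Under this rule even a compatible difference, such as a struct with one extra field, is rejected; relaxing that is what the schema union item under Future Work would address.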

§Future Work

  • Schema union: Allow missing columns (filled with nulls) and compatible type upcasts across sources instead of requiring exact dtype matches.
  • Hive-style partitioning: Extract partition values from file paths (e.g. year=2024/month=01/) and expose them as virtual columns.
  • Virtual columns: filename, file_row_number, file_index.
  • Per-file statistics: Merge column statistics across sources for planner hints.
  • Error resilience: Skip failed sources instead of aborting the entire scan.
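
As an illustration of the Hive-style partitioning item, one possible way to pull key=value pairs out of a path is sketched below. Since this feature is listed as future work, nothing here reflects an existing API: the function name hive_partition_values and its return type are assumptions, and values are left as raw strings with no type inference.

```rust
/// Hypothetical helper: extracts `key=value` directory segments from a
/// slash-separated path, in the order they appear. The final segment is
/// treated as the file name and is never interpreted as a partition.
pub fn hive_partition_values(path: &str) -> Vec<(String, String)> {
    let mut segments: Vec<&str> = path.split('/').collect();
    // Drop the trailing file name (or the empty segment after a final '/').
    segments.pop();

    segments
        .into_iter()
        .filter_map(|segment| {
            // Segments without '=' are ordinary directories and are skipped.
            let (key, value) = segment.split_once('=')?;
            if key.is_empty() {
                return None;
            }
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}
```

For the path from the example above, year=2024/month=01/, this yields the pairs (year, 2024) and (month, 01), which could then be exposed as virtual columns with a constant value per file.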

Structs§

MultiLayoutDataSource
A DataSource that combines multiple LayoutReaderRefs into a single scannable source.

Traits§

LayoutReaderFactory
An async factory that produces a LayoutReaderRef.