pub struct VortexSource { /* private fields */ }
File scan implementation for reading one or more .vortex files.
VortexSource is the lower-level read component underneath
VortexFormat. It is the type DataFusion stores in a FileScanConfig,
and it is ultimately executed through DataSourceExec.
            ▲
            │
            │ Produce a stream of
            │ RecordBatches
            │
┌───────────────────────┐
│    DataSourceExec     │
└───────────────────────┘
            ▲
            │ uses
            │
┌───────────────────────┐
│     VortexSource      │
└───────────────────────┘
            ▲
            │ opens `.vortex` files via
            │
 ObjectStore / VortexReadAt

Most applications reach VortexSource indirectly through
VortexFormatFactory. Use VortexSource directly when you are
constructing a FileScanConfig yourself or when you need to inject
lower-level behavior such as a custom VortexReaderFactory, an external
VortexAccessPlan, or a specific FileMetadataCache.
§Example
use std::sync::Arc;
use arrow_schema::Schema;
use datafusion_datasource::file_scan_config::FileScanConfigBuilder;
use datafusion_datasource::source::DataSourceExec;
use datafusion_datasource::PartitionedFile;
use datafusion_datasource::TableSchema;
use datafusion_execution::object_store::ObjectStoreUrl;
use vortex::VortexSessionDefault;
use vortex::session::VortexSession;
use vortex_datafusion::VortexSource;
let file_schema = Arc::new(Schema::empty());
let source = Arc::new(
    VortexSource::new(
        TableSchema::from_file_schema(file_schema),
        VortexSession::default(),
    )
    .with_projection_pushdown(true)
    .with_scan_concurrency(4),
);
let config = FileScanConfigBuilder::new(ObjectStoreUrl::local_filesystem(), source)
    .with_file(PartitionedFile::new("metrics.vortex", 1024))
    .build();
let exec = DataSourceExec::from_data_source(config);
§What VortexSource Handles
VortexSource is responsible for:
- translating DataFusion filters into Vortex predicates when possible,
- retaining the full predicate for file pruning based on statistics and partition values,
- configuring per-file readers and sharing parsed layout readers across partitions within the same scan,
- carrying the table schema used for schema evolution and missing-column adaptation,
- attaching a Vortex metrics registry to the read path.
§Projection And Predicate Behavior
VortexSource keeps two related predicate forms:
- full_predicate, which is used by DataFusion’s FilePruner to skip whole files before they are opened,
- vortex_predicate, which contains only the expressions Vortex can evaluate during the scan.
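The relationship between the two predicate forms can be sketched with a toy model. This is not the vortex_datafusion API: `Expr`, `vortex_supports`, and `split_predicate` are hypothetical names, and the rule for what the scan "supports" is invented for the example. The sketch only shows that every conjunct is retained for file pruning while a filtered subset is handed to the scan.

```rust
/// Toy filter expression; a hypothetical stand-in for a DataFusion physical expression.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// `column > literal`
    Gt(String, i64),
    /// `column = literal`
    Eq(String, i64),
    /// A call to a user-defined function that the scan engine does not know about.
    Udf(String),
}

/// Assumption for this sketch: plain comparisons can run inside the scan, UDF calls cannot.
fn vortex_supports(expr: &Expr) -> bool {
    !matches!(expr, Expr::Udf(_))
}

/// Returns `(full_predicate, vortex_predicate)` for a list of AND-ed conjuncts.
/// The full list is kept unchanged; the second list holds only the supported conjuncts.
fn split_predicate(conjuncts: &[Expr]) -> (Vec<Expr>, Vec<Expr>) {
    let full = conjuncts.to_vec();
    let pushed = conjuncts
        .iter()
        .filter(|e| vortex_supports(e))
        .cloned()
        .collect();
    (full, pushed)
}
```

In this model, a filter list containing a comparison, a UDF call, and another comparison yields a full predicate of three conjuncts and a scan predicate of two.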
Projection handling depends on VortexTableOptions::projection_pushdown:
- when disabled, VortexSource still prunes unreferenced top-level columns, but DataFusion applies the full projection after the scan,
- when enabled, the scan can evaluate a Vortex-native projection and leave only unsupported expressions for DataFusion.
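As a rough illustration of the disabled case, the sketch below computes which top-level columns a scan would still need to read. It is a standalone model, not the crate's implementation: `referenced_top_level_columns` and `pruned_scan_columns` are hypothetical helpers, and each projection expression is reduced to the dotted column paths it references.

```rust
use std::collections::BTreeSet;

/// Top-level columns referenced by a set of projection expressions.
/// Each expression is represented only by the dotted column paths it reads,
/// for example "payload.size" refers to the top-level column "payload".
fn referenced_top_level_columns<'a>(exprs: &[Vec<&'a str>]) -> BTreeSet<&'a str> {
    exprs
        .iter()
        .flatten()
        .copied()
        // `split` always yields at least one item; `unwrap_or` is only a safe fallback.
        .map(|path| path.split('.').next().unwrap_or(path))
        .collect()
}

/// Columns the scan reads, in table order, after dropping unreferenced ones.
/// The expressions themselves would be evaluated by the query engine afterwards.
fn pruned_scan_columns<'a>(table_columns: &[&'a str], exprs: &[Vec<&'a str>]) -> Vec<&'a str> {
    let referenced = referenced_top_level_columns(exprs);
    table_columns
        .iter()
        .copied()
        .filter(|column| referenced.contains(column))
        .collect()
}
```

For a table with columns `ts`, `host`, `payload`, and `region`, expressions that read `payload.size` and `ts` leave only `ts` and `payload` in the scan.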
§Observability
VortexSource owns a Vortex metrics registry for the lifetime of a physical
scan. The registry is passed to the reader and scan builder so I/O and scan
metrics accumulate as the query executes.
Use VortexMetricsFinder to merge those metrics back into DataFusion
MetricsSet values after the plan has run.
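The ownership pattern described here, one registry shared by the reader and the scan, can be modeled with standard-library types. `Registry`, `run_scan`, and the counter names are hypothetical and are not the Vortex MetricsRegistry API; the sketch only shows how clones of one `Arc` accumulate into the same counters during execution.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a metrics registry: named counters behind a mutex.
#[derive(Default)]
struct Registry {
    counters: Mutex<BTreeMap<&'static str, u64>>,
}

impl Registry {
    /// Adds `delta` to the named counter, creating it at zero if needed.
    fn add(&self, name: &'static str, delta: u64) {
        let mut counters = self.counters.lock().unwrap();
        *counters.entry(name).or_insert(0) += delta;
    }

    /// Returns a copy of all counters, as a caller might do after the plan has run.
    fn snapshot(&self) -> BTreeMap<&'static str, u64> {
        self.counters.lock().unwrap().clone()
    }
}

/// Simulates one scan: the "reader" and the "scan" each hold a handle to the same registry.
fn run_scan(registry: &Arc<Registry>, file_sizes: &[u64]) {
    let reader_metrics = Arc::clone(registry);
    let scan_metrics = Arc::clone(registry);
    for &size in file_sizes {
        reader_metrics.add("bytes_read", size);
        scan_metrics.add("files_scanned", 1);
    }
}
```

Because both handles point at the same registry, a snapshot taken after `run_scan` contains the I/O and scan counters together.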
§Execution Flow
At execution time:
- DataFusion calls DataSourceExec, which delegates file opening to VortexSource.
- VortexSource creates a VortexOpener configured with the current projection, predicate, options, and metrics.
- The opener adapts filters and schema for the specific file, applies any VortexAccessPlan, and builds a Vortex scan.
- Scan results are converted into Arrow RecordBatch values for DataFusion.
Implementations§
impl VortexSource
pub fn new(table_schema: TableSchema, session: VortexSession) -> Self
Creates a new VortexSource for a table schema and VortexSession.
The new source starts with:
- all top-level columns projected,
- no pushed filters,
- a default Vortex metrics registry,
- default VortexTableOptions.
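The constructor-plus-`with_*` shape can be sketched in isolation. `ToySource` and `Options` are hypothetical types, and the default values shown are invented for the example; they are not the actual defaults of VortexTableOptions. The sketch only mirrors the pattern in which `new` sets a baseline and each `with_*` method takes `self` by value and returns it.

```rust
/// Hypothetical options struct; field names and defaults are invented for this sketch.
#[derive(Debug, Clone, PartialEq, Default)]
struct Options {
    projection_pushdown: bool,
    scan_concurrency: Option<usize>,
}

/// Toy source that mirrors the consuming-builder shape of the `with_*` methods.
#[derive(Debug, Clone, PartialEq)]
struct ToySource {
    columns: Vec<String>,
    filters: Vec<String>,
    options: Options,
}

impl ToySource {
    /// Starts with every column projected, no pushed filters, and default options.
    fn new(columns: Vec<String>) -> Self {
        Self {
            columns,
            filters: Vec::new(),
            options: Options::default(),
        }
    }

    /// Takes `self` by value and returns it, so calls can be chained.
    fn with_projection_pushdown(mut self, enabled: bool) -> Self {
        self.options.projection_pushdown = enabled;
        self
    }

    /// Records a per-file concurrency setting without touching other fields.
    fn with_scan_concurrency(mut self, scan_concurrency: usize) -> Self {
        self.options.scan_concurrency = Some(scan_concurrency);
        self
    }
}
```

Chaining `with_projection_pushdown(true)` and `with_scan_concurrency(4)` on a fresh `ToySource` changes only the options and leaves the projected columns and the empty filter list untouched.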
pub fn with_projection_pushdown(self, enabled: bool) -> Self
Enables or disables Vortex-native projection evaluation.
This toggles whether VortexSource tries to split DataFusion projection
expressions into a Vortex scan projection plus a leftover DataFusion
projection.
pub fn with_expression_convertor(
    self,
    expr_convertor: Arc<dyn ExpressionConvertor>,
) -> Self
Sets the ExpressionConvertor used to translate DataFusion expressions
into Vortex expressions.
Override this when the default converter is insufficient for an engine integration or for a custom schema-adaptation strategy.
pub fn with_vortex_reader_factory(
    self,
    vortex_reader_factory: Arc<dyn VortexReaderFactory>,
) -> Self
Sets a custom factory for the underlying VortexReadAt.
Use this when reads need to go through an application-specific layer
rather than the default DataFusion ObjectStore.
pub fn metrics_registry(&self) -> &Arc<dyn MetricsRegistry>
Returns the MetricsRegistry attached to this scan.
The registry is populated as files are opened and scanned. For most
callers, crate::metrics::VortexMetricsFinder is the more convenient
public API for turning the registry contents into DataFusion metrics.
pub fn with_file_metadata_cache(
    self,
    file_metadata_cache: Arc<dyn FileMetadataCache>,
) -> Self
Overrides the metadata cache used to reuse Vortex footers across scans.
pub fn with_scan_concurrency(self, scan_concurrency: usize) -> Self
Sets the per-file Vortex scan concurrency.
This is separate from DataFusion’s partition-level parallelism.
pub fn options(&self) -> &VortexTableOptions
Returns the effective table options for this source.
pub fn with_options(self, opts: VortexTableOptions) -> Self
Replaces the table options for this source.
Trait Implementations§
impl Clone for VortexSource
fn clone(&self) -> VortexSource
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl FileSource for VortexSource
fn create_file_opener(
    &self,
    object_store: Arc<dyn ObjectStore>,
    base_config: &FileScanConfig,
    partition: usize,
) -> DFResult<Arc<dyn FileOpener>>
Creates a dyn FileOpener based on given parameters.
fn with_batch_size(&self, batch_size: usize) -> Arc<dyn FileSource>
fn filter(&self) -> Option<Arc<dyn PhysicalExpr>>
fn metrics(&self) -> &ExecutionPlanMetricsSet
fn file_type(&self) -> &str
fn fmt_extra(&self, t: DisplayFormatType, f: &mut Formatter<'_>) -> Result
fn supports_repartitioning(&self) -> bool
fn try_pushdown_filters(
    &self,
    filters: Vec<Arc<dyn PhysicalExpr>>,
    _config: &ConfigOptions,
) -> DFResult<FilterPushdownPropagation<Arc<dyn FileSource>>>
fn try_pushdown_projection(
    &self,
    projection: &ProjectionExprs,
) -> DFResult<Option<Arc<dyn FileSource>>>
fn projection(&self) -> Option<&ProjectionExprs>
fn table_schema(&self) -> &TableSchema
fn repartitioned(
    &self,
    target_partitions: usize,
    repartition_file_min_size: usize,
    output_ordering: Option<LexOrdering>,
    config: &FileScanConfig,
) -> Result<Option<FileScanConfig>, DataFusionError>
If supported by the FileSource, redistribute files across partitions
according to their size. Allows custom file formats to implement their
own repartitioning logic.
fn try_pushdown_sort(
    &self,
    order: &[PhysicalSortExpr],
    eq_properties: &EquivalenceProperties,
) -> Result<SortOrderPushdownResult<Arc<dyn FileSource>>, DataFusionError>
fn try_reverse_output(
    &self,
    _order: &[PhysicalSortExpr],
    _eq_properties: &EquivalenceProperties,
) -> Result<SortOrderPushdownResult<Arc<dyn FileSource>>, DataFusionError>
Deprecated: Renamed to try_pushdown_sort. This method was never limited to reversing output. It will be removed in 59.0.0 or later.
fn with_schema_adapter_factory(
    &self,
    _factory: Arc<dyn SchemaAdapterFactory>,
) -> Result<Arc<dyn FileSource>, DataFusionError>
Deprecated: SchemaAdapterFactory has been removed. Use PhysicalExprAdapterFactory instead. See upgrading.md for more details.
fn schema_adapter_factory(&self) -> Option<Arc<dyn SchemaAdapterFactory>>
Deprecated: SchemaAdapterFactory has been removed. Use PhysicalExprAdapterFactory instead. See upgrading.md for more details.
Auto Trait Implementations§
impl Freeze for VortexSource
impl !RefUnwindSafe for VortexSource
impl Send for VortexSource
impl Sync for VortexSource
impl Unpin for VortexSource
impl UnsafeUnpin for VortexSource
impl !UnwindSafe for VortexSource
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> FmtForward for T
fn fmt_binary(self) -> FmtBinary<Self>
where
    Self: Binary,
Causes self to use its Binary implementation when Debug-formatted.
fn fmt_display(self) -> FmtDisplay<Self>
where
    Self: Display,
Causes self to use its Display implementation when Debug-formatted.
fn fmt_lower_exp(self) -> FmtLowerExp<Self>
where
    Self: LowerExp,
Causes self to use its LowerExp implementation when Debug-formatted.
fn fmt_lower_hex(self) -> FmtLowerHex<Self>
where
    Self: LowerHex,
Causes self to use its LowerHex implementation when Debug-formatted.
fn fmt_octal(self) -> FmtOctal<Self>
where
    Self: Octal,
Causes self to use its Octal implementation when Debug-formatted.
fn fmt_pointer(self) -> FmtPointer<Self>
where
    Self: Pointer,
Causes self to use its Pointer implementation when Debug-formatted.
fn fmt_upper_exp(self) -> FmtUpperExp<Self>
where
    Self: UpperExp,
Causes self to use its UpperExp implementation when Debug-formatted.
fn fmt_upper_hex(self) -> FmtUpperHex<Self>
where
    Self: UpperHex,
Causes self to use its UpperHex implementation when Debug-formatted.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
impl<T> Pipe for T
where
    T: ?Sized,
fn pipe<R>(self, func: impl FnOnce(Self) -> R) -> R
where
    Self: Sized,
fn pipe_ref<'a, R>(&'a self, func: impl FnOnce(&'a Self) -> R) -> R
where
    R: 'a,
Borrows self and passes that borrow into the pipe function.
fn pipe_ref_mut<'a, R>(&'a mut self, func: impl FnOnce(&'a mut Self) -> R) -> R
where
    R: 'a,
Mutably borrows self and passes that borrow into the pipe function.
fn pipe_borrow<'a, B, R>(&'a self, func: impl FnOnce(&'a B) -> R) -> R
fn pipe_borrow_mut<'a, B, R>(
    &'a mut self,
    func: impl FnOnce(&'a mut B) -> R,
) -> R
fn pipe_as_ref<'a, U, R>(&'a self, func: impl FnOnce(&'a U) -> R) -> R
Borrows self, then passes self.as_ref() into the pipe function.
fn pipe_as_mut<'a, U, R>(&'a mut self, func: impl FnOnce(&'a mut U) -> R) -> R
Mutably borrows self, then passes self.as_mut() into the pipe function.
fn pipe_deref<'a, T, R>(&'a self, func: impl FnOnce(&'a T) -> R) -> R
Borrows self, then passes self.deref() into the pipe function.
impl<T> Pointable for T
impl<T> Tap for T
fn tap_borrow<B>(self, func: impl FnOnce(&B)) -> Self
Immutable access to the Borrow<B> of a value.
fn tap_borrow_mut<B>(self, func: impl FnOnce(&mut B)) -> Self
Mutable access to the BorrowMut<B> of a value.
fn tap_ref<R>(self, func: impl FnOnce(&R)) -> Self
Immutable access to the AsRef<R> view of a value.
fn tap_ref_mut<R>(self, func: impl FnOnce(&mut R)) -> Self
Mutable access to the AsMut<R> view of a value.
fn tap_deref<T>(self, func: impl FnOnce(&T)) -> Self
Immutable access to the Deref::Target of a value.
fn tap_deref_mut<T>(self, func: impl FnOnce(&mut T)) -> Self
Mutable access to the Deref::Target of a value.
fn tap_dbg(self, func: impl FnOnce(&Self)) -> Self
Calls .tap() only in debug builds, and is erased in release builds.
fn tap_mut_dbg(self, func: impl FnOnce(&mut Self)) -> Self
Calls .tap_mut() only in debug builds, and is erased in release builds.
fn tap_borrow_dbg<B>(self, func: impl FnOnce(&B)) -> Self
Calls .tap_borrow() only in debug builds, and is erased in release builds.
fn tap_borrow_mut_dbg<B>(self, func: impl FnOnce(&mut B)) -> Self
Calls .tap_borrow_mut() only in debug builds, and is erased in release builds.
fn tap_ref_dbg<R>(self, func: impl FnOnce(&R)) -> Self
Calls .tap_ref() only in debug builds, and is erased in release builds.
fn tap_ref_mut_dbg<R>(self, func: impl FnOnce(&mut R)) -> Self
Calls .tap_ref_mut() only in debug builds, and is erased in release builds.
fn tap_deref_dbg<T>(self, func: impl FnOnce(&T)) -> Self
Calls .tap_deref() only in debug builds, and is erased in release builds.
builds.