Skip to main content

StreamSource

Trait StreamSource 

Source
pub trait StreamSource:
    Send
    + Sync
    + Debug {
    // Required methods
    fn schema(&self) -> SchemaRef;
    fn stream(
        &self,
        projection: Option<Vec<usize>>,
        filters: Vec<Expr>,
    ) -> Result<SendableRecordBatchStream, DataFusionError>;

    // Provided methods
    fn supports_filters(&self, filters: &[Expr]) -> Vec<bool> { ... }
    fn output_ordering(&self) -> Option<Vec<SortColumn>> { ... }
}
Expand description

A source of streaming data for DataFusion queries.

This trait enables integration between LaminarDB’s push-based event processing and DataFusion’s pull-based query execution model.

Implementations must be thread-safe as DataFusion may access them from multiple threads during query planning and execution.

§Filter Pushdown

Sources can optionally support filter pushdown by implementing supports_filters. When filters are pushed down, the source should apply them before yielding batches to reduce data transfer.

§Projection Pushdown

Sources should respect the projection parameter in stream() to only read columns that are needed by the query, improving performance.

Required Methods§

Source

fn schema(&self) -> SchemaRef

Returns the schema of records produced by this source.

The schema must be consistent across all calls and must match the schema of RecordBatch instances yielded by stream().

Source

fn stream( &self, projection: Option<Vec<usize>>, filters: Vec<Expr>, ) -> Result<SendableRecordBatchStream, DataFusionError>

Creates a stream of RecordBatch instances.

§Arguments
  • projection - Optional column indices to project. If None, all columns are returned. Indices refer to the source schema.
  • filters - Filter expressions that can be applied at the source. The source may partially or fully apply these filters.
§Returns

A stream that yields RecordBatch instances asynchronously.

§Errors

Returns DataFusionError if the stream cannot be created.

Provided Methods§

Source

fn supports_filters(&self, filters: &[Expr]) -> Vec<bool>

Returns which filters this source can apply.

For each filter in filters, returns true if the source will apply that filter, false otherwise. DataFusion uses this to know which filters still need to be applied after the scan.

The default implementation returns all false, indicating no filter pushdown support.

§Arguments
  • filters - The filters being considered for pushdown.
§Returns

A vector of booleans, one per filter, indicating support.

Source

fn output_ordering(&self) -> Option<Vec<SortColumn>>

Returns the output ordering of this source, if any.

When a source is pre-sorted (e.g., by event time from an ordered Kafka partition), declaring the ordering allows DataFusion to elide SortExec from the physical plan for ORDER BY queries that match the declared ordering.

The default implementation returns None, indicating the source has no guaranteed output ordering.

Implementors§