Skip to main content

WorkUnitFeedProvider

Trait WorkUnitFeedProvider 

Source
pub trait WorkUnitFeedProvider:
    Send
    + Sync
    + Debug {
    type WorkUnit: WorkUnit + Default;

    // Required method
    fn feed(
        &self,
        partition: usize,
        ctx: Arc<TaskContext>,
    ) -> Result<BoxStream<'static, Result<Self::WorkUnit>>>;

    // Provided method
    fn metrics(&self) -> ExecutionPlanMetricsSet { ... }
}
Expand description

Extension point for building user-defined work unit streams consumed by a crate::WorkUnitFeed embedded in a leaf datafusion::physical_plan::ExecutionPlan.

Implement this trait on a type that knows how to produce the per-partition stream of work items (e.g. file addresses, external queries, key ranges) that the leaf plan needs at runtime. Then wrap the implementation with crate::WorkUnitFeed::new and store the resulting crate::WorkUnitFeed as a field of your [ExecutionPlan] node.

In a distributed context the provider is only invoked on the coordinating stage that initiates the query. The work units it produces are serialized and streamed over the network to the workers, which expose the same typed stream to the leaf plan as if it were running locally.

See WorkUnitFeedProvider::feed for the per-call contract.

Required Associated Types§

Required Methods§

Source

fn feed( &self, partition: usize, ctx: Arc<TaskContext>, ) -> Result<BoxStream<'static, Result<Self::WorkUnit>>>

Builds a WorkUnit stream for the given partition.

This method is never invoked in a remote worker. On workers, the equivalent leaf plan uses a remote provider that pulls the work units off the network — user code doesn’t need to implement that case.

When implementing this method, DistributedWorkUnitFeedContext can be extracted from the TaskContext, and it contains information about the amount of distributed tasks to which WorkUnits should be fanned out.

The implementation should be prepared to return P*T feeds, where P is the number of partitions of the datafusion::physical_plan::ExecutionPlan to which the WorkUnitFeedProvider is attached and T is the number of tasks to which it should fanout

For more information about how WorkUnit feeds work, refer to the crate::WorkUnitFeed docs.

§Example

#[derive(Debug)]
struct MyFeedProvider {
    output_partitions: usize
};

#[derive(Clone, PartialEq, ::prost::Message)]
struct MyCustomWorkUnit {
    #[prost(string, tag = "1")]
    custom_field: String,
}

impl WorkUnitFeedProvider for MyFeedProvider {
    type WorkUnit = MyCustomWorkUnit;

    fn feed(
        &self,
        partition: usize,
        ctx: Arc<TaskContext>,
    ) -> Result<BoxStream<'static, Result<Self::WorkUnit>>> {
        let feed_ctx = DistributedWorkUnitFeedContext::from_ctx(&ctx);

        // this method will be called `feed_ctx.fan_out_tasks * self.output_partitions`
        // times.
        Ok(futures::stream::empty().boxed())
    }
}

Provided Methods§

Source

fn metrics(&self) -> ExecutionPlanMetricsSet

DataFusion metrics collected at runtime while streaming WorkUnits through Self::feed.

Dyn Compatibility§

This trait is dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety".

Implementors§