pub trait WorkUnitFeedProvider:
Send
+ Sync
+ Debug {
type WorkUnit: WorkUnit + Default;
// Required method
fn feed(
&self,
partition: usize,
ctx: Arc<TaskContext>,
) -> Result<BoxStream<'static, Result<Self::WorkUnit>>>;
// Provided method
fn metrics(&self) -> ExecutionPlanMetricsSet { ... }
}Expand description
Extension point for building user-defined work unit streams consumed by a
crate::WorkUnitFeed embedded in a leaf datafusion::physical_plan::ExecutionPlan.
Implement this trait on a type that knows how to produce the per-partition stream of
work items (e.g. file addresses, external queries, key ranges) that the leaf plan needs
at runtime. Then wrap the implementation with crate::WorkUnitFeed::new and store
the resulting crate::WorkUnitFeed as a field of your [ExecutionPlan] node.
In a distributed context the provider is only invoked on the coordinating stage that initiates the query. The work units it produces are serialized and streamed over the network to the workers, which expose the same typed stream to the leaf plan as if it were running locally.
See WorkUnitFeedProvider::feed for the per-call contract.
Required Associated Types§
Required Methods§
Sourcefn feed(
&self,
partition: usize,
ctx: Arc<TaskContext>,
) -> Result<BoxStream<'static, Result<Self::WorkUnit>>>
fn feed( &self, partition: usize, ctx: Arc<TaskContext>, ) -> Result<BoxStream<'static, Result<Self::WorkUnit>>>
Builds a WorkUnit stream for the given partition.
This method is never invoked in a remote worker. On workers, the equivalent leaf plan uses a remote provider that pulls the work units off the network — user code doesn’t need to implement that case.
When implementing this method, DistributedWorkUnitFeedContext can be extracted from the TaskContext, and it contains information about the amount of distributed tasks to which WorkUnits should be fanned out.
The implementation should be prepared to return P*T feeds, where P is the number of
partitions of the datafusion::physical_plan::ExecutionPlan to which the
WorkUnitFeedProvider is attached and T is the number of tasks to which it should fanout
For more information about how WorkUnit feeds work, refer to the crate::WorkUnitFeed docs.
§Example
#[derive(Debug)]
struct MyFeedProvider {
output_partitions: usize
};
#[derive(Clone, PartialEq, ::prost::Message)]
struct MyCustomWorkUnit {
#[prost(string, tag = "1")]
custom_field: String,
}
impl WorkUnitFeedProvider for MyFeedProvider {
type WorkUnit = MyCustomWorkUnit;
fn feed(
&self,
partition: usize,
ctx: Arc<TaskContext>,
) -> Result<BoxStream<'static, Result<Self::WorkUnit>>> {
let feed_ctx = DistributedWorkUnitFeedContext::from_ctx(&ctx);
// this method will be called `feed_ctx.fan_out_tasks * self.output_partitions`
// times.
Ok(futures::stream::empty().boxed())
}
}Provided Methods§
Sourcefn metrics(&self) -> ExecutionPlanMetricsSet
fn metrics(&self) -> ExecutionPlanMetricsSet
DataFusion metrics collected at runtime while streaming WorkUnits through Self::feed.
Dyn Compatibility§
This trait is dyn compatible.
In older versions of Rust, dyn compatibility was called "object safety".