Skip to main content

TaskEstimator

Trait TaskEstimator 

Source
pub trait TaskEstimator {
    // Required methods
    fn task_estimation(
        &self,
        plan: &Arc<dyn ExecutionPlan>,
        cfg: &ConfigOptions,
    ) -> Option<TaskEstimation>;
    fn scale_up_leaf_node(
        &self,
        plan: &Arc<dyn ExecutionPlan>,
        task_count: usize,
        cfg: &ConfigOptions,
    ) -> Result<Option<Arc<dyn ExecutionPlan>>>;

    // Provided method
    fn route_tasks(
        &self,
        _routing_ctx: &TaskRoutingContext<'_>,
    ) -> Result<Option<Vec<Url>>> { ... }
}
Expand description

Given a leaf node, provides an estimation about how many tasks should be used in the stage containing it, and if the leaf node should be replaced by some other.

The distributed planner will try many TaskEstimators in order until one provides an estimation for a specific leaf node. Once that’s done, upper stages will get their task count calculated based on whether lower stages are reducing the cardinality of the data or increasing it.

Required Methods§

Source

fn task_estimation( &self, plan: &Arc<dyn ExecutionPlan>, cfg: &ConfigOptions, ) -> Option<TaskEstimation>

Function applied to each node that returns a TaskEstimation hinting how many tasks should be used in the [Stage] containing that node.

All the TaskEstimator registered in the session will be applied to the node until one returns an estimation.

If no estimation is returned from any of the registered TaskEstimators, then:

  • If the node is a leaf node,Maximum(1) is assumed, hinting the distributed planner that the leaf node cannot be distributed across tasks.
  • If the node is a normal node in the plan, then the maximum task count from its children is inherited.
Source

fn scale_up_leaf_node( &self, plan: &Arc<dyn ExecutionPlan>, task_count: usize, cfg: &ConfigOptions, ) -> Result<Option<Arc<dyn ExecutionPlan>>>

After a final task_count is decided, taking into account all the leaf nodes in the [Stage], this allows performing a transformation in the leaf nodes for accounting for the fact that they are going to run in multiple tasks.

Provided Methods§

Source

fn route_tasks( &self, _routing_ctx: &TaskRoutingContext<'_>, ) -> Result<Option<Vec<Url>>>

Optionally defines a custom protocol for routing tasks to specific worker URLs. Receives routing context including task count and a list of available URLs, and returns a vector of routed URLs, in order of task assignment.

If Ok(Some(Vec)) is returned, tasks are sent in order to the URLs specified in the returned vector. If Ok(None) is returned, execution defaults to round-robin routing.

Dyn Compatibility§

This trait is dyn compatible.

In older versions of Rust, dyn compatibility was called "object safety".

Implementations on Foreign Types§

Source§

impl TaskEstimator for Arc<dyn TaskEstimator + Send + Sync>

Source§

fn task_estimation( &self, plan: &Arc<dyn ExecutionPlan>, cfg: &ConfigOptions, ) -> Option<TaskEstimation>

Source§

fn scale_up_leaf_node( &self, plan: &Arc<dyn ExecutionPlan>, task_count: usize, cfg: &ConfigOptions, ) -> Result<Option<Arc<dyn ExecutionPlan>>>

Source§

fn route_tasks( &self, routing_ctx: &TaskRoutingContext<'_>, ) -> Result<Option<Vec<Url>>>

Source§

impl TaskEstimator for Arc<dyn TaskEstimator>

Source§

fn task_estimation( &self, plan: &Arc<dyn ExecutionPlan>, cfg: &ConfigOptions, ) -> Option<TaskEstimation>

Source§

fn scale_up_leaf_node( &self, plan: &Arc<dyn ExecutionPlan>, task_count: usize, cfg: &ConfigOptions, ) -> Result<Option<Arc<dyn ExecutionPlan>>>

Source§

fn route_tasks( &self, routing_ctx: &TaskRoutingContext<'_>, ) -> Result<Option<Vec<Url>>>

Source§

impl TaskEstimator for usize

Implementors§