#[non_exhaustive]pub struct DistributedConfig {
pub files_per_task: usize,
pub cardinality_task_count_factor: f64,
pub shuffle_batch_size: usize,
pub children_isolator_unions: bool,
pub collect_metrics: bool,
pub broadcast_joins: bool,
pub compression: String,
pub max_tasks_per_stage: usize,
/* private fields */
}Expand description
Configuration for the distributed planner.
Fields (Non-exhaustive)§
This struct is marked as non-exhaustive
Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.files_per_task: usizeSets the maximum amount of files that will be assigned to each task. Reducing this
number will spawn more tasks for the same number of files. This only applies when
estimating tasks for stages containing DataSourceExec nodes with FileScanConfig
implementations.
cardinality_task_count_factor: f64Task multiplying factor for when a node declares that it changes the cardinality of the data:
- If a node is increasing the cardinality of the data, this factor will increase.
- If a node reduces the cardinality of the data, this factor will decrease.
- In any other situation, this factor is left intact.
shuffle_batch_size: usizeUpon shuffling over the network, data streams need to be disassembled in a lot of output partitions, which means the resulting streams might contain a lot of tiny record batches to be sent over the wire. This parameter controls the batch size in number of rows for the CoalesceBatchExec operator that is placed at the top of the stage for sending bigger batches over the wire. If set to 0, batch coalescing is disabled on network shuffle operations.
children_isolator_unions: boolWhen encountering a UNION operation, isolate its children depending on the task context. For example, on a UNION operation with 3 children running in 3 distributed tasks, instead of executing the 3 children in each 3 tasks with a DistributedTaskContext of 1/3, 2/3, and 3/3 respectively, Execute:
- The first child in the first task with a DistributedTaskContext of 1/1
- The second child in the second task with a DistributedTaskContext of 1/1
- The third child in the third task with a DistributedTaskContext of 1/1
collect_metrics: boolPropagate collected metrics from all nodes in the plan across network boundaries so that they can be reconstructed on the head node of the plan.
broadcast_joins: boolEnable broadcast joins for CollectLeft hash joins. When enabled, the build side of a CollectLeft join is broadcast to all consumer tasks. TODO: This option exists temporarily until we become smarter about when to actually use broadcasting like checking build side size. For now, broadcasting all CollectLeft joins is not always beneficial.
compression: StringThe compression used for sending data over the network between workers.
It can be set to either zstd, lz4 or none.
max_tasks_per_stage: usizeMaximum tasks that will be assigned per stage during distributed planning.
If set to 0, this value is the number of workers returned by the provided WorkerResolver.
It defaults to 0.
Implementations§
Source§impl DistributedConfig
impl DistributedConfig
Sourcepub fn with_task_estimator(
self,
task_estimator: impl TaskEstimator + Send + Sync + 'static,
) -> Self
pub fn with_task_estimator( self, task_estimator: impl TaskEstimator + Send + Sync + 'static, ) -> Self
Appends a TaskEstimator to the list. TaskEstimator will be executed sequentially in order on leaf nodes, and the first one to provide a value is the one that gets to decide how many tasks are used for that [Stage].
Sourcepub fn from_config_options(
cfg: &ConfigOptions,
) -> Result<&Self, DataFusionError>
pub fn from_config_options( cfg: &ConfigOptions, ) -> Result<&Self, DataFusionError>
Gets the DistributedConfig from the ConfigOptions’s extensions.
Sourcepub fn from_config_options_mut(
cfg: &mut ConfigOptions,
) -> Result<&mut Self, DataFusionError>
pub fn from_config_options_mut( cfg: &mut ConfigOptions, ) -> Result<&mut Self, DataFusionError>
Gets the DistributedConfig from the ConfigOptions’s extensions.
Trait Implementations§
Source§impl Clone for DistributedConfig
impl Clone for DistributedConfig
Source§fn clone(&self) -> DistributedConfig
fn clone(&self) -> DistributedConfig
1.0.0 · Source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source. Read moreSource§impl ConfigExtension for DistributedConfig
impl ConfigExtension for DistributedConfig
Source§impl ConfigField for DistributedConfig
impl ConfigField for DistributedConfig
Source§impl Debug for DistributedConfig
impl Debug for DistributedConfig
Source§impl Default for DistributedConfig
impl Default for DistributedConfig
Source§impl ExtensionOptions for DistributedConfig
impl ExtensionOptions for DistributedConfig
Source§fn cloned(&self) -> Box<dyn ExtensionOptions>
fn cloned(&self) -> Box<dyn ExtensionOptions>
ExtensionOptions Read moreSource§fn entries(&self) -> Vec<ConfigEntry>
fn entries(&self) -> Vec<ConfigEntry>
ConfigEntry stored in this ExtensionOptionsAuto Trait Implementations§
impl Freeze for DistributedConfig
impl !RefUnwindSafe for DistributedConfig
impl Send for DistributedConfig
impl Sync for DistributedConfig
impl Unpin for DistributedConfig
impl UnsafeUnpin for DistributedConfig
impl !UnwindSafe for DistributedConfig
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> CloneToUninit for Twhere
T: Clone,
impl<T> CloneToUninit for Twhere
T: Clone,
Source§impl<T> Instrument for T
impl<T> Instrument for T
Source§fn instrument(self, span: Span) -> Instrumented<Self>
fn instrument(self, span: Span) -> Instrumented<Self>
Source§fn in_current_span(self) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§impl<T> IntoRequest<T> for T
impl<T> IntoRequest<T> for T
Source§fn into_request(self) -> Request<T>
fn into_request(self) -> Request<T>
T in a tonic::Request