#[non_exhaustive]pub struct DistributedConfig {
pub file_scan_config_bytes_per_partition: usize,
pub cardinality_task_count_factor: f64,
pub children_isolator_unions: bool,
pub collect_metrics: bool,
pub broadcast_joins: bool,
pub compression: String,
pub shuffle_batch_size: usize,
pub max_tasks_per_stage: usize,
pub partial_reduce: bool,
pub worker_connection_buffer_budget_bytes: usize,
/* private fields */
}Expand description
Configuration for the distributed planner.
Fields (Non-exhaustive)§
This struct is marked as non-exhaustive
Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.file_scan_config_bytes_per_partition: usizeSets the number of bytes each partitions is expected to scan from parquet files. If more partitions than the ones available in one machine would be needed, several machines are used, and the scan is distributed. Lowering this number will increase parallelism.
cardinality_task_count_factor: f64Task multiplying factor for when a node declares that it changes the cardinality of the data:
- If a node is increasing the cardinality of the data, this factor will increase.
- If a node reduces the cardinality of the data, this factor will decrease.
- In any other situation, this factor is left intact.
children_isolator_unions: boolWhen encountering a UNION operation, isolate its children depending on the task context. For example, on a UNION operation with 3 children running in 3 distributed tasks, instead of executing the 3 children in each 3 tasks with a DistributedTaskContext of 1/3, 2/3, and 3/3 respectively, Execute:
- The first child in the first task with a DistributedTaskContext of 1/1
- The second child in the second task with a DistributedTaskContext of 1/1
- The third child in the third task with a DistributedTaskContext of 1/1
collect_metrics: boolPropagate collected metrics from all nodes in the plan across network boundaries so that they can be reconstructed on the head node of the plan.
broadcast_joins: boolEnable broadcast joins for CollectLeft hash joins. When enabled, the build side of a CollectLeft join is broadcast to all consumer tasks. TODO: This option exists temporarily until we become smarter about when to actually use broadcasting like checking build side size. For now, broadcasting all CollectLeft joins is not always beneficial.
compression: StringThe compression used for sending data over the network between workers.
It can be set to either zstd, lz4 or none.
shuffle_batch_size: usizeOverrides datafusion.execution.batch_size for worker-executed stages. Because
RepartitionExec reads session_config().batch_size() at execute time to size its
output batches (via its internal LimitedBatchCoalescer), this knob lets users tune
shuffle batch sizes independently of the global datafusion.execution.batch_size.
Set to 0 (the default) to apply no override and inherit datafusion.execution.batch_size.
max_tasks_per_stage: usizeMaximum tasks that will be assigned per stage during distributed planning.
If set to 0, this value is the number of workers returned by the provided WorkerResolver.
It defaults to 0.
partial_reduce: boolEnable the PartialReduce optimization, which inserts an extra aggregation pass above hash RepartitionExec before network shuffles to reduce shuffle data size. Disabled by default because its effectiveness is workload-dependent: it helps when aggregation significantly reduces cardinality, but adds overhead when it does not.
worker_connection_buffer_budget_bytes: usizeSoft byte budget that each per-worker connection will buffer in memory before pausing
the gRPC pull from that worker. Per-partition channels are unbounded (to avoid
head-of-line blocking between sibling partitions), so backpressure is enforced
globally per [WorkerConnection] using this budget. A single message larger than this
budget will still be admitted (otherwise we would livelock), so the actual peak per
connection is worker_connection_buffer_budget_bytes + max_message_size.
Implementations§
Source§impl DistributedConfig
impl DistributedConfig
Sourcepub fn with_task_estimator(
self,
task_estimator: impl TaskEstimator + Send + Sync + 'static,
) -> Self
pub fn with_task_estimator( self, task_estimator: impl TaskEstimator + Send + Sync + 'static, ) -> Self
Appends a TaskEstimator to the list. TaskEstimator will be executed sequentially in order on leaf nodes, and the first one to provide a value is the one that gets to decide how many tasks are used for that [Stage].
Sourcepub fn from_config_options(
cfg: &ConfigOptions,
) -> Result<&Self, DataFusionError>
pub fn from_config_options( cfg: &ConfigOptions, ) -> Result<&Self, DataFusionError>
Gets the DistributedConfig from the ConfigOptions’s extensions.
Sourcepub fn from_config_options_mut(
cfg: &mut ConfigOptions,
) -> Result<&mut Self, DataFusionError>
pub fn from_config_options_mut( cfg: &mut ConfigOptions, ) -> Result<&mut Self, DataFusionError>
Gets the DistributedConfig from the ConfigOptions’s extensions.
Trait Implementations§
Source§impl Clone for DistributedConfig
impl Clone for DistributedConfig
Source§fn clone(&self) -> DistributedConfig
fn clone(&self) -> DistributedConfig
1.0.0 (const: unstable) · Source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source. Read moreSource§impl ConfigExtension for DistributedConfig
impl ConfigExtension for DistributedConfig
Source§impl ConfigField for DistributedConfig
impl ConfigField for DistributedConfig
Source§impl Debug for DistributedConfig
impl Debug for DistributedConfig
Source§impl Default for DistributedConfig
impl Default for DistributedConfig
Source§impl ExtensionOptions for DistributedConfig
impl ExtensionOptions for DistributedConfig
Source§fn cloned(&self) -> Box<dyn ExtensionOptions>
fn cloned(&self) -> Box<dyn ExtensionOptions>
ExtensionOptions Read moreSource§fn entries(&self) -> Vec<ConfigEntry>
fn entries(&self) -> Vec<ConfigEntry>
ConfigEntry stored in this ExtensionOptionsAuto Trait Implementations§
impl !RefUnwindSafe for DistributedConfig
impl !UnwindSafe for DistributedConfig
impl Freeze for DistributedConfig
impl Send for DistributedConfig
impl Sync for DistributedConfig
impl Unpin for DistributedConfig
impl UnsafeUnpin for DistributedConfig
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> CloneToUninit for Twhere
T: Clone,
impl<T> CloneToUninit for Twhere
T: Clone,
Source§impl<T> Instrument for T
impl<T> Instrument for T
Source§fn instrument(self, span: Span) -> Instrumented<Self>
fn instrument(self, span: Span) -> Instrumented<Self>
Source§fn in_current_span(self) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§impl<T> IntoRequest<T> for T
impl<T> IntoRequest<T> for T
Source§fn into_request(self) -> Request<T>
fn into_request(self) -> Request<T>
T in a tonic::Request