Skip to main content

DistributedConfig

Struct DistributedConfig 

Source
#[non_exhaustive]
pub struct DistributedConfig { pub file_scan_config_bytes_per_partition: usize, pub cardinality_task_count_factor: f64, pub children_isolator_unions: bool, pub collect_metrics: bool, pub broadcast_joins: bool, pub compression: String, pub shuffle_batch_size: usize, pub max_tasks_per_stage: usize, pub partial_reduce: bool, pub worker_connection_buffer_budget_bytes: usize, /* private fields */ }
Expand description

Configuration for the distributed planner.

Fields (Non-exhaustive)§

This struct is marked as non-exhaustive
Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
§file_scan_config_bytes_per_partition: usize

Sets the number of bytes each partitions is expected to scan from parquet files. If more partitions than the ones available in one machine would be needed, several machines are used, and the scan is distributed. Lowering this number will increase parallelism.

§cardinality_task_count_factor: f64

Task multiplying factor for when a node declares that it changes the cardinality of the data:

  • If a node is increasing the cardinality of the data, this factor will increase.
  • If a node reduces the cardinality of the data, this factor will decrease.
  • In any other situation, this factor is left intact.
§children_isolator_unions: bool

When encountering a UNION operation, isolate its children depending on the task context. For example, on a UNION operation with 3 children running in 3 distributed tasks, instead of executing the 3 children in each 3 tasks with a DistributedTaskContext of 1/3, 2/3, and 3/3 respectively, Execute:

  • The first child in the first task with a DistributedTaskContext of 1/1
  • The second child in the second task with a DistributedTaskContext of 1/1
  • The third child in the third task with a DistributedTaskContext of 1/1
§collect_metrics: bool

Propagate collected metrics from all nodes in the plan across network boundaries so that they can be reconstructed on the head node of the plan.

§broadcast_joins: bool

Enable broadcast joins for CollectLeft hash joins. When enabled, the build side of a CollectLeft join is broadcast to all consumer tasks. TODO: This option exists temporarily until we become smarter about when to actually use broadcasting like checking build side size. For now, broadcasting all CollectLeft joins is not always beneficial.

§compression: String

The compression used for sending data over the network between workers. It can be set to either zstd, lz4 or none.

§shuffle_batch_size: usize

Overrides datafusion.execution.batch_size for worker-executed stages. Because RepartitionExec reads session_config().batch_size() at execute time to size its output batches (via its internal LimitedBatchCoalescer), this knob lets users tune shuffle batch sizes independently of the global datafusion.execution.batch_size.

Set to 0 (the default) to apply no override and inherit datafusion.execution.batch_size.

§max_tasks_per_stage: usize

Maximum tasks that will be assigned per stage during distributed planning. If set to 0, this value is the number of workers returned by the provided WorkerResolver. It defaults to 0.

§partial_reduce: bool

Enable the PartialReduce optimization, which inserts an extra aggregation pass above hash RepartitionExec before network shuffles to reduce shuffle data size. Disabled by default because its effectiveness is workload-dependent: it helps when aggregation significantly reduces cardinality, but adds overhead when it does not.

§worker_connection_buffer_budget_bytes: usize

Soft byte budget that each per-worker connection will buffer in memory before pausing the gRPC pull from that worker. Per-partition channels are unbounded (to avoid head-of-line blocking between sibling partitions), so backpressure is enforced globally per [WorkerConnection] using this budget. A single message larger than this budget will still be admitted (otherwise we would livelock), so the actual peak per connection is worker_connection_buffer_budget_bytes + max_message_size.

Implementations§

Source§

impl DistributedConfig

Source

pub fn with_task_estimator( self, task_estimator: impl TaskEstimator + Send + Sync + 'static, ) -> Self

Appends a TaskEstimator to the list. TaskEstimator will be executed sequentially in order on leaf nodes, and the first one to provide a value is the one that gets to decide how many tasks are used for that [Stage].

Source

pub fn from_config_options( cfg: &ConfigOptions, ) -> Result<&Self, DataFusionError>

Gets the DistributedConfig from the ConfigOptions’s extensions.

Source

pub fn from_config_options_mut( cfg: &mut ConfigOptions, ) -> Result<&mut Self, DataFusionError>

Gets the DistributedConfig from the ConfigOptions’s extensions.

Trait Implementations§

Source§

impl Clone for DistributedConfig

Source§

fn clone(&self) -> DistributedConfig

Returns a duplicate of the value. Read more
1.0.0 (const: unstable) · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl ConfigExtension for DistributedConfig

Source§

const PREFIX: &'static str = "distributed"

Configuration namespace prefix to use Read more
Source§

impl ConfigField for DistributedConfig

Source§

fn set(&mut self, key: &str, value: &str) -> Result<()>

Source§

fn visit<V: Visit>( &self, v: &mut V, _key_prefix: &str, _description: &'static str, )

Source§

fn reset(&mut self, key: &str) -> Result<(), DataFusionError>

Source§

impl Debug for DistributedConfig

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl Default for DistributedConfig

Source§

fn default() -> Self

Returns the “default value” for a type. Read more
Source§

impl ExtensionOptions for DistributedConfig

Source§

fn as_any(&self) -> &dyn Any

Return self as Any Read more
Source§

fn as_any_mut(&mut self) -> &mut dyn Any

Return self as Any Read more
Source§

fn cloned(&self) -> Box<dyn ExtensionOptions>

Return a deep clone of this ExtensionOptions Read more
Source§

fn set(&mut self, key: &str, value: &str) -> Result<()>

Set the given key, value pair
Source§

fn entries(&self) -> Vec<ConfigEntry>

Returns the ConfigEntry stored in this ExtensionOptions

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> FromRef<T> for T
where T: Clone,

Source§

fn from_ref(input: &T) -> T

Converts to this type from a reference to the input type.
Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> IntoRequest<T> for T

Source§

fn into_request(self) -> Request<T>

Wrap the input message T in a tonic::Request
Source§

impl<L> LayerExt<L> for L

Source§

fn named_layer<S>(&self, service: S) -> Layered<<L as Layer<S>>::Service, S>
where L: Layer<S>,

Applies the layer to a service and wraps it in Layered.
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more