DistributedConfig

Struct DistributedConfig 

#[non_exhaustive]
pub struct DistributedConfig {
    pub files_per_task: usize,
    pub cardinality_task_count_factor: f64,
    pub shuffle_batch_size: usize,
    pub children_isolator_unions: bool,
    pub collect_metrics: bool,
    pub broadcast_joins: bool,
    pub compression: String,
    pub max_tasks_per_stage: usize,
    /* private fields */
}

Configuration for the distributed planner.

Fields (Non-exhaustive)§

This struct is marked as non-exhaustive
Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
§files_per_task: usize

Sets the maximum number of files that will be assigned to each task. Reducing this number spawns more tasks for the same number of files. This only applies when estimating tasks for stages containing DataSourceExec nodes with FileScanConfig implementations.
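
The relationship between the file count and the task count can be illustrated with a ceiling division. This is a minimal sketch, not the planner's actual code: `tasks_for_files` is a hypothetical helper, and the guard against a zero limit is an assumption made only so the example cannot divide by zero.

```rust
/// Hypothetical helper (not part of the crate): the number of tasks needed so
/// that no task receives more than `files_per_task` files.
fn tasks_for_files(file_count: usize, files_per_task: usize) -> usize {
    // Assumption: treat a limit of 0 as 1 to avoid dividing by zero.
    let per_task = files_per_task.max(1);
    // Ceiling division: a partially filled task still counts as a task.
    file_count.div_ceil(per_task)
}
```

Under this sketch, 10 files with a limit of 4 need 3 tasks, while lowering the limit to 2 raises that to 5 tasks.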

§cardinality_task_count_factor: f64

Task multiplying factor for when a node declares that it changes the cardinality of the data:

  • If a node is increasing the cardinality of the data, this factor will increase.
  • If a node reduces the cardinality of the data, this factor will decrease.
  • In any other situation, this factor is left intact.
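
One way to read the three cases above is as a running multiplier that is scaled up, scaled down, or left alone as the planner walks the plan. The sketch below assumes that interpretation (multiply on increase, divide on reduction); the `CardinalityEffect` enum and `scale_factor` function are hypothetical names used only for illustration and are not taken from the crate.

```rust
/// Hypothetical description of how a node changes the number of rows.
#[derive(Clone, Copy, Debug, PartialEq)]
enum CardinalityEffect {
    Increases,
    Reduces,
    Unchanged,
}

/// Assumed interpretation: update a running task-count multiplier using
/// `cardinality_task_count_factor` (passed here as `factor`).
fn scale_factor(current: f64, effect: CardinalityEffect, factor: f64) -> f64 {
    match effect {
        // More rows flow out of the node, so scale the multiplier up.
        CardinalityEffect::Increases => current * factor,
        // Fewer rows flow out of the node, so scale the multiplier down.
        CardinalityEffect::Reduces => current / factor,
        // Any other situation leaves the multiplier intact.
        CardinalityEffect::Unchanged => current,
    }
}
```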
§shuffle_batch_size: usize

When shuffling over the network, data streams are split across many output partitions, so the resulting streams may contain many tiny record batches to be sent over the wire. This parameter controls the batch size, in number of rows, for the CoalesceBatchExec operator that is placed at the top of the stage so that bigger batches are sent over the wire. If set to 0, batch coalescing is disabled on network shuffle operations.
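
The effect of coalescing can be sketched without any Arrow types. The example below models a record batch as a plain `Vec<i32>` of rows and merges consecutive batches until a target row count is reached. The `coalesce` function is hypothetical and only illustrates the idea, including the "0 disables coalescing" rule; it is not the operator's implementation.

```rust
/// Hypothetical sketch: merge consecutive small batches until each output
/// batch holds at least `target_rows` rows. A target of 0 disables merging.
fn coalesce(batches: Vec<Vec<i32>>, target_rows: usize) -> Vec<Vec<i32>> {
    if target_rows == 0 {
        // Coalescing disabled: pass the batches through untouched.
        return batches;
    }
    let mut out = Vec::new();
    let mut buffer: Vec<i32> = Vec::new();
    for batch in batches {
        buffer.extend(batch);
        if buffer.len() >= target_rows {
            // Emit the accumulated rows and start a fresh buffer.
            out.push(std::mem::take(&mut buffer));
        }
    }
    // Flush whatever is left once the input is exhausted.
    if !buffer.is_empty() {
        out.push(buffer);
    }
    out
}
```

With a target of 3 rows, the four input batches `[1]`, `[2, 3]`, `[4]`, `[5]` become the two batches `[1, 2, 3]` and `[4, 5]`, so fewer messages cross the network.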

§children_isolator_unions: bool

When encountering a UNION operation, isolate its children depending on the task context. For example, on a UNION operation with 3 children running in 3 distributed tasks, instead of executing all 3 children in each of the 3 tasks with a DistributedTaskContext of 1/3, 2/3, and 3/3 respectively, execute:

  • The first child in the first task with a DistributedTaskContext of 1/1
  • The second child in the second task with a DistributedTaskContext of 1/1
  • The third child in the third task with a DistributedTaskContext of 1/1
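
The assignment described above can be sketched as a mapping from a task index to the UNION children that task runs. This is a simplified illustration that only covers the case from the example, where the number of children equals the number of tasks; `children_for_task` is a hypothetical function and indices are zero-based.

```rust
/// Hypothetical sketch: which UNION children (by zero-based index) a given
/// task executes. Only the "one child per task" case from the example is
/// modelled; every other case falls back to running all children.
fn children_for_task(
    task_index: usize,
    child_count: usize,
    task_count: usize,
    isolate: bool,
) -> Vec<usize> {
    if isolate && child_count == task_count {
        // Isolated: each task runs exactly one child on its own.
        vec![task_index]
    } else {
        // Not isolated: every task runs every child on its share of the data.
        (0..child_count).collect()
    }
}
```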
§collect_metrics: bool

Propagate collected metrics from all nodes in the plan across network boundaries so that they can be reconstructed on the head node of the plan.

§broadcast_joins: bool

Enable broadcast joins for CollectLeft hash joins. When enabled, the build side of a CollectLeft join is broadcast to all consumer tasks. TODO: This option is temporary and exists only until the planner can decide when broadcasting is actually worthwhile, for example by checking the build side size. For now, broadcasting every CollectLeft join is not always beneficial.

§compression: String

The compression used for sending data over the network between workers. It can be set to zstd, lz4, or none.
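
Because the field is a plain String, a consumer has to validate it at some point. The sketch below shows one way to map the three documented values onto an enum. The `Compression` enum and `parse_compression` function are hypothetical, and exact lowercase matching is an assumption; the crate may parse the value differently.

```rust
/// Hypothetical enum mirroring the three documented values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Compression {
    Zstd,
    Lz4,
    Uncompressed,
}

/// Hypothetical parser; assumes values must match exactly, in lowercase.
fn parse_compression(value: &str) -> Result<Compression, String> {
    match value {
        "zstd" => Ok(Compression::Zstd),
        "lz4" => Ok(Compression::Lz4),
        "none" => Ok(Compression::Uncompressed),
        other => Err(format!(
            "unsupported compression '{other}', expected zstd, lz4 or none"
        )),
    }
}
```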

§max_tasks_per_stage: usize

Maximum number of tasks that will be assigned per stage during distributed planning. If set to 0, the number of workers returned by the provided WorkerResolver is used instead. It defaults to 0.
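
The fallback rule for 0 can be written down in a few lines. In this sketch, `effective_max_tasks` and `clamp_task_count` are hypothetical helpers, and `worker_count` stands in for the number of workers reported by the WorkerResolver. How the planner applies the cap in practice is an assumption.

```rust
/// Hypothetical helper: resolve the per-stage cap, falling back to the
/// number of available workers when the setting is 0.
fn effective_max_tasks(max_tasks_per_stage: usize, worker_count: usize) -> usize {
    if max_tasks_per_stage == 0 {
        worker_count
    } else {
        max_tasks_per_stage
    }
}

/// Hypothetical helper: cap a requested task count at the resolved maximum.
fn clamp_task_count(requested: usize, max_tasks_per_stage: usize, worker_count: usize) -> usize {
    requested.min(effective_max_tasks(max_tasks_per_stage, worker_count))
}
```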

Implementations§

impl DistributedConfig

pub fn with_task_estimator( self, task_estimator: impl TaskEstimator + Send + Sync + 'static, ) -> Self

Appends a TaskEstimator to the list. TaskEstimators are executed sequentially, in order, on leaf nodes, and the first one to provide a value decides how many tasks are used for that Stage.
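
The "first one to provide a value wins" rule can be sketched with a simplified stand-in trait. `SketchEstimator`, `ParquetOnly`, `Fallback`, and `first_estimate` are all hypothetical names; the real TaskEstimator trait has its own signature, which is not reproduced here, and a leaf node is modelled as a plain string.

```rust
/// Hypothetical, simplified stand-in for an estimator: returns a task count
/// for a leaf node, or `None` when it has no opinion about that node.
trait SketchEstimator {
    fn estimate(&self, leaf: &str) -> Option<usize>;
}

/// Only knows how to size leaves labelled "parquet".
struct ParquetOnly;

impl SketchEstimator for ParquetOnly {
    fn estimate(&self, leaf: &str) -> Option<usize> {
        if leaf == "parquet" { Some(4) } else { None }
    }
}

/// Always answers, so it acts as a catch-all at the end of the list.
struct Fallback;

impl SketchEstimator for Fallback {
    fn estimate(&self, _leaf: &str) -> Option<usize> {
        Some(1)
    }
}

/// Runs the estimators in insertion order and returns the first answer.
fn first_estimate(estimators: &[Box<dyn SketchEstimator>], leaf: &str) -> Option<usize> {
    estimators.iter().find_map(|estimator| estimator.estimate(leaf))
}
```

Because the order matters, a catch-all estimator such as `Fallback` belongs at the end of the list; placed first, it would shadow every estimator appended after it.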

pub fn from_config_options( cfg: &ConfigOptions, ) -> Result<&Self, DataFusionError>

Gets the DistributedConfig from the ConfigOptions’s extensions.

pub fn from_config_options_mut( cfg: &mut ConfigOptions, ) -> Result<&mut Self, DataFusionError>

Gets a mutable reference to the DistributedConfig from the ConfigOptions’s extensions.

Trait Implementations§

impl Clone for DistributedConfig

fn clone(&self) -> DistributedConfig

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl ConfigExtension for DistributedConfig

const PREFIX: &'static str = "distributed"

Configuration namespace prefix to use.

impl ConfigField for DistributedConfig

fn set(&mut self, key: &str, value: &str) -> Result<()>

fn visit<V: Visit>( &self, v: &mut V, _key_prefix: &str, _description: &'static str, )

fn reset(&mut self, key: &str) -> Result<(), DataFusionError>

impl Debug for DistributedConfig

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for DistributedConfig

fn default() -> Self

Returns the “default value” for a type.

impl ExtensionOptions for DistributedConfig

fn as_any(&self) -> &dyn Any

Return self as Any.

fn as_any_mut(&mut self) -> &mut dyn Any

Return self as Any.

fn cloned(&self) -> Box<dyn ExtensionOptions>

Return a deep clone of this ExtensionOptions.

fn set(&mut self, key: &str, value: &str) -> Result<()>

Set the given key, value pair.

fn entries(&self) -> Vec<ConfigEntry>

Returns the ConfigEntry stored in this ExtensionOptions.
