Crate datafusion_distributed

Structs§

AvgLatencyMetric
BroadcastExec
ExecutionPlan that scales up partitions for network broadcasting.
BytesCounterMetric
A cumulative counter metric for tracking byte counts.
CompressionType
DefaultChannelResolver
Default implementation of a ChannelResolver that connects to the workers given the URL once and stores the connection instance in a TTI cache.
DefaultSessionBuilder
Noop implementation of the WorkerSessionBuilder. Used by default if no WorkerSessionBuilder is provided while building the Worker.
DistributedCodec
DataFusion PhysicalExtensionCodec implementation that serializes and deserializes the custom ExecutionPlans in this project.
DistributedConfig
Configuration for the distributed planner.
DistributedExec
ExecutionPlan that executes the inner plan in distributed mode. Before executing it, two modifications are lazily performed on the plan:
DistributedLeafExec
Represents a leaf node ready to be distributed across N tasks, where the variant of the node belonging to each task is stored in a Vec of N positions.
DistributedTaskContext
DistributedWorkUnitFeedContext
Provides contextual information about where a WorkUnitFeedProvider is being executed. When a WorkUnitFeedProvider is used in distributed queries, it may run in the coordinating stage, or it may run purely locally because the query did not need any remote execution.
FirstLatencyMetric
A latency metric that captures only the first recorded value, ignoring all subsequent ones. Uses 0 as the unset sentinel (valid durations are clamped to at least 1 nanosecond).
GetClusterWorkersRequest
GetClusterWorkersResponse
GetTaskProgressRequest
GetTaskProgressResponse
GetWorkerInfoRequest
GetWorkerInfoResponse
MappedWorkerSessionBuilder
MaxGaugeMetric
Similar to DataFusion’s Gauge metric, but aggregates between instances using max instead of sum.
MaxLatencyMetric
MinLatencyMetric
NetworkBroadcastExec
Network boundary for broadcasting data to all consumer tasks.
NetworkCoalesceExec
ExecutionPlan that coalesces partitions from multiple tasks into one or more tasks without performing any repartition, maintaining the same partitioning scheme.
NetworkShuffleExec
ExecutionPlan implementation that shuffles data across the network in a distributed context.
ObservabilityServiceClient
ObservabilityServiceImpl
ObservabilityServiceServer
P50LatencyMetric
P75LatencyMetric
P95LatencyMetric
P99LatencyMetric
PingRequest
PingResponse
TaskData
TaskData stores state for a single task being executed by this Endpoint. It may be shared by concurrent requests for the same task which execute separate partitions.
TaskEstimation
Result of running a TaskEstimator on a leaf node. It gives the distributed planner hints about how many tasks should be used in Stages that contain leaf nodes.
TaskKey
A key that uniquely identifies a task in a query.
TaskProgress
Progress information for a single task.
TaskRoutingContext
Context usable for routing tasks to worker URLs.
WorkUnitFeed
The WorkUnitFeed is created with a user-provided WorkUnitFeedProvider and is embedded in any custom datafusion::physical_plan::ExecutionPlan implementation as a field.
WorkUnitFeedProto
Worker
WorkerMetrics
Worker-level system metrics.
WorkerQueryContext
WorkerServiceClient
WorkerServiceServer
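
The descriptions of FirstLatencyMetric and MaxGaugeMetric above can be illustrated with a minimal, standard-library-only sketch. This is not the crate's implementation: the type names `FirstLatencySketch` and `MaxGaugeSketch` and all of their methods are hypothetical, chosen only to show the "first value wins with a 0 sentinel" and "aggregate with max instead of sum" semantics.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::time::Duration;

/// Hypothetical stand-in for a "first recorded value wins" latency metric.
/// The value is stored as nanoseconds, and 0 means "unset".
#[derive(Debug, Default)]
pub struct FirstLatencySketch {
    nanos: AtomicU64,
}

impl FirstLatencySketch {
    /// Records `d` only if nothing has been recorded yet.
    pub fn record(&self, d: Duration) {
        // Clamp to at least 1 ns so a real measurement never collides
        // with the 0 sentinel.
        let nanos = u64::try_from(d.as_nanos()).unwrap_or(u64::MAX).max(1);
        // The exchange succeeds only while the stored value is still 0,
        // so every later call is ignored.
        let _ = self
            .nanos
            .compare_exchange(0, nanos, Ordering::Relaxed, Ordering::Relaxed);
    }

    /// Returns the first recorded latency, or `None` if unset.
    pub fn value(&self) -> Option<Duration> {
        match self.nanos.load(Ordering::Relaxed) {
            0 => None,
            n => Some(Duration::from_nanos(n)),
        }
    }
}

/// Hypothetical stand-in for a gauge that aggregates with max.
#[derive(Debug, Default)]
pub struct MaxGaugeSketch {
    value: AtomicUsize,
}

impl MaxGaugeSketch {
    /// Keeps the larger of the stored value and `v`.
    pub fn set_max(&self, v: usize) {
        self.value.fetch_max(v, Ordering::Relaxed);
    }

    /// Returns the current value.
    pub fn value(&self) -> usize {
        self.value.load(Ordering::Relaxed)
    }

    /// Combines another instance into this one by taking the larger
    /// value rather than summing.
    pub fn aggregate(&self, other: &Self) {
        self.set_max(other.value());
    }
}
```

With this sketch, recording a zero duration followed by a 5 ms duration leaves the stored latency at 1 ns, and aggregating gauges holding 3 and 7 yields 7 rather than 10.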

Enums§

DistributedMetricsFormat
Format to use when displaying metrics for a distributed plan.
Stage
A unit of isolation for a portion of a physical execution plan that can be executed independently and across a network boundary. It implements ExecutionPlan and can be executed to produce a stream of record batches.
TaskCountAnnotation
Annotation attached to a single ExecutionPlan that determines how many distributed tasks it should run on.
TaskStatus

Constants§

DISTRIBUTED_DATAFUSION_TASK_ID_LABEL
Label used to annotate metrics in execution plan nodes with the task in which they were executed. Note that the same task id may be used in multiple stages.

Traits§

BytesMetricExt
Extension trait for DataFusion’s metric system that adds support for byte count metrics that display using human-readable byte sizes (KB, MB, GB) instead of plain count notation.
ChannelResolver
Allows users to customize the way Worker clients are created. A common use case is to wrap the client with tower layers or schedule it in an IO-specific tokio runtime.
DistributedExt
Extends DataFusion with distributed capabilities.
GaugeMetricExt
Extension trait for DataFusion’s metric system that adds support for a Gauge metric that aggregates across instances using max instead of sum.
LatencyMetricExt
Extension trait for DataFusion’s metric system that adds support for latency related metrics.
MappedWorkerSessionBuilderExt
NetworkBoundary
This trait represents a node that introduces the necessity of a network boundary in the plan. The distributed planner, upon stepping into one of these, will break the plan and build a stage out of it.
NetworkBoundaryExt
Extension trait for downcasting dynamic types to NetworkBoundary.
ObservabilityService
Generated trait containing gRPC methods that should be implemented for use with ObservabilityServiceServer.
SessionStateBuilderExt
Extension trait for SessionStateBuilder.
TaskEstimator
Given a leaf node, provides an estimate of how many tasks should be used in the stage containing it, and whether the leaf node should be replaced by another node.
WorkUnit
A WorkUnit is a single unit of runtime metadata produced by a crate::WorkUnitFeedProvider and consumed by a leaf datafusion::physical_plan::ExecutionPlan via an embedded crate::WorkUnitFeed.
WorkUnitFeedProvider
Extension point for building user-defined work unit streams consumed by a crate::WorkUnitFeed embedded in a leaf datafusion::physical_plan::ExecutionPlan.
WorkerResolver
Resolves a list of worker URLs in the cluster available for executing parts of the plan.
WorkerSessionBuilder
Builds a DataFusion SessionState for each query issued to a worker.
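
BytesMetricExt is described above as displaying byte counts in human-readable sizes (KB, MB, GB). A minimal sketch of that kind of formatting follows. The function name `human_readable_bytes`, the 1024-based unit steps, and the two-decimal output are all assumptions made for illustration; the crate's actual formatting rules may differ.

```rust
/// Hypothetical helper: formats a byte count as B, KB, MB, or GB.
/// Assumes 1024-based units and two decimal places.
pub fn human_readable_bytes(bytes: u64) -> String {
    const UNITS: [&str; 4] = ["B", "KB", "MB", "GB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    // Step up one unit at a time, stopping at the largest known unit.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        // Plain byte counts are printed without decimals.
        format!("{bytes} B")
    } else {
        format!("{value:.2} {}", UNITS[unit])
    }
}
```

Under these assumptions, `human_readable_bytes(1536)` produces `1.50 KB`, while values at or above 1024 GB are still reported in GB.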

Functions§

create_worker_client
Creates a WorkerServiceClient with high default message size limits.
display_plan_ascii
display_plan_graphviz
Renders a regular or distributed DataFusion plan in Graphviz DOT format. The output can be viewed at https://vis-js.com.
explain_analyze
explain_analyze renders an ExecutionPlan with metrics.
get_distributed_channel_resolver
get_distributed_worker_resolver
rewrite_distributed_plan_with_metrics
Rewrites a distributed plan with metrics. Does nothing if the root node is not a DistributedExec. Returns an error if the distributed plan was not executed.
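
To give a sense of what Graphviz DOT output for a plan tree looks like, here is a small standalone sketch. It does not call display_plan_graphviz and does not reproduce its output: `PlanNode`, `to_dot`, and `write_node` are hypothetical names, and the node labels in the usage note are only examples.

```rust
/// Hypothetical, simplified plan tree used only for this illustration.
pub struct PlanNode {
    pub name: String,
    pub children: Vec<PlanNode>,
}

/// Renders the tree rooted at `root` as a Graphviz DOT digraph.
/// Labels are written verbatim, so names containing `"` are not
/// supported by this sketch.
pub fn to_dot(root: &PlanNode) -> String {
    let mut out = String::from("digraph plan {\n");
    let mut next_id = 0usize;
    write_node(root, &mut next_id, &mut out);
    out.push_str("}\n");
    out
}

/// Writes `node` and its subtree in pre-order and returns the id
/// assigned to `node`.
fn write_node(node: &PlanNode, next_id: &mut usize, out: &mut String) -> usize {
    let id = *next_id;
    *next_id += 1;
    out.push_str(&format!("  n{id} [label=\"{}\"];\n", node.name));
    for child in &node.children {
        let child_id = write_node(child, next_id, out);
        // Edges point from child to parent, the direction data flows.
        out.push_str(&format!("  n{child_id} -> n{id};\n"));
    }
    id
}
```

For a root labeled `NetworkShuffleExec` with a single child labeled `DataSourceExec`, this sketch emits two node declarations (`n0`, `n1`) and one edge `n1 -> n0`, which can be pasted into any DOT viewer.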

Type Aliases§

BoxCloneSyncChannel