Module spark

Spark Connect gRPC protobuf definitions, translated to Rust using tonic.
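
Most types here are prost-generated messages, and each proto `oneof` becomes a Rust enum in one of the nested modules listed below. As a quick orientation, here is a minimal sketch of building a Plan around a SQL relation, assuming this module is in scope as `spark`; module paths and field names follow typical prost output for the Spark Connect proto and may differ between versions.

```rust
// A minimal sketch, assuming this module is in scope as `spark`
// (e.g. via `use your_crate::spark;`). Names follow typical prost
// output for the Spark Connect proto and are not guaranteed verbatim.
use spark::{plan, relation, Plan, Relation, Sql};

/// Build a [Plan] whose root is a SQL relation.
fn sql_plan(query: &str) -> Plan {
    Plan {
        // A Plan is either a Relation root or a Command; here it is a root.
        op_type: Some(plan::OpType::Root(Relation {
            common: None,
            // Exactly one relation type is set, mirroring the proto `oneof`.
            rel_type: Some(relation::RelType::Sql(Sql {
                query: query.to_string(),
                // Remaining fields are left at their prost defaults.
                ..Default::default()
            })),
        })),
    }
}
```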

Modules

add_artifacts_request
Nested message and enum types in AddArtifactsRequest.
add_artifacts_response
Nested message and enum types in AddArtifactsResponse.
aggregate
Nested message and enum types in Aggregate.
analyze_plan_request
Nested message and enum types in AnalyzePlanRequest.
analyze_plan_response
Nested message and enum types in AnalyzePlanResponse.
artifact_statuses_response
Nested message and enum types in ArtifactStatusesResponse.
catalog
Nested message and enum types in Catalog.
command
Nested message and enum types in Command.
common_inline_user_defined_function
Nested message and enum types in CommonInlineUserDefinedFunction.
common_inline_user_defined_table_function
Nested message and enum types in CommonInlineUserDefinedTableFunction.
config_request
Nested message and enum types in ConfigRequest.
data_type
Nested message and enum types in DataType.
execute_plan_request
Nested message and enum types in ExecutePlanRequest.
execute_plan_response
Nested message and enum types in ExecutePlanResponse.
expression
Nested message and enum types in Expression.
interrupt_request
Nested message and enum types in InterruptRequest.
join
Nested message and enum types in Join.
na_replace
Nested message and enum types in NAReplace.
parse
Nested message and enum types in Parse.
plan
Nested message and enum types in Plan.
read
Nested message and enum types in Read.
relation
Nested message and enum types in Relation.
release_execute_request
Nested message and enum types in ReleaseExecuteRequest.
set_operation
Nested message and enum types in SetOperation.
spark_connect_service_client
Generated client implementations (see the connection sketch after this list).
stat_sample_by
Nested message and enum types in StatSampleBy.
streaming_foreach_function
Nested message and enum types in StreamingForeachFunction.
streaming_query_command
Nested message and enum types in StreamingQueryCommand.
streaming_query_command_result
Nested message and enum types in StreamingQueryCommandResult.
streaming_query_manager_command
Nested message and enum types in StreamingQueryManagerCommand.
streaming_query_manager_command_result
Nested message and enum types in StreamingQueryManagerCommandResult.
unpivot
Nested message and enum types in Unpivot.
write_operation
Nested message and enum types in WriteOperation.
write_operation_v2
Nested message and enum types in WriteOperationV2.
write_stream_operation_start
Nested message and enum types in WriteStreamOperationStart.
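
The spark_connect_service_client module above exposes the tonic-generated SparkConnectServiceClient. Below is a minimal, hypothetical sketch of connecting and executing a plan; the endpoint and session id are assumptions, sql_plan is the helper sketched in the module description, and 15002 is only the conventional Spark Connect port.

```rust
// A hypothetical sketch, assuming this module is in scope as `spark` and the
// tonic `transport` feature is enabled.
use spark::spark_connect_service_client::SparkConnectServiceClient;
use spark::ExecutePlanRequest;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client =
        SparkConnectServiceClient::connect("http://localhost:15002").await?;

    let request = ExecutePlanRequest {
        // Real clients generate a fresh UUID per session.
        session_id: "00000000-0000-0000-0000-000000000000".to_string(),
        // `sql_plan` is the hypothetical helper from the sketch above.
        plan: Some(sql_plan("SELECT 1 AS id")),
        // user_context, operation_id, tags, ... stay at prost defaults.
        ..Default::default()
    };

    // ExecutePlan is server-streaming: one request, many ExecutePlanResponses.
    let mut stream = client.execute_plan(request).await?.into_inner();
    while let Some(response) = stream.message().await? {
        println!("response for session {}", response.session_id);
    }
    Ok(())
}
```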

Structs

AddArtifactsRequest
Request to transfer client-local artifacts.
AddArtifactsResponse
Response to adding an artifact. Contains relevant metadata to verify successful transfer of artifact(s).
Aggregate
Relation of type [Aggregate].
AnalyzePlanRequest
Request to perform plan analysis, optionally explaining the plan.
AnalyzePlanResponse
Response to performing analysis of the query. Contains relevant metadata to be able to reason about the performance.
ApplyInPandasWithState
ArtifactStatusesRequest
Request to get the current statuses of artifacts on the server side.
ArtifactStatusesResponse
Response to checking artifact statuses.
CacheTable
See spark.catalog.cacheTable
CachedLocalRelation
A local relation that has been cached already.
CachedRemoteRelation
Represents a remote relation that has been cached on the server.
CallFunction
Catalog
Catalog messages are marked as unstable.
ClearCache
See spark.catalog.clearCache
CoGroupMap
CollectMetrics
Collect arbitrary (named) metrics from a dataset.
Command
A [Command] is an operation that is executed by the server that does not directly consume or produce a relational result.
CommonInlineUserDefinedFunction
CommonInlineUserDefinedTableFunction
ConfigRequest
Request to update or fetch the configurations (see the sketch at the end of this page).
ConfigResponse
Response to the config request.
CreateDataFrameViewCommand
A command that can create a global or local temporary DataFrame view.
CreateExternalTable
See spark.catalog.createExternalTable
CreateTable
See spark.catalog.createTable
CurrentCatalog
See spark.catalog.currentCatalog
CurrentDatabase
See spark.catalog.currentDatabase
DataType
This message describes the logical [DataType] of something. It does not carry the value itself but only describes it.
DatabaseExists
See spark.catalog.databaseExists
Deduplicate
Relation of type [Deduplicate] that has duplicate rows removed; it can consider either only a subset of the columns or all of the columns.
Drop
Drop specified columns.
DropGlobalTempView
See spark.catalog.dropGlobalTempView
DropTempView
See spark.catalog.dropTempView
ExamplePluginCommand
ExamplePluginExpression
ExamplePluginRelation
ExecutePlanRequest
A request to be executed by the service.
ExecutePlanResponse
The response to a query; there can be one or more responses for each request. Responses belonging to the same input query carry the same session_id.
Expression
Expression used to refer to fields, functions and similar. This can be used everywhere expressions in SQL appear.
Filter
Relation that applies a boolean expression condition on each row of input to produce the output result.
FunctionExists
See spark.catalog.functionExists
GetDatabase
See spark.catalog.getDatabase
GetFunction
See spark.catalog.getFunction
GetResourcesCommand
Command to get the output of ‘SparkContext.resources’
GetResourcesCommandResult
Response for command ‘GetResourcesCommand’.
GetTable
See spark.catalog.getTable
GroupMap
Hint
Specify a hint over a relation. Hint should have a name and optional parameters.
HtmlString
Compose the string representing rows for output. It will invoke ‘Dataset.htmlString’ to compute the results.
InterruptRequest
InterruptResponse
IsCached
See spark.catalog.isCached
JavaUdf
Join
Relation of type [Join].
KeyValue
The key-value pair for the config request and response.
Limit
Relation of type [Limit] that is used to limit rows from the input relation.
ListCatalogs
See spark.catalog.listCatalogs
ListColumns
See spark.catalog.listColumns
ListDatabases
See spark.catalog.listDatabases
ListFunctions
See spark.catalog.listFunctions
ListTables
See spark.catalog.listTables
LocalRelation
A relation that does not need to be qualified by name.
MapPartitions
NaDrop
Drop rows containing null values. It will invoke ‘Dataset.na.drop’ (same as ‘DataFrameNaFunctions.drop’) to compute the results.
NaFill
Replaces null values. It will invoke ‘Dataset.na.fill’ (same as ‘DataFrameNaFunctions.fill’) to compute the results. The following three parameter combinations are supported: (1) ‘values’ contains exactly one item and ‘cols’ is empty: replaces null values in all type-compatible columns; (2) ‘values’ contains exactly one item and ‘cols’ is not empty: replaces null values in the specified columns; (3) ‘values’ contains more than one item: ‘cols’ must have the same length, and each specified column is replaced with the corresponding value.
NaReplace
Replaces old values with the corresponding values. It will invoke ‘Dataset.na.replace’ (same as ‘DataFrameNaFunctions.replace’) to compute the results.
Offset
Relation of type [Offset] that is used to read rows starting from the offset on the input relation.
Parse
Plan
A [Plan] is the structure that carries the runtime information for the execution from the client to the server. A [Plan] can either be of the type [Relation] which is a reference to the underlying logical plan or it can be of the [Command] type that is used to execute commands on the server.
Project
Projection of a bag of expressions for a given input relation.
PythonUdf
PythonUdtf
Range
Relation of type [Range] that generates a sequence of integers.
Read
Relation that reads from a file / table or other data source. Does not have additional inputs.
ReattachExecuteRequest
ReattachOptions
RecoverPartitions
See spark.catalog.recoverPartitions
RefreshByPath
See spark.catalog.refreshByPath
RefreshTable
See spark.catalog.refreshTable
Relation
The main [Relation] type. Fundamentally, a relation is a typed container that has exactly one explicit relation type set.
RelationCommon
Common metadata of all relations.
ReleaseExecuteRequest
ReleaseExecuteResponse
Repartition
Relation repartition.
RepartitionByExpression
ResourceInformation
ResourceInformation to hold information about a type of Resource. The corresponding class is ‘org.apache.spark.resource.ResourceInformation’
Sample
Relation of type [Sample] that samples a fraction of the dataset.
ScalarScalaUdf
SetCurrentCatalog
See spark.catalog.setCurrentCatalog
SetCurrentDatabase
See spark.catalog.setCurrentDatabase
SetOperation
Relation of type [SetOperation].
ShowString
Compose the string representing rows for output. It will invoke ‘Dataset.showString’ to compute the results.
Sort
Relation of type [Sort].
Sql
Relation that uses a SQL query to generate the output.
SqlCommand
A SQL Command is used to trigger the eager evaluation of SQL commands in Spark.
StatApproxQuantile
Calculates the approximate quantiles of numerical columns of a DataFrame. It will invoke ‘Dataset.stat.approxQuantile’ (same as ‘StatFunctions.approxQuantile’) to compute the results.
StatCorr
Calculates the correlation of two columns of a DataFrame. Currently only supports the Pearson Correlation Coefficient. It will invoke ‘Dataset.stat.corr’ (same as ‘StatFunctions.pearsonCorrelation’) to compute the results.
StatCov
Calculate the sample covariance of two numerical columns of a DataFrame. It will invoke ‘Dataset.stat.cov’ (same as ‘StatFunctions.calculateCov’) to compute the results.
StatCrosstab
Computes a pair-wise frequency table of the given columns. Also known as a contingency table. It will invoke ‘Dataset.stat.crosstab’ (same as ‘StatFunctions.crossTabulate’) to compute the results.
StatDescribe
Computes basic statistics for numeric and string columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical or string columns.
StatFreqItems
Finds frequent items for columns, possibly with false positives. It will invoke ‘Dataset.stat.freqItems’ (same as ‘StatFunctions.freqItems’) to compute the results.
StatSampleBy
Returns a stratified sample without replacement based on the fraction given on each stratum. It will invoke ‘Dataset.stat.sampleBy’ to compute the results.
StatSummary
Computes specified statistics for numeric and string columns. It will invoke ‘Dataset.summary’ (same as ‘StatFunctions.summary’) to compute the results.
StorageLevel
StorageLevel for persisting Datasets/Tables.
StreamingForeachFunction
StreamingQueryCommand
Commands for a streaming query.
StreamingQueryCommandResult
Response for commands on a streaming query.
StreamingQueryInstanceId
A tuple that uniquely identifies an instance of a streaming query run. It consists of an id that persists across streaming runs and a run_id that changes with each run of the streaming query that resumes from the checkpoint.
StreamingQueryManagerCommand
Commands for the streaming query manager.
StreamingQueryManagerCommandResult
Response for commands on the streaming query manager.
SubqueryAlias
Relation alias.
TableExists
See spark.catalog.tableExists
Tail
Relation of type [Tail] that is used to fetch the last limit rows of the input relation.
ToDf
Rename the columns of the input relation using a list of new names of the same length.
ToSchema
UncacheTable
See spark.catalog.uncacheTable
Unknown
Used for testing purposes only.
Unpivot
Unpivot a DataFrame from wide format to long format, optionally leaving identifier columns set.
UserContext
User Context is used to refer to one particular user session that is executing queries in the backend.
WithColumns
Adds columns or replaces existing columns that have the same names.
WithColumnsRenamed
Rename columns on the input relation using a name-to-name mapping.
WithWatermark
WriteOperation
As writes are not directly handled during analysis and planning, they are modeled as commands.
WriteOperationV2
As writes are not directly handled during analysis and planning, they are modeled as commands.
WriteStreamOperationStart
Starts a write stream operation as a streaming query. The query ID and run ID of the streaming query are returned.
WriteStreamOperationStartResult
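
For completeness, here is a sketch of how the nested request types fit together: a ConfigRequest that sets one configuration value through the config_request module's Operation and Set types. As with the earlier sketches, this follows typical prost output and is a hypothetical illustration, not a verbatim API.

```rust
// A hypothetical sketch, assuming this module is in scope as `spark`.
use spark::config_request::{self, Operation};
use spark::{ConfigRequest, KeyValue};

/// Build a ConfigRequest that sets spark.sql.shuffle.partitions to 8.
fn set_shuffle_partitions(session_id: String) -> ConfigRequest {
    ConfigRequest {
        session_id,
        // The Operation oneof selects Set/Get/Unset/...; here we set one pair.
        operation: Some(Operation {
            op_type: Some(config_request::operation::OpType::Set(
                config_request::Set {
                    pairs: vec![KeyValue {
                        key: "spark.sql.shuffle.partitions".to_string(),
                        // KeyValue.value is an optional string in the proto.
                        value: Some("8".to_string()),
                    }],
                },
            )),
        }),
        // user_context and client_type are left at their prost defaults.
        ..Default::default()
    }
}
```

Sending this through the generated client's config method should yield a ConfigResponse echoing the affected key-value pairs.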