Module spark

Spark Connect gRPC protobuf definitions, translated to Rust using tonic.
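
Most types here are prost-generated messages, and each proto `oneof` becomes a Rust enum in one of the nested modules listed below. As a quick orientation, here is a minimal sketch of building a Plan around a SQL relation, assuming this module is in scope as `spark`; module paths and field names follow typical prost output for the Spark Connect proto and may differ between versions.

```rust
// A minimal sketch, assuming this module is in scope as `spark`
// (e.g. via `use your_crate::spark;`). Names follow typical prost
// output for the Spark Connect proto and are not guaranteed verbatim.
use spark::{plan, relation, Plan, Relation, Sql};

/// Build a [Plan] whose root is a SQL relation.
fn sql_plan(query: &str) -> Plan {
    Plan {
        // A Plan is either a Relation root or a Command; here it is a root.
        op_type: Some(plan::OpType::Root(Relation {
            common: None,
            // Exactly one relation type is set, mirroring the proto `oneof`.
            rel_type: Some(relation::RelType::Sql(Sql {
                query: query.to_string(),
                // Remaining fields are left at their prost defaults.
                ..Default::default()
            })),
        })),
    }
}
```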

Modules

add_artifacts_request
Nested message and enum types in AddArtifactsRequest.
add_artifacts_response
Nested message and enum types in AddArtifactsResponse.
aggregate
Nested message and enum types in Aggregate.
analyze_plan_request
Nested message and enum types in AnalyzePlanRequest.
analyze_plan_response
Nested message and enum types in AnalyzePlanResponse.
artifact_statuses_response
Nested message and enum types in ArtifactStatusesResponse.
catalog
Nested message and enum types in Catalog.
command
Nested message and enum types in Command.
common_inline_user_defined_function
Nested message and enum types in CommonInlineUserDefinedFunction.
common_inline_user_defined_table_function
Nested message and enum types in CommonInlineUserDefinedTableFunction.
config_request
Nested message and enum types in ConfigRequest.
data_type
Nested message and enum types in DataType.
execute_plan_request
Nested message and enum types in ExecutePlanRequest.
execute_plan_response
Nested message and enum types in ExecutePlanResponse.
expression
Nested message and enum types in Expression.
interrupt_request
Nested message and enum types in InterruptRequest.
join
Nested message and enum types in Join.
na_replace
Nested message and enum types in NAReplace.
parse
Nested message and enum types in Parse.
plan
Nested message and enum types in Plan.
read
Nested message and enum types in Read.
relation
Nested message and enum types in Relation.
release_execute_request
Nested message and enum types in ReleaseExecuteRequest.
set_operation
Nested message and enum types in SetOperation.
spark_connect_service_client
Generated client implementations (see the connection sketch after this list).
stat_sample_by
Nested message and enum types in StatSampleBy.
streaming_foreach_function
Nested message and enum types in StreamingForeachFunction.
streaming_query_command
Nested message and enum types in StreamingQueryCommand.
streaming_query_command_result
Nested message and enum types in StreamingQueryCommandResult.
streaming_query_manager_command
Nested message and enum types in StreamingQueryManagerCommand.
streaming_query_manager_command_result
Nested message and enum types in StreamingQueryManagerCommandResult.
unpivot
Nested message and enum types in Unpivot.
write_operation
Nested message and enum types in WriteOperation.
write_operation_v2
Nested message and enum types in WriteOperationV2.
write_stream_operation_start
Nested message and enum types in WriteStreamOperationStart.
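
The spark_connect_service_client module above exposes the tonic-generated SparkConnectServiceClient. Below is a minimal, hypothetical sketch of connecting and executing a plan; the endpoint and session id are assumptions, sql_plan is the helper sketched in the module description, and 15002 is only the conventional Spark Connect port.

```rust
// A hypothetical sketch, assuming this module is in scope as `spark` and the
// tonic `transport` feature is enabled.
use spark::spark_connect_service_client::SparkConnectServiceClient;
use spark::ExecutePlanRequest;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client =
        SparkConnectServiceClient::connect("http://localhost:15002").await?;

    let request = ExecutePlanRequest {
        // Real clients generate a fresh UUID per session.
        session_id: "00000000-0000-0000-0000-000000000000".to_string(),
        // `sql_plan` is the hypothetical helper from the sketch above.
        plan: Some(sql_plan("SELECT 1 AS id")),
        // user_context, operation_id, tags, ... stay at prost defaults.
        ..Default::default()
    };

    // ExecutePlan is server-streaming: one request, many ExecutePlanResponses.
    let mut stream = client.execute_plan(request).await?.into_inner();
    while let Some(response) = stream.message().await? {
        println!("response for session {}", response.session_id);
    }
    Ok(())
}
```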

Structs

AddArtifactsRequest
Request to transfer client-local artifacts.
AddArtifactsResponse
Response to adding an artifact. Contains relevant metadata to verify successful transfer of artifact(s).
Aggregate
Relation of type [Aggregate].
AnalyzePlanRequest
Request to perform plan analysis, optionally explaining the plan.
AnalyzePlanResponse
Response to performing analysis of the query. Contains relevant metadata to be able to reason about the performance.
ApplyInPandasWithState
ArtifactStatusesRequest
Request to get the current statuses of artifacts on the server side.
ArtifactStatusesResponse
Response to checking artifact statuses.
CacheTable
See spark.catalog.cacheTable
CachedLocalRelation
A local relation that has been cached already.
CachedRemoteRelation
Represents a remote relation that has been cached on the server.
CallFunction
Catalog
Catalog messages are marked as unstable.
ClearCache
See spark.catalog.clearCache
CoGroupMap
CollectMetrics
Collect arbitrary (named) metrics from a dataset.
Command
A [Command] is an operation that is executed by the server that does not directly consume or produce a relational result.
CommonInlineUserDefinedFunction
CommonInlineUserDefinedTableFunction
ConfigRequest
Request to update or fetch the configurations (see the sketch at the end of this page).
ConfigResponse
Response to the config request.
CreateDataFrameViewCommand
A command that can create a global or local temporary DataFrame view.
CreateExternalTable
See spark.catalog.createExternalTable
CreateTable
See spark.catalog.createTable
CurrentCatalog
See spark.catalog.currentCatalog
CurrentDatabase
See spark.catalog.currentDatabase
DataType
This message describes the logical [DataType] of something. It does not carry the value itself but only describes it.
DatabaseExists
See spark.catalog.databaseExists
Deduplicate
Relation of type [Deduplicate] that has duplicate rows removed; it can consider either only a subset of the columns or all of the columns.
Drop
Drop specified columns.
DropGlobalTempView
See spark.catalog.dropGlobalTempView
DropTempView
See spark.catalog.dropTempView
ExamplePluginCommand
ExamplePluginExpression
ExamplePluginRelation
ExecutePlanRequest
A request to be executed by the service.
ExecutePlanResponse
The response to a query; there can be one or more responses for each request. Responses belonging to the same input query carry the same session_id.
Expression
Expression used to refer to fields, functions and similar. This can be used everywhere expressions in SQL appear.
Filter
Relation that applies a boolean expression condition on each row of input to produce the output result.
FunctionExists
See spark.catalog.functionExists
GetDatabase
See spark.catalog.getDatabase
GetFunction
See spark.catalog.getFunction
GetResourcesCommand
Command to get the output of ‘SparkContext.resources’
GetResourcesCommandResult
Response for command ‘GetResourcesCommand’.
GetTable
See spark.catalog.getTable
GroupMap
Hint
Specify a hint over a relation. Hint should have a name and optional parameters.
HtmlString
Compose the string representing rows for output. It will invoke ‘Dataset.htmlString’ to compute the results.
InterruptRequest
InterruptResponse
IsCached
See spark.catalog.isCached
JavaUdf
Join
Relation of type [Join].
KeyValue
The key-value pair for the config request and response.
Limit
Relation of type [Limit] that is used to limit rows from the input relation.
ListCatalogs
See spark.catalog.listCatalogs
ListColumns
See spark.catalog.listColumns
ListDatabases
See spark.catalog.listDatabases
ListFunctions
See spark.catalog.listFunctions
ListTables
See spark.catalog.listTables
LocalRelation
A relation that does not need to be qualified by name.
MapPartitions
NaDrop
Drop rows containing null values. It will invoke ‘Dataset.na.drop’ (same as ‘DataFrameNaFunctions.drop’) to compute the results.
NaFill
Replaces null values. It will invoke ‘Dataset.na.fill’ (same as ‘DataFrameNaFunctions.fill’) to compute the results. The following three parameter combinations are supported: (1) ‘values’ contains exactly one item and ‘cols’ is empty: replaces null values in all type-compatible columns; (2) ‘values’ contains exactly one item and ‘cols’ is not empty: replaces null values in the specified columns; (3) ‘values’ contains more than one item: ‘cols’ must have the same length, and each specified column is replaced with the corresponding value.
NaReplace
Replaces old values with the corresponding values. It will invoke ‘Dataset.na.replace’ (same as ‘DataFrameNaFunctions.replace’) to compute the results.
Offset
Relation of type [Offset] that is used to read rows starting from the offset on the input relation.
Parse
Plan
A [Plan] is the structure that carries the runtime information for the execution from the client to the server. A [Plan] can either be of the type [Relation] which is a reference to the underlying logical plan or it can be of the [Command] type that is used to execute commands on the server.
Project
Projection of a bag of expressions for a given input relation.
PythonUdf
PythonUdtf
Range
Relation of type [Range] that generates a sequence of integers.
Read
Relation that reads from a file / table or other data source. Does not have additional inputs.
ReattachExecuteRequest
ReattachOptions
RecoverPartitions
See spark.catalog.recoverPartitions
RefreshByPath
See spark.catalog.refreshByPath
RefreshTable
See spark.catalog.refreshTable
Relation
The main [Relation] type. Fundamentally, a relation is a typed container that has exactly one explicit relation type set.
RelationCommon
Common metadata of all relations.
ReleaseExecuteRequest
ReleaseExecuteResponse
Repartition
Relation repartition.
RepartitionByExpression
ResourceInformation
ResourceInformation to hold information about a type of Resource. The corresponding class is ‘org.apache.spark.resource.ResourceInformation’
Sample
Relation of type [Sample] that samples a fraction of the dataset.
ScalarScalaUdf
SetCurrentCatalog
See spark.catalog.setCurrentCatalog
SetCurrentDatabase
See spark.catalog.setCurrentDatabase
SetOperation
Relation of type [SetOperation].
ShowString
Compose the string representing rows for output. It will invoke ‘Dataset.showString’ to compute the results.
Sort
Relation of type [Sort].
Sql
Relation that uses a SQL query to generate the output.
SqlCommand
A SQL Command is used to trigger the eager evaluation of SQL commands in Spark.
StatApproxQuantile
Calculates the approximate quantiles of numerical columns of a DataFrame. It will invoke ‘Dataset.stat.approxQuantile’ (same as ‘StatFunctions.approxQuantile’) to compute the results.
StatCorr
Calculates the correlation of two columns of a DataFrame. Currently only supports the Pearson Correlation Coefficient. It will invoke ‘Dataset.stat.corr’ (same as ‘StatFunctions.pearsonCorrelation’) to compute the results.
StatCov
Calculate the sample covariance of two numerical columns of a DataFrame. It will invoke ‘Dataset.stat.cov’ (same as ‘StatFunctions.calculateCov’) to compute the results.
StatCrosstab
Computes a pair-wise frequency table of the given columns. Also known as a contingency table. It will invoke ‘Dataset.stat.crosstab’ (same as ‘StatFunctions.crossTabulate’) to compute the results.
StatDescribe
Computes basic statistics for numeric and string columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical or string columns.
StatFreqItems
Finds frequent items for columns, possibly with false positives. It will invoke ‘Dataset.stat.freqItems’ (same as ‘StatFunctions.freqItems’) to compute the results.
StatSampleBy
Returns a stratified sample without replacement based on the fraction given on each stratum. It will invoke ‘Dataset.stat.sampleBy’ to compute the results.
StatSummary
Computes specified statistics for numeric and string columns. It will invoke ‘Dataset.summary’ (same as ‘StatFunctions.summary’) to compute the results.
StorageLevel
StorageLevel for persisting Datasets/Tables.
StreamingForeachFunction
StreamingQueryCommand
Commands for a streaming query.
StreamingQueryCommandResult
Response for commands on a streaming query.
StreamingQueryInstanceId
A tuple that uniquely identifies an instance of a streaming query run. It consists of an id that persists across streaming runs and a run_id that changes with each run of the streaming query that resumes from the checkpoint.
StreamingQueryManagerCommand
Commands for the streaming query manager.
StreamingQueryManagerCommandResult
Response for commands on the streaming query manager.
SubqueryAlias
Relation alias.
TableExists
See spark.catalog.tableExists
Tail
Relation of type [Tail] that is used to fetch the last limit rows of the input relation.
ToDf
Rename the columns of the input relation using a list of new names of the same length.
ToSchema
UncacheTable
See spark.catalog.uncacheTable
Unknown
Used for testing purposes only.
Unpivot
Unpivot a DataFrame from wide format to long format, optionally leaving identifier columns set.
UserContext
User Context is used to refer to one particular user session that is executing queries in the backend.
WithColumns
Adds columns or replaces existing columns that have the same names.
WithColumnsRenamed
Rename columns on the input relation using a name-to-name mapping.
WithWatermark
WriteOperation
As writes are not directly handled during analysis and planning, they are modeled as commands.
WriteOperationV2
As writes are not directly handled during analysis and planning, they are modeled as commands.
WriteStreamOperationStart
Starts a write stream operation as a streaming query. The query ID and run ID of the streaming query are returned.
WriteStreamOperationStartResult
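
For completeness, here is a sketch of how the nested request types fit together: a ConfigRequest that sets one configuration value through the config_request module's Operation and Set types. As with the earlier sketches, this follows typical prost output and is a hypothetical illustration, not a verbatim API.

```rust
// A hypothetical sketch, assuming this module is in scope as `spark`.
use spark::config_request::{self, Operation};
use spark::{ConfigRequest, KeyValue};

/// Build a ConfigRequest that sets spark.sql.shuffle.partitions to 8.
fn set_shuffle_partitions(session_id: String) -> ConfigRequest {
    ConfigRequest {
        session_id,
        // The Operation oneof selects Set/Get/Unset/...; here we set one pair.
        operation: Some(Operation {
            op_type: Some(config_request::operation::OpType::Set(
                config_request::Set {
                    pairs: vec![KeyValue {
                        key: "spark.sql.shuffle.partitions".to_string(),
                        // KeyValue.value is an optional string in the proto.
                        value: Some("8".to_string()),
                    }],
                },
            )),
        }),
        // user_context and client_type are left at their prost defaults.
        ..Default::default()
    }
}
```

Sending this through the generated client's config method should yield a ConfigResponse echoing the affected key-value pairs.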