Module spark_connect_rs::spark


The spark-connect-rs crate is currently just a meta-package shim for spark-connect-core. This module contains the Spark Connect gRPC protobuf definitions, translated using tonic.
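
These generated types compose the same way the underlying protobuf messages do: a Plan wraps either a root Relation or a Command, and a Relation carries exactly one rel_type variant. The sketch below builds a Plan around a SQL relation by hand. It is a minimal illustration that assumes the usual prost-generated layout (op_type/rel_type oneof enums, Default impls, an unboxed Sql variant); exact fields may differ slightly between protocol versions.

```rust
use spark_connect_rs::spark::{plan, relation, Plan, Relation, Sql};

// Minimal sketch: wrap a SQL query string in a Plan.
// `..Default::default()` fills the remaining message fields, so the example
// does not depend on the exact field set of a particular proto version.
fn sql_plan(query: &str) -> Plan {
    let sql_relation = Relation {
        common: None,
        rel_type: Some(relation::RelType::Sql(Sql {
            query: query.to_string(),
            ..Default::default()
        })),
    };

    Plan {
        // A Plan is either a root Relation (a query) or a Command.
        op_type: Some(plan::OpType::Root(sql_relation)),
    }
}
```

The crate's higher-level DataFrame API normally builds these messages for you; constructing them by hand is mainly useful for seeing exactly what is sent over the wire.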

Structs

  • Request to transfer client-local artifacts.
  • Response to adding an artifact. Contains relevant metadata to verify successful transfer of artifact(s).
  • Relation of type [Aggregate].
  • Request to perform plan analyze, optionally to explain the plan.
  • Response to performing analysis of the query. Contains relevant metadata to be able to reason about the performance.
  • Request to get current statuses of artifacts at the server side.
  • Response to checking artifact statuses.
  • See spark.catalog.cacheTable
  • A local relation that has been cached already.
  • Represents a remote relation that has been cached on server.
  • Catalog messages are marked as unstable.
  • See spark.catalog.clearCache
  • Collect arbitrary (named) metrics from a dataset.
  • A [Command] is an operation that is executed by the server that does not directly consume or produce a relational result.
  • Request to update or fetch the configurations.
  • Response to the config request.
  • A command that can create DataFrame global temp view or local temp view.
  • See spark.catalog.createExternalTable
  • See spark.catalog.createTable
  • See spark.catalog.currentCatalog
  • See spark.catalog.currentDatabase
  • This message describes the logical [DataType] of something. It does not carry the value itself but only describes it.
  • See spark.catalog.databaseExists
  • Relation of type [Deduplicate], which has duplicate rows removed; it can consider either only a subset of the columns or all of the columns.
  • Drop specified columns.
  • See spark.catalog.dropGlobalTempView
  • See spark.catalog.dropTempView
  • A request to be executed by the service (see the construction sketch after this list).
  • The response of a query; there can be one or more for each request. Responses belonging to the same input query carry the same session_id.
  • Expression used to refer to fields, functions and similar. This can be used everywhere expressions in SQL appear.
  • Relation that applies a boolean expression condition on each row of input to produce the output result.
  • See spark.catalog.functionExists
  • See spark.catalog.getDatabase
  • See spark.catalog.getFunction
  • Command to get the output of ‘SparkContext.resources’
  • Response for command ‘GetResourcesCommand’.
  • See spark.catalog.getTable
  • Specify a hint over a relation. Hint should have a name and optional parameters.
  • Compose the string representing rows for output. It will invoke ‘Dataset.htmlString’ to compute the results.
  • See spark.catalog.isCached
  • Relation of type [Join].
  • The key-value pair for the config request and response.
  • Relation of type [Limit] that is used to limit rows from the input relation.
  • See spark.catalog.listCatalogs
  • See spark.catalog.listColumns
  • See spark.catalog.listDatabases
  • See spark.catalog.listFunctions
  • See spark.catalog.listTables
  • A relation that does not need to be qualified by name.
  • Drop rows containing null values. It will invoke ‘Dataset.na.drop’ (same as ‘DataFrameNaFunctions.drop’) to compute the results.
  • Replaces null values. It will invoke ‘Dataset.na.fill’ (same as ‘DataFrameNaFunctions.fill’) to compute the results. The following three parameter combinations are supported: (1) ‘values’ contains exactly one item and ‘cols’ is empty: null values are replaced in all type-compatible columns; (2) ‘values’ contains exactly one item and ‘cols’ is not empty: null values are replaced in the specified columns; (3) ‘values’ contains more than one item: ‘cols’ must have the same length, and each specified column is replaced with its corresponding value.
  • Replaces old values with the corresponding values. It will invoke ‘Dataset.na.replace’ (same as ‘DataFrameNaFunctions.replace’) to compute the results.
  • Relation of type [Offset] that is used to read rows starting from the offset on the input relation.
  • A [Plan] is the structure that carries the runtime information for the execution from the client to the server. A [Plan] can either be of the type [Relation] which is a reference to the underlying logical plan or it can be of the [Command] type that is used to execute commands on the server.
  • Projection of a bag of expressions for a given input relation.
  • Relation of type [Range] that generates a sequence of integers.
  • Relation that reads from a file / table or other data source. Does not have additional inputs.
  • See spark.catalog.recoverPartitions
  • See spark.catalog.refreshByPath
  • See spark.catalog.refreshTable
  • The main [Relation] type. Fundamentally, a relation is a typed container that has exactly one explicit relation type set.
  • Common metadata of all relations.
  • Relation repartition.
  • ResourceInformation to hold information about a type of Resource. The corresponding class is ‘org.apache.spark.resource.ResourceInformation’
  • Relation of type [Sample] that samples a fraction of the dataset.
  • See spark.catalog.setCurrentCatalog
  • See spark.catalog.setCurrentDatabase
  • Relation of type [SetOperation]
  • Compose the string representing rows for output. It will invoke ‘Dataset.showString’ to compute the results.
  • Relation of type [Sort].
  • Relation that uses a SQL query to generate the output.
  • A SQL Command is used to trigger the eager evaluation of SQL commands in Spark.
  • Calculates the approximate quantiles of numerical columns of a DataFrame. It will invoke ‘Dataset.stat.approxQuantile’ (same as ‘StatFunctions.approxQuantile’) to compute the results.
  • Calculates the correlation of two columns of a DataFrame. Currently only supports the Pearson Correlation Coefficient. It will invoke ‘Dataset.stat.corr’ (same as ‘StatFunctions.pearsonCorrelation’) to compute the results.
  • Calculate the sample covariance of two numerical columns of a DataFrame. It will invoke ‘Dataset.stat.cov’ (same as ‘StatFunctions.calculateCov’) to compute the results.
  • Computes a pair-wise frequency table of the given columns. Also known as a contingency table. It will invoke ‘Dataset.stat.crosstab’ (same as ‘StatFunctions.crossTabulate’) to compute the results.
  • Computes basic statistics for numeric and string columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical or string columns.
  • Finds frequent items for columns, possibly with false positives. It will invoke ‘Dataset.stat.freqItems’ (same as ‘StatFunctions.freqItems’) to compute the results.
  • Returns a stratified sample without replacement based on the fraction given on each stratum. It will invoke ‘Dataset.stat.sampleBy’ to compute the results.
  • Computes specified statistics for numeric and string columns. It will invoke ‘Dataset.summary’ (same as ‘StatFunctions.summary’) to compute the results.
  • StorageLevel for persisting Datasets/Tables.
  • Commands for a streaming query.
  • Response for commands on a streaming query.
  • A tuple that uniquely identifies an instance of streaming query run. It consists of id that persists across the streaming runs and run_id that changes between each run of the streaming query that resumes from the checkpoint.
  • Commands for the streaming query manager.
  • Response for commands on the streaming query manager.
  • Relation alias.
  • See spark.catalog.tableExists
  • Relation of type [Tail] that is used to fetch the last ‘limit’ rows of the input relation.
  • Rename columns on the input relation using a list of new names of the same length as the input columns.
  • See spark.catalog.uncacheTable
  • Used for testing purposes only.
  • Unpivot a DataFrame from wide format to long format, optionally leaving identifier columns set.
  • User Context is used to refer to one particular user session that is executing queries in the backend.
  • Add columns or replace existing columns that have the same names.
  • Rename columns on the input relation using a name-to-name mapping.
  • As writes are not directly handled during analysis and planning, they are modeled as commands.
  • As writes are not directly handled during analysis and planning, they are modeled as commands.
  • Starts write stream operation as streaming query. Query ID and Run ID of the streaming query are returned.
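
As a final illustration (referenced from the request entry above), the sketch below wraps an already-built Plan in an ExecutePlanRequest together with a UserContext and a session id. It makes the same assumptions about the prost-generated layout as the earlier example; the session id and user id are placeholders, and fields introduced by newer protocol versions are left at their defaults.

```rust
use spark_connect_rs::spark::{ExecutePlanRequest, Plan, UserContext};

// Sketch: wrap a Plan (for example one produced by `sql_plan` above) in an
// ExecutePlanRequest. Every response generated for this request carries back
// the same session_id.
fn execute_request(plan: Plan) -> ExecutePlanRequest {
    ExecutePlanRequest {
        // The server expects a UUID-formatted session id; this fixed value is
        // a placeholder for illustration only.
        session_id: "00112233-4455-6677-8899-aabbccddeeff".to_string(),
        user_context: Some(UserContext {
            user_id: "example_user".to_string(),
            ..Default::default()
        }),
        plan: Some(plan),
        ..Default::default()
    }
}
```

The request is then sent through the tonic-generated client for SparkConnectService, whose ExecutePlan RPC streams back ExecutePlanResponse messages for the same session.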