Crate spark_connect_rs


Spark Connection Client for Rust

The Spark Connect client for Rust is currently highly experimental and should not be used in any production setting. It is a “proof of concept” for identifying ways of interacting with a Spark cluster from Rust.


Create a Spark Session and create a DataFrame from a SQL statement:

async {
    let spark: SparkSession = SparkSessionBuilder::remote("sc://;user_id=example_rs".to_string())
        .build().await?;

    let mut df = spark.sql("SELECT * FROM json.`/opt/spark/examples/src/main/resources/employees.json`");

    // Show the first 5 rows with a salary above 3000.
    df.filter("salary > 3000").show(Some(5), None, None).await?;

    Ok::<(), Box<dyn std::error::Error>>(())
};

Create a Spark Session, create a DataFrame from a CSV file, and write the results:

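async {
    let spark: SparkSession = SparkSessionBuilder::remote("sc://;user_id=example_rs".to_string())
        .build().await?;

    let paths = vec!["/opt/spark/examples/src/main/resources/people.csv".to_string()];

    let mut df = spark
        .read().format("csv")
        .option("header", "True")
        .option("delimiter", ";")
        .load(paths);

    let mut df = df
        .filter("age > 30");

    // The output directory below is a placeholder; substitute your own path.
    df.write().format("csv")
      .option("header", "true")
      .save("/path/to/output/").await?;

    Ok::<(), Box<dyn std::error::Error>>(())
};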

  • DataFrame representation based on the Spark Connect gRPC protobuf
  • Create a Spark Session context from a Spark Connect remote source
  • Logical Plan represents the gRPC Spark Plan used to create a DataFrame
  • Spark Connect gRPC protobuf translated using tonic
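
The remote source passed to the session builder is a Spark Connect connection string of the form sc://host:port/;key=value;key=value. As a rough illustration of how such a string breaks down, here is a minimal standard-library sketch. The `RemoteParts` type and `parse_remote` function are hypothetical names invented for this example and are not part of spark_connect_rs; the crate's own parsing may differ. The sketch assumes the Spark Connect default port of 15002 when none is given and does not handle IPv6 hosts.

```rust
use std::collections::BTreeMap;

/// Parsed pieces of a Spark Connect connection string.
/// Hypothetical helper type, not part of spark_connect_rs.
#[derive(Debug, PartialEq)]
struct RemoteParts {
    host: String,
    port: u16,
    params: BTreeMap<String, String>,
}

/// Split `sc://host:port/;key=value;key=value` into its parts.
/// Falls back to port 15002 (the Spark Connect default) when no port is given.
fn parse_remote(remote: &str) -> Result<RemoteParts, String> {
    let rest = remote
        .strip_prefix("sc://")
        .ok_or_else(|| format!("expected an sc:// scheme in {remote:?}"))?;

    // The first ';'-separated segment is the authority, optionally ending in '/'.
    let mut segments = rest.split(';');
    let authority = segments.next().unwrap_or("").trim_end_matches('/');

    let (host, port) = match authority.rsplit_once(':') {
        Some((h, p)) => {
            let port = p
                .parse::<u16>()
                .map_err(|e| format!("invalid port {p:?}: {e}"))?;
            (h.to_string(), port)
        }
        None => (authority.to_string(), 15002),
    };

    // Remaining non-empty segments are key=value parameters such as user_id.
    let mut params = BTreeMap::new();
    for segment in segments.filter(|s| !s.is_empty()) {
        let (key, value) = segment
            .split_once('=')
            .ok_or_else(|| format!("parameter {segment:?} is missing '='"))?;
        params.insert(key.to_string(), value.to_string());
    }

    Ok(RemoteParts { host, port, params })
}
```

Calling `parse_remote("sc://localhost:15002/;user_id=example_rs")` yields the host `localhost`, the port 15002, and a single `user_id` parameter. Returning a `Result` keeps malformed strings (wrong scheme, non-numeric port, parameter without `=`) from silently producing a session configuration.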