Crate spark_connect_rs

Spark Connection Client for Rust

The Spark Connect client for Rust is highly experimental and should not be used in any production setting. It is currently a “proof of concept” for identifying ways of interacting with a Spark cluster from Rust.

Usage

Create a Spark Session and create a DataFrame from a SQL statement:

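use spark_connect_rs::{SparkSession, SparkSessionBuilder};

async {
    // Connect to a Spark Connect server; user_id is passed as a connection-string parameter.
    let spark: SparkSession = SparkSessionBuilder::remote("sc://127.0.0.1:15002/;user_id=example_rs".to_string())
        .build()
        .await?;

    // Build a DataFrame from a SQL query over a JSON file.
    let mut df = spark.sql("SELECT * FROM json.`/opt/spark/examples/src/main/resources/employees.json`");

    // Keep rows with a salary above 3000 and print up to 5 of them.
    df.filter("salary > 3000").show(Some(5), None, None).await?;
};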

Create a Spark Session, create a DataFrame from a CSV file, and write the results:

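use spark_connect_rs::{SparkSession, SparkSessionBuilder};

async {
    // Connect to a Spark Connect server.
    let spark: SparkSession = SparkSessionBuilder::remote("sc://127.0.0.1:15002/;user_id=example_rs".to_string())
        .build()
        .await?;

    let paths = vec!["/opt/spark/examples/src/main/resources/people.csv".to_string()];

    // Read a semicolon-delimited CSV file that has a header row.
    let mut df = spark
        .read()
        .format("csv")
        .option("header", "true")
        .option("delimiter", ";")
        .load(paths);

    // Keep people older than 30 and project only the name column.
    let mut df = df
        .filter("age > 30")
        .select(vec!["name"]);

    // Write the result back out as CSV with a header row.
    df.write()
      .format("csv")
      .option("header", "true")
      .save("/opt/spark/examples/src/main/rust/people/")
      .await?;
};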
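
Both examples pass a connection string of the form sc://host:port/;key=value to SparkSessionBuilder::remote. To make that layout concrete, here is a minimal standalone sketch that splits such a string into host, port, and parameters using only the standard library. ConnectTarget and parse_connect_string are hypothetical names invented for this illustration; they are not part of spark_connect_rs, and the crate's own parsing may differ. Falling back to port 15002 when none is given is also an assumption, taken from the port used in the examples above.

```rust
use std::collections::HashMap;

/// Hypothetical helper type for illustration; not part of spark_connect_rs.
#[derive(Debug, PartialEq)]
pub struct ConnectTarget {
    pub host: String,
    pub port: u16,
    pub params: HashMap<String, String>,
}

/// Split a string shaped like `sc://host:port/;key=value;key=value`.
/// Limitation of this sketch: bracketed IPv6 hosts are not handled.
pub fn parse_connect_string(s: &str) -> Result<ConnectTarget, String> {
    let rest = s
        .strip_prefix("sc://")
        .ok_or_else(|| format!("expected the sc:// scheme in {s:?}"))?;

    // Everything before the first '/' is host[:port]; the rest holds the parameters.
    let (authority, tail) = match rest.split_once('/') {
        Some((a, t)) => (a, t),
        None => (rest, ""),
    };

    let (host, port) = match authority.rsplit_once(':') {
        Some((h, p)) => (
            h,
            p.parse::<u16>()
                .map_err(|e| format!("invalid port {p:?}: {e}"))?,
        ),
        // Assumption: default to 15002, the port used in the examples above.
        None => (authority, 15002),
    };
    if host.is_empty() {
        return Err("missing host".to_string());
    }

    // Parameters are ';'-separated key=value pairs; empty segments are skipped.
    let mut params = HashMap::new();
    for pair in tail.split(';').filter(|p| !p.is_empty()) {
        let (k, v) = pair
            .split_once('=')
            .ok_or_else(|| format!("parameter {pair:?} is not key=value"))?;
        params.insert(k.to_string(), v.to_string());
    }

    Ok(ConnectTarget {
        host: host.to_string(),
        port,
        params,
    })
}
```

Calling parse_connect_string on the string from the examples yields the host 127.0.0.1, the port 15002, and a single user_id parameter with the value example_rs. A string without the sc:// scheme is rejected with an error rather than guessed at.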

Modules

  • DataFrame representation based on the Spark Connect gRPC protobuf
  • Create a Spark Session context from a Spark Connect remote source
  • Logical Plan represents the gRPC Spark Plan used to create a DataFrame
  • Spark Connect gRPC protobuf translated using tonic