Crate spark_connect_rs


Spark Connection Client for Rust

Currently, the Spark Connect client for Rust is highly experimental and should not be used in any production setting. It is a "proof of concept" for identifying how to interact with a Spark cluster from Rust.

§Usage
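The examples below assume the crate and an async runtime are declared in `Cargo.toml`. A minimal sketch (the package name follows the crate name, and the version numbers are placeholders, not pinned recommendations):

```toml
[dependencies]
# hypothetical versions; check crates.io for the current release
spark-connect-rs = "0"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```

The `tokio` features enable the `#[tokio::main]` macro used in the examples.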

Create a Spark Session and create a DataFrame from a SQL statement:

use spark_connect_rs::{SparkSession, SparkSessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {

    let spark: SparkSession = SparkSessionBuilder::remote("sc://127.0.0.1:15002/;user_id=example_rs")
        .build()
        .await?;

    let mut df = spark.sql("SELECT * FROM json.`/opt/spark/examples/src/main/resources/employees.json`").await?;

    df.filter("salary > 3000").show(Some(5), None, None).await?;

    Ok(())
}
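The `sc://127.0.0.1:15002` URL above assumes a Spark Connect server is already listening locally. With a standard Spark distribution, one can typically be started from the Spark home directory (the package coordinates vary by Spark and Scala version, so the ones below are only an example):

```shell
# start a local Spark Connect server (Spark 3.4+); adjust the
# spark-connect artifact version to match your Spark installation
./sbin/start-connect-server.sh \
  --packages org.apache.spark:spark-connect_2.12:3.5.1
```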

Create a Spark Session, create a DataFrame from a CSV file, apply function transformations, and write the results:

use spark_connect_rs::{SparkSession, SparkSessionBuilder};

use spark_connect_rs::functions as F;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {

    let spark: SparkSession = SparkSessionBuilder::remote("sc://127.0.0.1:15002/;user_id=example_rs")
        .build()
        .await?;

    let paths = vec!["/opt/spark/examples/src/main/resources/people.csv".to_string()];

    let mut df = spark
        .read()
        .format("csv")
        .option("header", "True")
        .option("delimiter", ";")
        .load(paths);

    let mut df = df
        .filter("age > 30")
        .select(vec![
            F::col("name"),
            F::col("age").cast("int")
        ]);

    df.write()
      .format("csv")
      .option("header", "true")
      .save("/opt/spark/examples/src/main/rust/people/")
      .await?;

    Ok(())
}

§Re-exports

§Modules

  • A column in a DataFrame
  • DataFrame with Reader/Writer representation
  • A re-implementation of Spark functions
  • Logical Plan representation
  • DataFrameReader & DataFrameWriter representations
  • Spark Session containing the remote gRPC client
  • Spark Connect gRPC protobuf translated using tonic

§Macros