Crate spark_connect_rs
source ·Expand description
Spark Connection Client for Rust
Currently, the Spark Connect client for Rust is highly experimental and should not be used in any production setting. This is currently a “proof of concept” to identify the methods of interacting with Spark cluster from rust.
§Usage
Create a Spark Session and create a DataFrame from a SQL statement:
use spark_connect_rs::{SparkSession, SparkSessionBuilder};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let spark: SparkSession = SparkSessionBuilder::remote("sc://127.0.0.1:15002/;user_id=example_rs")
.build()
.await?;
let mut df = spark.sql("SELECT * FROM json.`/opt/spark/examples/src/main/resources/employees.json`").await?;
df.filter("salary > 3000").show(Some(5), None, None).await?;
Ok(())
};Create a Spark Session, create a DataFrame from a CSV file, apply function transformations, and write the results:
use spark_connect_rs::{SparkSession, SparkSessionBuilder};
use spark_connect_rs::functions as F;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let spark: SparkSession = SparkSessionBuilder::remote("sc://127.0.0.1:15002/;user_id=example_rs")
.build()
.await?;
let paths = vec!["/opt/spark/examples/src/main/resources/people.csv".to_string()];
let mut df = spark
.read()
.format("csv")
.option("header", "True")
.option("delimiter", ";")
.load(paths);
let mut df = df
.filter("age > 30")
.select(vec![
F::col("name"),
F::col("age").cast("int")
]);
df.write()
.format("csv")
.option("header", "true")
.save("/opt/spark/examples/src/main/rust/people/")
.await?;
Ok(())
};Re-exports§
pub use dataframe::DataFrame;pub use dataframe::DataFrameReader;pub use dataframe::DataFrameWriter;pub use session::SparkSession;pub use session::SparkSessionBuilder;pub use arrow;
Modules§
- A column in a DataFrame
- DataFrame with Reader/Writer repesentation
- A re-implementation of Spark functions
- Logical Plan representation
- DataFrameReader & DataFrameWriter representations
- Spark Session containing the remote gRPC client
- Spark Connect gRPC protobuf translated using tonic