Crate spark_connect_rs
Spark Connection Client for Rust
The Spark Connect client for Rust is highly experimental and should not be used in any production setting. It is currently a proof of concept to explore methods of interacting with a Spark cluster from Rust.
Usage
Create a Spark Session and create a DataFrame from a SQL statement:
use spark_connect_rs::{SparkSession, SparkSessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let spark: SparkSession = SparkSessionBuilder::remote("sc://127.0.0.1:15002/;user_id=example_rs".to_string())
        .build()
        .await?;

    let mut df = spark.sql("SELECT * FROM json.`/opt/spark/examples/src/main/resources/employees.json`");

    df.filter("salary > 3000").show(Some(5), None, None).await?;

    Ok(())
}
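The remote address above follows the Spark Connect connection-string format: `sc://host:port/;key=value` pairs separated by semicolons (here, `user_id=example_rs`). As a standalone sketch of that format, the hypothetical helper below (not part of this crate) pulls the host, port, and parameters out of such a string:

```rust
// Minimal sketch, not part of spark_connect_rs: parse a Spark Connect
// connection string of the form `sc://host:port/;key=value;key=value`.
// The function name `parse_remote` is hypothetical, for illustration only.
fn parse_remote(url: &str) -> Option<(String, u16, Vec<(String, String)>)> {
    let rest = url.strip_prefix("sc://")?;
    // Split the authority (host:port) from the trailing parameter list.
    let (authority, params) = match rest.split_once("/;") {
        Some((a, p)) => (a, p),
        None => (rest.trim_end_matches('/'), ""),
    };
    let (host, port) = authority.split_once(':')?;
    let port: u16 = port.parse().ok()?;
    // Parameters are `key=value` pairs separated by `;`.
    let params = params
        .split(';')
        .filter(|s| !s.is_empty())
        .filter_map(|kv| {
            let (k, v) = kv.split_once('=')?;
            Some((k.to_string(), v.to_string()))
        })
        .collect();
    Some((host.to_string(), port, params))
}

fn main() {
    let (host, port, params) =
        parse_remote("sc://127.0.0.1:15002/;user_id=example_rs").unwrap();
    println!("host={host} port={port} params={params:?}");
}
```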
Create a Spark Session, create a DataFrame from a CSV file, and write the results:
use spark_connect_rs::{SparkSession, SparkSessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let spark: SparkSession = SparkSessionBuilder::remote("sc://127.0.0.1:15002/;user_id=example_rs".to_string())
        .build()
        .await?;

    let paths = vec!["/opt/spark/examples/src/main/resources/people.csv".to_string()];

    let mut df = spark
        .read()
        .format("csv")
        .option("header", "true")
        .option("delimiter", ";")
        .load(paths);

    let mut df = df
        .filter("age > 30")
        .select(vec!["name"]);

    df.write()
        .format("csv")
        .option("header", "true")
        .save("/opt/spark/examples/src/main/rust/people/")
        .await?;

    Ok(())
}
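The two reader options do distinct work: `header` tells Spark the first line names the columns, and `delimiter` tells it fields are separated by `;` rather than the default `,`. A standalone sketch in plain Rust (no crate dependency; the sample data is illustrative, in the style of Spark's bundled people.csv) shows what those options imply:

```rust
// Standalone sketch: the effect of the `header` and `delimiter`
// reader options on a semicolon-separated file. Sample data is
// illustrative only.
fn main() {
    let data = "name;age;job\nJorge;30;Developer\nBob;32;Developer";
    let mut lines = data.lines();

    // option("header", "true"): the first line supplies column names.
    let columns: Vec<&str> = lines.next().unwrap().split(';').collect();

    // option("delimiter", ";"): fields split on `;`, not the default `,`.
    let rows: Vec<Vec<&str>> = lines.map(|l| l.split(';').collect()).collect();

    println!("columns={columns:?}");
    println!("first row={:?}", rows[0]);
}
```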
Re-exports
pub use dataframe::DataFrame;
pub use dataframe::DataFrameReader;
pub use dataframe::DataFrameWriter;
pub use execution::context::SparkSession;
pub use execution::context::SparkSessionBuilder;
pub use plan::LogicalPlanBuilder;
pub use arrow;
Modules
- dataframe — DataFrame representation based on the Spark Connect gRPC protobuf
- execution — Create a Spark Session context from a Spark Connect remote source
- plan — Logical Plan represents the gRPC Spark Plan used to create a DataFrame
- spark — Spark Connect gRPC protobuf translated using tonic