§spark-connect

An idiomatic, SQL-first Rust client for Apache Spark Connect.
This crate provides a fully asynchronous, strongly typed API for interacting with a remote Spark Connect server over gRPC. It allows you to build and execute SQL queries, bind parameters safely, and collect Arrow `RecordBatch` results, just like any other SQL toolkit, all in native Rust.
§✨ Features
- ⚙️ Spark-compatible connection builder (`sc://host:port` format);
- 🪶 Async execution using `tokio` and `tonic`;
- 🧩 Parameterized queries;
- 🧾 Arrow-native results returned as `Vec<RecordBatch>`.
§Getting Started
use spark_connect::SparkSessionBuilder;

// 1️⃣ Connect to a Spark Connect endpoint
let session = SparkSessionBuilder::new("sc://localhost:15002")
    .build()
    .await?;

// 2️⃣ Execute a simple SQL query and receive a Vec<RecordBatch>
let batches = session
    .query("SELECT ? AS rule, ? AS text")
    .bind(42)
    .bind("world")
    .execute()
    .await?;
It’s that simple!
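Because the results come back as plain Arrow record batches, the usual Arrow tooling applies directly. A minimal sketch, assuming the `arrow` crate is also a dependency and that its error type converts into yours:

use arrow::util::pretty::print_batches;

// Render the collected batches as an ASCII table on stdout.
print_batches(&batches)?;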
§🧩 Parameterized Queries
Behind the scenes, the SparkSession::query method
uses the ToLiteral trait to safely bind parameters
before execution:
use spark_connect::ToLiteral;

// This is
let batches = session
    .query("SELECT ? AS id, ? AS text")
    .bind(42)
    .bind("world")
    .execute()
    .await?;

// the same as this
let lazy_plan = session.sql(
    "SELECT ? AS id, ? AS text",
    vec![42.to_literal(), "world".to_literal()],
).await?;
let batches = session.collect(lazy_plan);

§😴 Lazy Execution
The biggest advantage of using the `sql()` method instead of `query()` is lazy execution: queries can be evaluated lazily and collected afterwards.
If you’re coming from PySpark or Scala, this should be the familiar interface.
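A sketch of what that looks like in practice, reusing the `sql`/`collect` calls from the snippet above:

// Build the logical plan; no rows are fetched yet.
let lazy_plan = session.sql(
    "SELECT ? AS id",
    vec![42.to_literal()],
).await?;

// ... other work, perhaps deciding whether the results are needed at all ...

// Materialize the plan into Arrow batches only at this point.
let batches = session.collect(lazy_plan);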
§🧠 Concepts
- `SparkSession`: the main entry point for executing SQL queries and managing a session.
- `SparkClient`: low-level gRPC client (used internally).
- `SqlQueryBuilder`: helper for binding parameters and executing queries.
§⚙️ Requirements
- A running Spark Connect server (Spark 3.4+);
- Network access to the configured `sc://` endpoint;
- A `tokio` runtime.
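Since the examples above use `?` inside an async context, a complete program needs an async runtime around them. A minimal sketch of the wiring, assuming the crate's error type can be boxed as a standard error:

use spark_connect::SparkSessionBuilder;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The builder takes a Spark Connect URI in the sc://host:port format.
    let session = SparkSessionBuilder::new("sc://localhost:15002")
        .build()
        .await?;

    let batches = session
        .query("SELECT 1 AS one")
        .execute()
        .await?;

    println!("collected {} batch(es)", batches.len());
    Ok(())
}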
§🔒 Example Connection Strings
- `sc://localhost:15002`
- `sc://spark-cluster:15002/?user_id=francisco`
- `sc://10.0.0.5:15002;session_id=abc123;user_agent=my-app`
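Any of these strings can be handed straight to the builder; for example, a sketch reusing the builder from the Getting Started section:

// Connection-string options such as user_id travel with the URI.
let session = SparkSessionBuilder::new("sc://spark-cluster:15002/?user_id=francisco")
    .build()
    .await?;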
§🏗️ Building With Different Versions of Spark Connect
Currently, this crate is built against Spark 3.5.x. If you need to build against a different version of Spark Connect, you can:
- Clone this repository.
- Go to the official Apache Spark repository and find the protobuf definitions for the desired version. Refer to the table below for the exact path.
- Download the `protobuf` directory and replace the `protobuf/` directory of this repository with the desired version.
- After replacing the files, run `cargo build` to regenerate the gRPC client code.
- Use the crate as usual.
| Version | Path to the protobuf directory |
|---|---|
| 4.x | branch-4.x / sql/connect/common/src/main/protobuf |
| 3.4-3.5 | branch-3.x / connector/connect/common/src/main/protobuf |
⚠️ Note that compatibility is not guaranteed, and you may encounter issues if there are significant changes between versions.
§📘 Learn More
§🙏 Acknowledgements
This project takes heavy inspiration from the spark-connect-rs project, and would’ve been much harder without it!
© 2025 Francisco A. B. Sampaio. Licensed under the MIT License.
This project is not affiliated with, endorsed by, or sponsored by the Apache Software Foundation. “Apache”, “Apache Spark”, and “Spark Connect” are trademarks of the Apache Software Foundation.
Modules§
- `client`: Internal gRPC client abstraction for the Spark Connect protocol.
- `query`: Provides support for building and executing parameterized SQL queries through `SparkSession::query`.
- `spark`: Spark Connect gRPC protobuf translated using tonic.
Structs§
- `SparkError`: Wraps application errors into a common `SparkError` enum.
- `SparkSession`: Represents a logical connection to a Spark Connect backend.
- `SparkSessionBuilder`: Builder for creating `SparkSession` instances.
Traits§
- `ToLiteral`: A trait that allows automatic conversion of Rust primitives and complex types into Spark data types.