spark_connect/lib.rs
/*!
# spark-connect

<b>An idiomatic, SQL-first Rust client for Apache Spark Connect.</b>

This crate provides a fully asynchronous, strongly typed API for interacting
with a remote Spark Connect server over gRPC.

It allows you to build and execute SQL queries, bind parameters safely,
and collect Arrow `RecordBatch` results - just like any other SQL toolkit -
all in native Rust.

## ✨ Features

- ⚙️ **Spark-compatible connection builder** (`sc://host:port` format);
- 🪶 **Async execution** using `tokio` and `tonic`;
- 🧩 **Parameterized queries**;
- 🧾 **Arrow-native results** returned as `Vec<RecordBatch>`.

## Getting Started

```
use spark_connect::SparkSessionBuilder;

# #[tokio::main]
# async fn main() -> Result<(), Box<dyn std::error::Error>> {
// 1️⃣ Connect to a Spark Connect endpoint
let session = SparkSessionBuilder::new("sc://localhost:15002")
    .build()
    .await?;

// 2️⃣ Execute a simple SQL query and receive a `Vec<RecordBatch>`
let batches = session
    .query("SELECT ? AS id, ? AS text")
    .bind(42)
    .bind("world")
    .execute()
    .await?;

# Ok(())
# }
```

It's that simple!
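
The returned batches are plain Arrow data, so you can read them back out with
the `arrow` crate's standard APIs. A minimal sketch (assuming the bound integer
arrives as an Arrow `Int32Array`; the exact column types may vary with your
Spark version):

```ignore
use arrow::array::{Int32Array, StringArray};

// Each `RecordBatch` holds columnar data; columns come back in SELECT order.
let batch = &batches[0];

let ids = batch
    .column(0)
    .as_any()
    .downcast_ref::<Int32Array>()
    .expect("column 0 should be an Int32Array");
let texts = batch
    .column(1)
    .as_any()
    .downcast_ref::<StringArray>()
    .expect("column 1 should be a StringArray");

assert_eq!(ids.value(0), 42);
assert_eq!(texts.value(0), "world");
```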

## 🧩 Parameterized Queries

Behind the scenes, the [`SparkSession::query`] method
uses the [`ToLiteral`] trait to safely bind parameters
before execution:

```ignore
use spark_connect::ToLiteral;

// This query...

let batches = session
    .query("SELECT ? AS id, ? AS text")
    .bind(42)
    .bind("world")
    .execute()
    .await?;

// ...is equivalent to this one:

let lazy_plan = session.sql(
    "SELECT ? AS id, ? AS text",
    vec![42.to_literal(), "world".to_literal()]
).await?;
let batches = session.collect(lazy_plan).await?;
```

## 😴 Lazy Execution

The biggest advantage of using the [`sql()`](SparkSession::sql) method
instead of [`query()`](SparkSession::query) is lazy execution -
a query plan is only run against the cluster when you collect it.
If you're coming from PySpark or Scala, this will be a familiar interface.
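
For example, you can build several plans up front and only pay for the ones
you actually collect. A sketch using the `sql()`/`collect()` pair from above
(the table names are illustrative):

```ignore
// Build logical plans; neither query has been executed yet.
let users = session.sql("SELECT * FROM users", vec![]).await?;
let orders = session.sql("SELECT * FROM orders", vec![]).await?;

// Only the plan you collect is actually executed.
let user_batches = session.collect(users).await?;
```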

## 🧠 Concepts

- <b>[`SparkSession`](crate::SparkSession)</b> — the main entry point for executing
  SQL queries and managing a session.
- <b>[`SparkClient`](crate::SparkClient)</b> — low-level gRPC client (used internally).
- <b>[`SqlQueryBuilder`](crate::query::SqlQueryBuilder)</b> — helper for binding parameters
  and executing queries.

## ⚙️ Requirements

- A running **Spark Connect server** (Spark 3.4+);
- Network access to the configured `sc://` endpoint;
- A `tokio` runtime.

## 🔒 Example Connection Strings

```text
sc://localhost:15002
sc://spark-cluster:15002/;user_id=francisco
sc://10.0.0.5:15002/;session_id=abc123;user_agent=my-app
```
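
Any of these strings can be passed straight to the builder. For example,
connecting with a user ID attached (the value is illustrative):

```ignore
use spark_connect::SparkSessionBuilder;

// Parameters after `/;` are parsed from the connection string itself.
let session = SparkSessionBuilder::new("sc://spark-cluster:15002/;user_id=francisco")
    .build()
    .await?;
```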

## 🏗️ Building With Different Versions of Spark Connect

Currently, this crate is built against Spark 3.5.x. If you need to build against a different version of Spark Connect, you can:

1. Clone this repository.
2. Go to the [official Apache Spark repository](https://github.com/apache/spark/) and find the protobuf definitions for the desired version. Refer to the table below for the exact path.
3. Download that `protobuf` directory and replace the `protobuf/` directory of this repository with it.
4. After replacing the files, run `cargo build` to regenerate the gRPC client code (see the build-script sketch after the table below).
5. Use the crate as usual.

| Version | Path to the protobuf directory |
|--------:|--------------------------------|
| 4.x     | [`branch-4.x / sql/connect/common/src/main/protobuf`](https://github.com/apache/spark/tree/branch-4.1/sql/connect/common/src/main/protobuf) |
| 3.4–3.5 | [`branch-3.x / connector/connect/common/src/main/protobuf`](https://github.com/apache/spark/tree/branch-3.5/connector/connect/common/src/main/protobuf) |

⚠️ Note that compatibility is not guaranteed, and you may encounter issues if there are significant changes between versions.
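
For reference, the regeneration in step 4 is driven by a Cargo build script
paired with the `tonic::include_proto!` call in `lib.rs`. A minimal sketch of
what such a `build.rs` can look like (the entry-point proto path and the
`tonic-build` API shown here are illustrative - check the repository's actual
build script):

```ignore
// build.rs - regenerates the Spark Connect gRPC client from `protobuf/`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    tonic_build::configure()
        .build_server(false) // only the client stubs are needed
        .compile(
            &["protobuf/spark/connect/base.proto"], // entry-point .proto file
            &["protobuf"],                          // include path for imports
        )?;
    Ok(())
}
```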

## 📘 Learn More

- [Apache Spark Connect documentation](https://spark.apache.org/docs/latest/spark-connect.html);
- [Apache Arrow RecordBatch specification](https://arrow.apache.org/docs/format/Columnar.html).

## 🙏 Acknowledgements

This project takes heavy inspiration from the [spark-connect-rs](https://github.com/sjrusso8/spark-connect-rs) project, and would've been much harder without it!

---
© 2025 Francisco A. B. Sampaio. Licensed under the MIT License.

This project is not affiliated with, endorsed by, or sponsored by the Apache Software Foundation.
“Apache”, “Apache Spark”, and “Spark Connect” are trademarks of the Apache Software Foundation.
*/

mod io;
pub mod client;
mod error;
mod literal;
pub mod query;
mod session;

/// Spark Connect gRPC protobuf definitions, compiled using [tonic].
pub mod spark {
    tonic::include_proto!("spark.connect");
}

pub use error::SparkError;
pub use session::{SparkSession, SparkSessionBuilder};
pub use literal::ToLiteral;

#[cfg(test)]
mod test_utils;