# Robin Sparkless
PySpark-style DataFrames in Rust—no JVM. A DataFrame library that mirrors PySpark’s API and semantics while using Polars as the execution engine.
## Why Robin Sparkless?

- Familiar API — `SparkSession`, `DataFrame`, `Column`, and PySpark-like functions so you can reuse patterns without the JVM.
- Polars under the hood — fast, native Rust execution with Polars for IO, expressions, and aggregations.
- Rust-first, Python optional — use it as a Rust library, or build the Python extension via PyO3 for a drop-in-style API.
- Sparkless backend target — designed to power Sparkless (the Python PySpark replacement) so Sparkless can run on this engine via PyO3.
## Features
| Area | What’s included |
|---|---|
| Core | SparkSession, DataFrame, Column; filter, select, with_column, order_by, group_by, joins |
| IO | CSV, Parquet, JSON via SparkSession::read_* |
| Expressions | col(), lit(), when/then/otherwise, coalesce, cast, type/conditional helpers |
| Aggregates | count, sum, avg, min, max, and more; multi-column groupBy |
| Window | row_number, rank, dense_rank, lag, lead, first_value, last_value, and others with .over() |
| Arrays & maps | array_*, explode, create_map, map_keys, map_values, and related functions |
| Strings & JSON | String functions (upper, lower, substring, regexp_*, etc.), get_json_object, from_json, to_json |
| Datetime & math | Date/time extractors and arithmetic, year/month/day, math (sin, cos, sqrt, pow, …) |
| Optional SQL | spark.sql("SELECT ...") with temp views (createOrReplaceTempView, table) — enable with --features sql |
| Optional Delta | read_delta, read_delta_with_version, write_delta — enable with --features delta |
Known differences from PySpark are documented in docs/PYSPARK_DIFFERENCES.md. Parity status and roadmap are in docs/PARITY_STATUS.md and docs/ROADMAP.md.
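The string/JSON helpers in the table above follow PySpark's semantics. As a plain-Python illustration of that behaviour (standard library only; this is not robin-sparkless's API, just the semantics it mirrors), `get_json_object` extracts a value from a JSON string by a `$.`-rooted path and returns it as a string, or `None` when the path is missing:

```python
import json

def get_json_object(doc, path):
    # Plain-Python sketch of PySpark's get_json_object semantics:
    # walk a "$."-rooted dot path through nested JSON objects
    # (no array indexing; returns None when the path is missing).
    obj = json.loads(doc)
    keys = path[2:].split(".") if path.startswith("$.") else [path]
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    # Like PySpark, return the result as a string; nested objects
    # come back re-serialized as JSON.
    return json.dumps(obj) if isinstance(obj, (dict, list)) else str(obj)

row = '{"user": {"name": "Bob", "age": 30}}'
print(get_json_object(row, "$.user.name"))  # Bob
print(get_json_object(row, "$.user.age"))   # 30
print(get_json_object(row, "$.missing"))    # None
```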
## Installation

### Rust

Add to your `Cargo.toml`:
```toml
[dependencies]
robin-sparkless = "0.1.0"
```
Optional features:

```toml
robin-sparkless = { version = "0.1.0", features = ["sql"] }   # spark.sql(), temp views
robin-sparkless = { version = "0.1.0", features = ["delta"] } # Delta Lake read/write
```
### Python (PyO3)

Install from PyPI (Python 3.8+):

```shell
pip install robin-sparkless
```

Or build from source with maturin:

```shell
maturin develop --release --features pyo3
# With optional SQL and/or Delta:
maturin develop --release --features pyo3,sql,delta
```
Then use the robin_sparkless module; see docs/PYTHON_API.md.
## Quick start

### Rust

A minimal sketch (method names follow the PySpark-style surface described above; see docs/QUICKSTART.md for the exact API):

```rust
use robin_sparkless::{col, lit, SparkSession};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let spark = SparkSession::builder().app_name("example").get_or_create();
    let df = spark.read_csv("people.csv")?;
    let adults = df
        .filter(col("age").gt(lit(21)))?
        .select(&["name", "age"])?;
    adults.show()?;
    Ok(())
}
```
You can also wrap an existing Polars DataFrame with DataFrame::from_polars(polars_df). See docs/QUICKSTART.md for joins, window functions, and more.
### Python

An illustrative sketch (names follow the PySpark-style API; see docs/PYTHON_API.md for the exact surface):

```python
from robin_sparkless import SparkSession

spark = SparkSession.builder.appName("example").getOrCreate()
df = spark.createDataFrame(
    [{"id": 1, "age": 25, "name": "Alice"}, {"id": 2, "age": 30, "name": "Bob"}]
)
rows = df.filter(df["age"] > 28).collect()
# [{"id": 2, "age": 30, "name": "Bob"}]
```
## Development

Prerequisites: Rust (see rust-toolchain.toml); for the Python tests, Python 3.8+, maturin, and pytest.
| Command | Description |
|---|---|
| `cargo build` | Build (Rust only) |
| `cargo build --features pyo3` | Build with Python extension |
| `cargo test` | Run Rust tests |
| `make test` | Run Rust + Python tests (creates venv, maturin develop, pytest) |
| `make check` | Format, clippy, audit, deny, tests |
| `cargo bench` | Benchmarks (robin-sparkless vs Polars) |
| `cargo doc --open` | Build and open API docs |
CI runs the same checks on push/PR (see .github/workflows/ci.yml).
## Documentation
- Full documentation (Read the Docs) — Quickstart, Python API, reference, and Sparkless integration (MkDocs)
- PyPI — Python package (wheels for Linux, macOS, Windows)
- API reference (docs.rs) — Crate API
- QUICKSTART — Build, usage, optional features, benchmarks
- ROADMAP — Development roadmap and Sparkless integration
- PYSPARK_DIFFERENCES — Known divergences from PySpark
- RELEASING — Releasing and publishing to crates.io
See also CHANGELOG.md for version history.
## License
MIT