# Robin Sparkless
PySpark-style DataFrames in Rust—no JVM. A DataFrame library that mirrors PySpark’s API and semantics while using Polars as the execution engine.
## Why Robin Sparkless?

- Familiar API — `SparkSession`, `DataFrame`, `Column`, and PySpark-like functions so you can reuse patterns without the JVM.
- Polars under the hood — Fast, native Rust execution with Polars for IO, expressions, and aggregations.
- Persistence options — Global temp views (cross-session in-memory) and disk-backed `saveAsTable` via `spark.sql.warehouse.dir`.
- Rust-first, Python optional — Use it as a Rust library or build the Python extension via PyO3 for a drop-in style API.
- Sparkless backend target — Designed to power Sparkless (the Python PySpark replacement) so Sparkless can run on this engine via PyO3.
## Features
| Area | What’s included |
|---|---|
| Core | SparkSession, DataFrame, Column; filter, select, with_column, order_by, group_by, joins |
| IO | CSV, Parquet, JSON via SparkSession::read_* |
| Expressions | col(), lit(), when/then/otherwise, coalesce, cast, type/conditional helpers (see the sketch after this table) |
| Aggregates | count, sum, avg, min, max, and more; multi-column groupBy |
| Window | row_number, rank, dense_rank, lag, lead, first_value, last_value, and others with .over() |
| Arrays & maps | array_*, explode, create_map, map_keys, map_values, and related functions |
| Strings & JSON | String functions (upper, lower, substring, regexp_*, etc.), get_json_object, from_json, to_json |
| Datetime & math | Date/time extractors and arithmetic, year/month/day, math (sin, cos, sqrt, pow, …) |
| Optional SQL | spark.sql("SELECT ...") with temp views, global temp views (cross-session), and tables: createOrReplaceTempView, createOrReplaceGlobalTempView, table(name), table("global_temp.name"), df.write().saveAsTable(name, mode=...), spark.catalog().listTables() — enable with --features sql |
| Optional Delta | read_delta(path) or read_delta(table_name), read_delta_with_version, write_delta, write_delta_table(name) — enable with --features delta (path I/O); table-by-name works with sql only |
| UDFs | Scalar and vectorized Python UDFs via spark.udf().register(...), grouped vectorized pandas UDFs for group_by().agg(...) (function_type="grouped_agg"), and pure-Rust UDFs; see docs/UDF_GUIDE.md |
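To give a feel for how these pieces compose, here is a minimal sketch that combines conditional expressions with a grouped aggregation. The function names follow the table above, but the exact Rust signatures (e.g. whether `with_column`, `filter`, and `agg` return `Result`) are assumptions rather than the canonical API; see docs/QUICKSTART.md for real examples.

```rust
use robin_sparkless::{avg, col, count, lit, when, DataFrame};

// Sketch only: `when(...).then(...).otherwise(...)`, `with_column`, `group_by`,
// and `agg` are named after the feature table above; exact signatures may differ.
fn summarize_by_age_group(df: &DataFrame) -> Result<DataFrame, Box<dyn std::error::Error>> {
    // Add a conditional label column: "adult" when age >= 18, otherwise "minor".
    let labelled = df.with_column(
        "age_group",
        when(col("age").gt_eq(lit(18)))
            .then(lit("adult"))
            .otherwise(lit("minor")),
    )?;

    // Group by the label and compute a row count and the average age per group.
    let summary = labelled
        .group_by(["age_group"])
        .agg([count(col("id")).alias("n"), avg(col("age")).alias("avg_age")])?;
    Ok(summary)
}
```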
Parity: 200+ fixtures validated against PySpark. Known differences from PySpark are documented in docs/PYSPARK_DIFFERENCES.md. Out-of-scope items (XML, UDTF, streaming, RDD) are in docs/DEFERRED_SCOPE.md. Full parity status: docs/PARITY_STATUS.md.
## Installation

### Rust
Add to your Cargo.toml:
```toml
[dependencies]
robin-sparkless = "0.8.4"
```

Optional features:

```toml
robin-sparkless = { version = "0.8.4", features = ["sql"] }   # spark.sql(), temp views
robin-sparkless = { version = "0.8.4", features = ["delta"] } # Delta Lake read/write
```
### Python (PyO3)
Install from PyPI (Python 3.8+):

```bash
pip install robin-sparkless
```

Or build from source with maturin:

```bash
maturin develop --features pyo3
# With optional SQL and/or Delta:
maturin develop --features pyo3,sql,delta
```
Then use the robin_sparkless module; see docs/PYTHON_API.md.
## Quick start

### Rust
A minimal end-to-end sketch: it builds the input with Polars and wraps it via `DataFrame::from_polars`; the `filter`, `gt`, and `show` calls shown are assumed signatures, so see docs/QUICKSTART.md for the canonical example.
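```rust
use polars::prelude::*;
use robin_sparkless::{col, lit, DataFrame};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build a small Polars frame and wrap it; `DataFrame::from_polars` is the
    // documented wrapper, the other calls are assumed bindings.
    let people = df!(
        "id" => [1i64, 2, 3],
        "age" => [20i64, 30, 35],
        "name" => ["Alice", "Bob", "Charlie"],
    )?;
    let df = DataFrame::from_polars(people);

    // Keep rows where age > 25, then print the result.
    let adults = df.filter(col("age").gt(lit(25)))?;
    adults.show();
    Ok(())
}
```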
Output (from show):
```text
shape: (2, 3)
┌─────┬─────┬─────────┐
│ id  ┆ age ┆ name    │
│ --- ┆ --- ┆ ---     │
│ i64 ┆ i64 ┆ str     │
╞═════╪═════╪═════════╡
│ 2   ┆ 30  ┆ Bob     │
│ 3   ┆ 35  ┆ Charlie │
└─────┴─────┴─────────┘
```
You can also wrap an existing Polars DataFrame with DataFrame::from_polars(polars_df). See docs/QUICKSTART.md for joins, window functions, and more.
### Python
The same flow in Python looks roughly like the sketch below; the session and constructor helpers shown are assumptions, so check docs/PYTHON_API.md for the exact API.
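```python
import robin_sparkless as rs

# Sketch only: `SparkSession.builder` and `createDataFrame` are assumed to
# mirror PySpark; `rs.col` / `rs.lit` come from the feature list above.
spark = rs.SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, 20, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
adults = df.filter(rs.col("age") > rs.lit(26))  # or .gt(rs.lit(26))
print(adults.collect())  # assumed to return dict-like rows
```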
Output:
```python
[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
```
## Development
Prerequisites: Rust (see rust-toolchain.toml). For Python tests: Python 3.8+, maturin, pytest. For the full check (lint + type-check): ruff and mypy (installed by the Makefile when needed).
| Command | Description |
|---|---|
| `cargo build` | Build (Rust only) |
| `cargo build --features pyo3` | Build with Python extension |
| `cargo test` | Run Rust tests |
| `make test` | Run Rust + Python tests (creates venv, `maturin develop --features pyo3,sql,delta`, pytest) |
| `make check` | Rust only: format check, clippy, audit, deny, Rust tests. Use `make -j5 check` to run the five jobs in parallel. |
| `make check-full` | Full CI: check + Python lint (ruff, mypy) + Python tests. Use `make -j7 check-full` to run all 7 jobs in parallel (5 Rust + 2 Python), or `-j3` for the three top-level jobs. |
| `make fmt` | Format Rust code (run before `check` if you want to fix formatting). |
| `make test-parity-phase-a` … `make test-parity-phase-g` | Run parity fixtures for a specific phase (see PARITY_STATUS) |
| `make lint-python` | Python only: `ruff format --check`, `ruff check`, mypy |
| `cargo bench` | Benchmarks (robin-sparkless vs Polars) |
| `cargo doc --open` | Build and open API docs |
| `make gap-analysis` | PySpark gap analysis (clones Spark repo, extracts APIs, produces docs/GAP_ANALYSIS_PYSPARK_REPO.md) |
| `make gap-analysis-quick` | Quick gap analysis (uses existing pyspark_api_from_repo.json) |
CI runs format, clippy, audit, deny, Rust tests, Python lint (ruff, mypy), and Python tests on push/PR (see .github/workflows/ci.yml).
## Documentation
| Resource | Description |
|---|---|
| Read the Docs | Full docs: quickstart, Python API, Sparkless integration (MkDocs) |
| docs.rs | Rust API reference |
| PyPI | Python package (wheels for Linux, macOS, Windows) |
| QUICKSTART | Build, usage, optional features, benchmarks |
| User Guide | Everyday usage (Rust and Python) |
| Persistence Guide | Global temp views, disk-backed saveAsTable |
| UDF Guide | Scalar, vectorized, and grouped UDFs |
| PySpark Differences | Known divergences |
| Rust–Python parity cross-check | Column/function binding parity (Rust vs Python) |
| Roadmap | Development phases, Sparkless integration |
| RELEASING | Publishing to crates.io |
See CHANGELOG.md for version history.
## License
MIT