Robin Sparkless

PySpark-style DataFrames in Rust—no JVM. A DataFrame library that mirrors PySpark’s API and semantics while using Polars as the execution engine.

Why Robin Sparkless?

- Familiar API: `SparkSession`, `DataFrame`, `Column`, and PySpark-like functions, so you can reuse PySpark patterns without the JVM.
- Polars under the hood: fast, native Rust execution, with Polars handling IO, expressions, and aggregations.
- Rust-first, Python optional: use it as a Rust library, or build the Python extension via PyO3 for a drop-in style API.
- Sparkless backend target: designed to power Sparkless (the Python PySpark replacement), which can run on this engine via PyO3.

Features

| Area | What’s included |
|------|-----------------|
| Core | `SparkSession`, `DataFrame`, `Column`; `filter`, `select`, `with_column`, `order_by`, `group_by`, joins |
| IO | CSV, Parquet, JSON via `SparkSession::read_*` |
| Expressions | `col()`, `lit()`, `when`/`then`/`otherwise`, `coalesce`, cast, type/conditional helpers |
| Aggregates | `count`, `sum`, `avg`, `min`, `max`, and more; multi-column `group_by` |
| Window | `row_number`, `rank`, `dense_rank`, `lag`, `lead`, `first_value`, `last_value`, and others with `.over()` |
| Arrays & maps | `array_*`, `explode`, `create_map`, `map_keys`, `map_values`, and related functions |
| Strings & JSON | String functions (`upper`, `lower`, `substring`, `regexp_*`, etc.), `get_json_object`, `from_json`, `to_json` |
| Datetime & math | Date/time extractors and arithmetic, `year`/`month`/`day`, math (`sin`, `cos`, `sqrt`, `pow`, …) |
| Optional SQL | `spark.sql("SELECT ...")` with temp views (`createOrReplaceTempView`, `table`); enable with `--features sql` |
| Optional Delta | `read_delta`, `read_delta_with_version`, `write_delta`; enable with `--features delta` |
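To pin down what the conditional helpers in the Expressions row mean, here is a conceptual sketch in plain Rust (standard library only) of PySpark's rules for `when(...).otherwise(...)` and `coalesce`, with `Option<T>` standing in for a nullable value. It does not use robin-sparkless: `when_otherwise` and `coalesce` below are hypothetical helpers that model the PySpark semantics, not the crate's API, and any divergence in the crate would be listed in docs/PYSPARK_DIFFERENCES.md.

```rust
// Conceptual model of PySpark's `when(...).otherwise(...)` and `coalesce`.
// `Option<T>` stands in for a nullable column value (None == NULL).
// Hypothetical helpers for illustration; not robin-sparkless's API.

/// `when(cond, value).otherwise(default)`: only a TRUE condition selects
/// `value`; FALSE or NULL falls through to `default`.
fn when_otherwise<T>(cond: Option<bool>, value: Option<T>, default: Option<T>) -> Option<T> {
    if cond == Some(true) {
        value
    } else {
        default
    }
}

/// `coalesce(a, b, ...)`: the first non-NULL argument, or NULL if all are NULL.
fn coalesce<T: Clone>(args: &[Option<T>]) -> Option<T> {
    args.iter().find_map(|v| v.clone())
}

fn main() {
    let ages: [Option<i64>; 3] = [Some(15), Some(40), None];
    for age in ages {
        // `age >= 18` is NULL when `age` is NULL (three-valued logic).
        let is_adult = age.map(|a| a >= 18);
        let label = when_otherwise(is_adult, Some("adult"), Some("minor"));
        // Replace a missing age with 0, as PySpark's coalesce(col("age"), lit(0)) would.
        let age_or_zero = coalesce(&[age, Some(0)]);
        println!("{age:?} -> {label:?}, {age_or_zero:?}");
    }
}
```

The key point is that a NULL condition behaves like FALSE, so the row with a missing age is labeled "minor" by the `otherwise` branch.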

Known differences from PySpark are documented in docs/PYSPARK_DIFFERENCES.md. Parity status and roadmap are in docs/PARITY_STATUS.md and docs/ROADMAP.md.
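A common source of surprises when comparing results is NULL handling in the aggregates from the Features table. In PySpark, `count(col)` counts only non-NULL values, `sum` and `avg` ignore NULLs, and a group whose values are all NULL yields NULL for `sum` and `avg` (but 0 for `count`). The following standard-library-only Rust sketch models those PySpark rules for a single-key `group_by`; `group_by_agg` and `Agg` are hypothetical names for illustration, not robin-sparkless's API.

```rust
use std::collections::BTreeMap;

/// Per-group results modeling PySpark's `count(col)`, `sum(col)`, `avg(col)`.
/// Hypothetical type for illustration; not robin-sparkless's API.
#[derive(Debug, PartialEq)]
struct Agg {
    count: i64,       // count(col): non-NULL values only
    sum: Option<i64>, // sum(col): NULL if the group has no non-NULL values
    avg: Option<f64>, // avg(col): mean of the non-NULL values, else NULL
}

fn group_by_agg(rows: &[(&str, Option<i64>)]) -> BTreeMap<String, Agg> {
    // Collect non-NULL values per key; a key whose values are all NULL
    // still forms a group (with an empty value list).
    let mut groups: BTreeMap<String, Vec<i64>> = BTreeMap::new();
    for (key, value) in rows {
        let vals = groups.entry(key.to_string()).or_default();
        if let Some(v) = value {
            vals.push(*v);
        }
    }
    groups
        .into_iter()
        .map(|(k, vals)| {
            let count = vals.len() as i64;
            let sum = if vals.is_empty() { None } else { Some(vals.iter().sum::<i64>()) };
            let avg = sum.map(|s| s as f64 / count as f64);
            (k, Agg { count, sum, avg })
        })
        .collect()
}

fn main() {
    let rows = [("a", Some(10)), ("a", None), ("a", Some(20)), ("b", None)];
    for (key, agg) in group_by_agg(&rows) {
        println!("{key}: {agg:?}");
    }
}
```

Note that `avg` for group "a" is 15.0 (the NULL is skipped, not counted as zero), and group "b" still appears with `count` 0.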

Installation

Rust

Add to your Cargo.toml:

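```toml
[dependencies]
robin-sparkless = "0.1.0"
```

Optional features are enabled on the same dependency line; use one of these in place of the line above:

```toml
robin-sparkless = { version = "0.1.0", features = ["sql"] }           # spark.sql(), temp views
robin-sparkless = { version = "0.1.0", features = ["delta"] }         # Delta Lake read/write
robin-sparkless = { version = "0.1.0", features = ["sql", "delta"] }  # both
```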

Python (PyO3)

Install from PyPI (Python 3.8+):

```bash
pip install robin-sparkless
```

Or build from source with maturin:

```bash
pip install maturin
maturin develop --features pyo3
# With optional SQL and/or Delta:
maturin develop --features "pyo3,sql"
maturin develop --features "pyo3,delta"
maturin develop --features "pyo3,sql,delta"
```

Then import the `robin_sparkless` module in Python; see docs/PYTHON_API.md for the full API.

Quick start

Rust

```rust
use robin_sparkless::{col, lit_i64, SparkSession};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let spark = SparkSession::builder().app_name("demo").get_or_create();

    // Create a DataFrame from rows (id, age, name)
    let df = spark.create_dataframe(
        vec![
            (1, 25, "Alice".to_string()),
            (2, 30, "Bob".to_string()),
            (3, 35, "Charlie".to_string()),
        ],
        vec!["id", "age", "name"],
    )?;

    // Filter and show
    let adults = df.filter(col("age").gt(lit_i64(26)))?;
    adults.show(Some(10))?;

    Ok(())
}
```
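To make the filter's semantics concrete, here is a standard-library-only model of what the quick-start filter computes, assuming PySpark's rule that a row is kept only when the predicate is TRUE, so a comparison against a NULL age drops the row. `Person`, `filter_age_gt`, and the extra "Dana" row with a missing age are hypothetical, added for illustration; robin-sparkless itself executes filters with Polars, not like this.

```rust
/// One row of the quick-start data, with `age` made nullable to show NULL handling.
/// Hypothetical type for illustration; not robin-sparkless's API.
#[derive(Debug, Clone, PartialEq)]
struct Person {
    id: i64,
    age: Option<i64>,
    name: String,
}

/// Keep rows whose predicate `age > threshold` is TRUE; rows where it is
/// FALSE or NULL (age missing) are dropped, as in PySpark's `filter`.
fn filter_age_gt(rows: &[Person], threshold: i64) -> Vec<Person> {
    rows.iter()
        .filter(|p| p.age.map(|a| a > threshold) == Some(true))
        .cloned()
        .collect()
}

fn main() {
    let rows = vec![
        Person { id: 1, age: Some(25), name: "Alice".into() },
        Person { id: 2, age: Some(30), name: "Bob".into() },
        Person { id: 3, age: Some(35), name: "Charlie".into() },
        Person { id: 4, age: None, name: "Dana".into() }, // hypothetical NULL age
    ];
    for p in filter_age_gt(&rows, 26) {
        println!("{} {} {:?}", p.id, p.name, p.age);
    }
}
```

Only Bob and Charlie survive: Alice fails the comparison, and Dana's NULL comparison is not TRUE.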

You can also wrap an existing Polars DataFrame with `DataFrame::from_polars(polars_df)`. See docs/QUICKSTART.md for joins, window functions, and more.
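Of the window functions, the three ranking functions are easy to confuse because they differ only on ties. In PySpark, `row_number` numbers rows 1, 2, 3, … within a partition; `rank` gives tied rows the same rank and then skips ahead (1, 2, 2, 4); `dense_rank` gives ties the same rank without gaps (1, 2, 2, 3). The sketch below models those definitions in plain Rust for rows partitioned by a department key and ordered by score descending; `rank_over` is a hypothetical helper, not robin-sparkless's API.

```rust
use std::collections::BTreeMap;

/// For each (dept, score) row, compute (row_number, rank, dense_rank) over a
/// window partitioned by `dept` and ordered by `score` descending.
/// Results are returned in input order. Hypothetical helper for illustration.
fn rank_over(rows: &[(&str, i64)]) -> Vec<(i64, i64, i64)> {
    // Group row indices by partition key.
    let mut partitions: BTreeMap<&str, Vec<usize>> = BTreeMap::new();
    for (i, (dept, _)) in rows.iter().enumerate() {
        partitions.entry(*dept).or_default().push(i);
    }
    let mut out = vec![(0, 0, 0); rows.len()];
    for idxs in partitions.values_mut() {
        // Order within the partition by score descending (stable for ties).
        idxs.sort_by(|&a, &b| rows[b].1.cmp(&rows[a].1));
        let (mut rank, mut dense) = (0, 0);
        for (pos, &i) in idxs.iter().enumerate() {
            let row_number = pos as i64 + 1;
            // A new ordering value starts a new rank; ties keep the previous one.
            if pos == 0 || rows[idxs[pos - 1]].1 != rows[i].1 {
                rank = row_number; // rank jumps to the row's position (leaves gaps)
                dense += 1; // dense_rank advances by one (no gaps)
            }
            out[i] = (row_number, rank, dense);
        }
    }
    out
}

fn main() {
    let rows = [("eng", 100), ("eng", 90), ("eng", 90), ("eng", 80), ("ops", 50)];
    for (row, (rn, rk, dr)) in rows.iter().zip(rank_over(&rows)) {
        println!("{row:?}: row_number={rn} rank={rk} dense_rank={dr}");
    }
}
```

Spark does not guarantee an order among tied rows for `row_number`; the stable sort here is just one deterministic choice.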

Python

```python
import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe([(1, 25, "Alice"), (2, 30, "Bob")], ["id", "age", "name"])
filtered = df.filter(rs.col("age").gt(rs.lit(26)))
print(filtered.collect())  # [{"id": 2, "age": 30, "name": "Bob"}]
```

Development

Prerequisites: Rust (toolchain pinned in rust-toolchain.toml) and, for the Python tests, Python 3.8+, maturin, and pytest.

| Command | Description |
|---------|-------------|
| `cargo build` | Build (Rust only) |
| `cargo build --features pyo3` | Build with Python extension |
| `cargo test` | Run Rust tests |
| `make test` | Run Rust + Python tests (creates venv, `maturin develop`, `pytest`) |
| `make check` | Format, clippy, audit, deny, tests |
| `cargo bench` | Benchmarks (robin-sparkless vs Polars) |
| `cargo doc --open` | Build and open API docs |

CI runs the same checks on push/PR (see .github/workflows/ci.yml).

Documentation

- Full documentation (Read the Docs): Quickstart, Python API, reference, and Sparkless integration (MkDocs)
- PyPI: Python package (wheels for Linux, macOS, Windows)
- API reference (docs.rs): crate API
- QUICKSTART: build, usage, optional features, benchmarks
- ROADMAP: development roadmap and Sparkless integration
- PYSPARK_DIFFERENCES: known divergences from PySpark
- RELEASING: releasing and publishing to crates.io

See also CHANGELOG.md for version history.

License

MIT

robin-sparkless 0.1.1
