robin-sparkless 4.3.0

PySpark-like DataFrame API in Rust on Polars; no JVM.
Documentation
# Migration to Rust-Only with Polars - Status

## ✅ Completed

1. **Removed Python/PyO3 dependencies**
   - Removed `pyproject.toml`
   - Removed `src/robin_sparkless/` Python package directory
   - Removed Python tests
   - Updated `Cargo.toml` to remove PyO3 and add Polars

2. **Replaced DataFusion with Polars**
   - Updated all modules to use Polars instead of DataFusion
   - Removed `arrow_conversion.rs` (no longer needed)
   - Removed `lazy.rs` (using Polars LazyFrame directly)

3. **Created Rust-only API**
   - `lib.rs`: Public Rust API exports
   - `session.rs`: SparkSession with Polars backend
   - `dataframe.rs`: DataFrame using Polars LazyFrame
   - `column.rs`: Column using Polars Expr
   - `functions.rs`: Helper functions using Polars
   - `expression.rs`: Expression utilities
   - `schema.rs`: Schema conversion for Polars

4. **Updated documentation**
   - `README.md`: Reflects Rust-only project with Polars

## 🔧 Remaining Work

The migration itself is complete (Rust-only + Polars backend, build/test green). Optional features are implemented:

- **SQL** (optional `sql` feature): `SparkSession::sql()`, temp views, in-memory `saveAsTable`/`write_delta_table`, catalog `listTables`/`dropTable`, `read_delta(name_or_path)`; see [QUICKSTART.md]QUICKSTART.md.
- **Delta Lake** (optional `delta` feature): `read_delta`, `read_delta_with_version`, `write_delta`.
- **Benchmarks**: `cargo bench` (robin vs Polars); target within ~2x.

Remaining work is parity and feature expansion:

- Broader function coverage: Phase 6 array_position, array_remove, posexplode **implemented**; cume_dist, ntile, nth_value API (fixtures covered via multi-step workaround). **Phase 8 completed**: array_repeat, array_flatten, Map (create_map, map_keys, map_values, map_entries, map_from_arrays), String 6.4 (soundex, levenshtein, crc32, xxhash64). JSON (get_json_object, from_json, to_json) implemented. Additional edge-case parity fixtures.
- **Path to 100%** (ROADMAP Phases 16–27): Phases 18–25 completed (~283 functions, 159 fixtures, plan interpreter). Phase 26 (publish Rust crate on crates.io), Phase 27 (Sparkless integration, 200+ tests). See [ROADMAP.md]ROADMAP.md and [FULL_BACKEND_ROADMAP.md]FULL_BACKEND_ROADMAP.md.

**Sparkless integration**: Robin-sparkless is designed to replace the backend of [Sparkless]https://github.com/eddiethedean/sparkless. See [SPARKLESS_INTEGRATION_ANALYSIS.md]SPARKLESS_INTEGRATION_ANALYSIS.md for phases: fixture converter, structural alignment, function parity, and test conversion.

## Architecture

The new architecture:
- **SparkSession**: Entry point, uses Polars for file I/O
- **DataFrame**: Wraps Polars LazyFrame/DataFrame, provides PySpark-like API
- **Column**: Wraps Polars Expr, provides column operations
- **Functions**: Helper functions that return Polars expressions
- **Schema**: Converts between Polars schemas and custom schema types

All operations are lazy by default (using Polars LazyFrame) and execute when actions like `collect()` or `show()` are called.

## Historical Notes (archived)

Earlier versions of this doc tracked Polars API mismatches and compilation errors during the migration. Those items are no longer current; parity coverage is tracked in `PARITY_STATUS.md` and future work in `ROADMAP.md`.