robin-sparkless 4.2.0

PySpark-like DataFrame API in Rust on Polars; no JVM.
# Parity Functions: Test Expectations and PySpark Alignment

## Test expectations match PySpark

Expected outputs under `tests/expected_outputs/` are generated from **PySpark** via `tests/tools/generate_expected_outputs.py`. They reflect PySpark behavior and are the source of truth for parity.

## Engine fixes applied (parity/functions)

- **initcap**: Implemented as a UDF that title-cases each word (first letter uppercase, rest lowercase) so behavior matches PySpark. Polars 0.53 has no `to_titlecase`.
- **xxhash64**: Uses seed **42** in `apply_xxhash64` to align with Spark’s XXH64 seed; hash values now match PySpark for the same inputs.
- **json_tuple**: Key arguments are coerced to strings when possible (e.g. `o.extract::<String>().unwrap_or_else(|_| o.to_string())`), so keys that come from plans or have non-string types no longer raise "keys must be strings".
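
The `initcap` rule above (first letter of each word uppercase, the rest lowercase) can be illustrated with a small sketch. This is a pure-Python illustration of the rule, not the engine's UDF. It assumes words are delimited by a single space character and that runs of spaces are preserved. Python's built-in `str.title()` is not used because it also capitalizes after apostrophes (`"it's"` becomes `"It'S"`).

```python
def initcap(text: str) -> str:
    """Title-case each space-delimited word: first letter upper, rest lower.

    Assumption: words are separated by a single space character; empty
    segments produced by consecutive spaces are kept so spacing is preserved.
    """
    return " ".join(
        word[:1].upper() + word[1:].lower()
        for word in text.split(" ")
    )


# Mixed-case input is normalized word by word.
mixed = initcap("hELLO wORLD")
# Apostrophes do not start a new word, unlike str.title().
apostrophe = initcap("it's ok")
```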

## Fixes applied (literal replication, explode, log)

- **Literal replication:** When no expression in `select()` references a column from the frame, the engine now cross-joins a single key column with the literal-derived result so the row count matches the input (N rows). This fixes `get_json_object`, `math_exp`, and similar literal-only selects.
- **Split + explode:** When `withColumn(name, F.explode(F.col(name)))` replaces a list column with its exploded form, the engine now uses `LazyFrame.explode()` so other columns are replicated correctly. Fixes `test_split_with_limit_parity`, `test_split_without_limit_parity`, `test_split_with_limit_minus_one_parity`.
- **log(float base):** The Python `log(col_or_base, base_or_col=None)` now accepts both PySpark call forms, `log(base, column)` and `log(column, base)`: one argument may be a numeric base (int/float) and the other a Column. Fixes `test_log_with_float_base_parity`, `test_log_with_different_bases_parity`.
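
The argument handling described for `log` can be sketched as follows. This is a minimal pure-Python illustration, not the library's implementation. `Column` is a hypothetical stand-in for the real column type, and `resolve_log_args` is an illustrative helper name. Treating a single argument as a natural log is an assumption, and accepting only `Column` objects (not column-name strings) is a simplification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for the library's Column type (assumption)."""
    name: str


def _is_number(value: object) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def resolve_log_args(col_or_base, base_or_col=None) -> Tuple[Column, Optional[float]]:
    """Normalize log() arguments to (column, base); base None means natural log."""
    if base_or_col is None:
        if not isinstance(col_or_base, Column):
            raise TypeError("log(x) expects a Column when called with one argument")
        return col_or_base, None
    if isinstance(col_or_base, Column) and _is_number(base_or_col):
        return col_or_base, float(base_or_col)  # log(column, base)
    if _is_number(col_or_base) and isinstance(base_or_col, Column):
        return base_or_col, float(col_or_base)  # log(base, column)
    raise TypeError("log() expects one Column and one numeric base")


# Both argument orders resolve to the same (column, base) pair.
first = resolve_log_args(2.5, Column("x"))
second = resolve_log_args(Column("x"), 2.5)
```

Normalizing to a single `(column, base)` pair up front means the rest of the expression builder only has to handle one shape, whichever order the caller used.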

## Remaining parity failures (engine gaps)

These tests still fail because of current engine behavior; expectations are correct (PySpark-aligned).

| Area | Failure | Cause |
|------|---------|--------|
| **Literal replication (mixed)** | `levenshtein`, `json_tuple` (and similar) | Either the expression references a column (e.g. `levenshtein(col("name"), lit(""))`), so the “no column refs” path does not apply, or the schema/column shape differs (e.g. `json_tuple`). |
| **Struct field alias** | `test_struct_field_with_alias_*` | Selecting struct fields with aliases returns `None` for the extracted value; needs struct/alias handling to match PySpark. |
| **Window orderBy list** | `test_window_orderby_list_multiple_columns_parity` | `Window.orderBy([list of columns])` ordering differs from PySpark. |
| **array_contains with column** | `test_array_contains_join_*` | `list.contains` with a column argument fails when used in a join; the gap is in type/plan handling for "column in list" in join/filter. |
| **Array/column dtype** | Various `test_array_*` | Wrong column used (e.g. list op on string "tags"), output naming, or row counts (explode/array_union). |

Regenerating expected outputs from PySpark will not fix these; they require engine/plan/API changes in the Rust/Python layers.