robin-sparkless 4.3.0

## Integration Test Guide (Python)

This guide describes how to write and maintain **Python integration tests** under `tests/integration` so they:

- Use the **shared test harness** (fixtures + backend abstraction).
- Treat **PySpark behavior as the source of truth**.
- Work against both the **PySpark** backend and the **Robin** backend without changing test code.

---

### 1. Use the shared harness

- **Do NOT import PySpark or sparkless directly** in tests.
- Import through the shared helpers:

  ```python
  from tests.fixtures.spark_imports import get_spark_imports
  ```

- Use the `spark` fixture from `tests/conftest.py` for the default session (backend chosen via `SPARKLESS_TEST_BACKEND` / `MOCK_SPARK_TEST_BACKEND` or `@pytest.mark.backend(...)`).
- **Do not** construct `SparkSession` manually (e.g. `SparkSession("TestApp")` or `SparkSession.builder...getOrCreate()`) except via fixtures; never call `spark.stop()` in tests.

---

### 2. Sessions with custom config

For tests that need a session with specific config (e.g. `spark.sql.caseSensitive=true`), use a dedicated fixture instead of building a session inside the test:

- **Default session**: use the `spark` fixture (case-insensitive by default on both backends).
- **Case-sensitive session**: use the `spark_case_sensitive` fixture provided in `tests/integration/test_case_sensitivity.py` (or a similar fixture in conftest). It creates a session with `spark.sql.caseSensitive=true`, yields it, and stops it after the test.

Example:

```python
def test_case_sensitive_mode(self, spark_case_sensitive):
    spark = spark_case_sensitive
    df = spark.createDataFrame([{"Name": "Alice"}])
    with pytest.raises(Exception):
        df.select("name").collect()
```

---

### 3. Backend-agnostic Row and conf

- **Row field names / values**: Do not rely on backend-specific Row internals (e.g. `row.__dict__["_data_dict"]` or `row._schema.fields`). Use a helper that works on both backends:
  - `row.asDict()` (PySpark) or `dict(row)` / `_row_to_dict(row)` (see `tests/integration/test_case_sensitivity.py`).
  - For column names only: `list(_row_to_dict(row).keys())` or a `_row_keys(row)` helper.
- **Case sensitivity config**: PySpark’s `RuntimeConfig` may not have `is_case_sensitive()`. Use a helper that checks `conf.get("spark.sql.caseSensitive", "false").strip().lower() == "true"` when the method is missing (see `_is_case_sensitive(spark)` in the case sensitivity tests).

---

### 4. Backend-specific behavior

When behavior differs by backend (e.g. PySpark raises `AMBIGUOUS_REFERENCE` for ambiguous column names, while mock may return a first match), write the test so both outcomes are accepted:

- Use try/except: assert the “lenient” outcome when the operation succeeds, and allow the known exception (e.g. message containing `"ambiguous"`) when the backend is strict.
- Or use `@pytest.mark.backend("pyspark")` / `get_backend_type()` and skip or xfail on the other backend, and document the divergence.

---

### 5. Running integration tests

- **PySpark backend** (parity / expectation updates):

  ```bash
  SPARKLESS_TEST_BACKEND=pyspark pytest tests/integration -n 10 -v
  ```

- **Robin backend** (default):

  ```bash
  pytest tests/integration -n 10 -v
  ```

Use a 15-minute timeout for full runs if needed.

---

### 6. References

- Harness: `tests/conftest.py`, `tests/fixtures/spark_backend.py`, `tests/fixtures/spark_imports.py`
- Example: `tests/integration/test_case_sensitivity.py` (fixtures `spark`, `spark_case_sensitive`; helpers `_row_to_dict`, `_row_keys`, `_is_case_sensitive`)