"""
Base classes and utilities for PySpark parity testing.

This module provides standard patterns for all parity tests that validate
Sparkless behavior against pre-generated PySpark expected outputs.
"""
"""Base class for all PySpark parity tests.

All parity tests should inherit from this class to ensure consistent
patterns and behavior across the test suite.

Note: Tests should use the unified `spark` fixture from conftest.py,
which works with both the Sparkless and PySpark backends.
"""
"""Load expected output for a test case.

Args:
    category: Test category (e.g., 'dataframe', 'functions', 'sql')
    test_name: Name of the test case
    pyspark_version: PySpark version to use (default: "3.2")

Returns:
    Dictionary containing expected output data
"""
return
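A minimal sketch of how such a loader might work, assuming the expected outputs are stored as JSON files. The source gives only the function name `load_expected_output` and its documented arguments; the `EXPECTED_ROOT` constant, the extra `root` parameter, and the directory layout `<root>/pyspark_<version>/<category>/<test_name>.json` are hypothetical choices made for illustration.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical root directory; the real location is not given in the source.
EXPECTED_ROOT = Path("expected_outputs")


def load_expected_output(
    category: str,
    test_name: str,
    pyspark_version: str = "3.2",
    root: Path = EXPECTED_ROOT,  # assumed extra parameter, handy for tests
) -> Dict[str, Any]:
    """Read one pre-generated expected-output file and return it as a dict."""
    # Assumed layout: <root>/pyspark_<version>/<category>/<test_name>.json
    path = root / f"pyspark_{pyspark_version}" / category / f"{test_name}.json"
    if not path.exists():
        raise FileNotFoundError(f"No expected output at {path}")
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Raising `FileNotFoundError` with the full path makes a missing fixture easy to tell apart from a genuine parity failure.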
"""Assert that Sparkless result matches PySpark expected output.

Args:
    actual_df: Sparkless DataFrame result
    expected_output: Expected output dictionary from load_expected_output
    tolerance: Numerical tolerance for floating point comparisons
    msg: Optional custom error message

Raises:
    AssertionError: If results don't match expected output
"""
"""Compare Sparkless result with PySpark expected output.

Args:
    actual_df: Sparkless DataFrame result
    expected_output: Expected output dictionary from load_expected_output
    tolerance: Numerical tolerance for floating point comparisons

Returns:
    Tuple of (is_equal, error_message)
"""
=
return True, None
return False,
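The comparison and assertion described above could be sketched as follows. This is an illustration under stated assumptions, not the module's actual implementation: it works on plain row dictionaries (for example, rows already collected from a DataFrame) instead of a DataFrame object, it assumes the expected rows live under a `"data"` key, it compares rows in order, and the default tolerance of `1e-6` is a guess. The names `values_match`, `compare_rows`, and `assert_rows_match` are hypothetical.

```python
import math
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def values_match(actual: Any, expected: Any, tolerance: float) -> bool:
    """Compare two cell values, applying the tolerance only when a float is involved."""
    # bool is a subclass of int, so handle it before the numeric branch.
    if isinstance(actual, bool) or isinstance(expected, bool):
        return actual == expected
    if isinstance(actual, (int, float)) and isinstance(expected, (int, float)):
        if isinstance(actual, float) or isinstance(expected, float):
            if math.isnan(actual) and math.isnan(expected):
                return True  # treat NaN as equal to NaN for parity purposes
            return math.isclose(actual, expected, rel_tol=0.0, abs_tol=tolerance)
    return actual == expected


def compare_rows(
    actual_rows: List[Row],
    expected_output: Dict[str, Any],
    tolerance: float = 1e-6,  # assumed default
) -> Tuple[bool, Optional[str]]:
    """Return (is_equal, error_message); error_message is None on success."""
    expected_rows = expected_output["data"]  # assumed key
    if len(actual_rows) != len(expected_rows):
        return False, (
            f"Row count mismatch: got {len(actual_rows)}, "
            f"expected {len(expected_rows)}"
        )
    for i, (actual, expected) in enumerate(zip(actual_rows, expected_rows)):
        if set(actual) != set(expected):
            return False, (
                f"Row {i}: column mismatch: got {sorted(actual)}, "
                f"expected {sorted(expected)}"
            )
        for col in expected:
            if not values_match(actual[col], expected[col], tolerance):
                return False, (
                    f"Row {i}, column {col!r}: got {actual[col]!r}, "
                    f"expected {expected[col]!r}"
                )
    return True, None


def assert_rows_match(
    actual_rows: List[Row],
    expected_output: Dict[str, Any],
    tolerance: float = 1e-6,
    msg: Optional[str] = None,
) -> None:
    """Raise AssertionError with a descriptive message if the rows differ."""
    is_equal, error = compare_rows(actual_rows, expected_output, tolerance)
    if not is_equal:
        raise AssertionError(f"{msg}: {error}" if msg else error)
```

Keeping the comparison separate from the assertion lets callers inspect the `(is_equal, error_message)` tuple without triggering a test failure.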
"""Create a DataFrame from test data.

This is a convenience function for creating DataFrames in tests.

Args:
    spark: Sparkless SparkSession
    data: List of dictionaries representing rows

Returns:
    DataFrame
"""
return
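One plausible shape for this helper is a thin wrapper around the session's `createDataFrame` method, which PySpark's `SparkSession` provides and which accepts a list of dictionaries. The function name `create_test_dataframe` is hypothetical, and the sketch assumes the Sparkless session exposes the same method.

```python
from typing import Any, Dict, List


def create_test_dataframe(spark: Any, data: List[Dict[str, Any]]) -> Any:
    """Build a DataFrame from a list of row dictionaries."""
    # Assumes the session mirrors PySpark's SparkSession.createDataFrame(data).
    return spark.createDataFrame(data)
```

In a test, `spark` would typically be the unified fixture from conftest.py, so the same helper serves both backends.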