Module functions

Structs

ChainedWhenBuilder - Builder for an additional when-then clause (returned by ThenBuilder::when).
SortOrder - Sort order specification for use in orderBy/sort. Holds expr + direction + null placement.
ThenBuilder - Builder for chaining when-then clauses before finalizing with otherwise.
WhenBuilder - Builder for when-then-otherwise expressions (usage sketch at the end of this page, after the when entry).

Functions

abs - Absolute value (PySpark abs).
acos - Arc cosine (PySpark acos).
acosh - Inverse hyperbolic cosine (PySpark acosh).
add_months - Add n months to date column (PySpark add_months).
aes_decrypt - AES decrypt (PySpark aes_decrypt). Input hex(nonce||ciphertext).
aes_encrypt - AES encrypt (PySpark aes_encrypt). Key as string; AES-128-GCM.
aggregate - Array fold/aggregate (PySpark aggregate). Simplified: zero + sum(list elements).
any_value - Any value from the group (PySpark any_value). Use in groupBy.agg(). ignorenulls reserved for API compatibility.
approx_count_distinct - Approximate count distinct (PySpark approx_count_distinct). Use in groupBy.agg(). rsd reserved for API compatibility; Polars uses exact n_unique.
approx_percentile - Approximate percentile (PySpark approx_percentile). Maps to quantile; percentage in 0.0..=1.0. accuracy reserved for API compatibility.
array - Create an array column from multiple columns (PySpark array). With no arguments, returns a column of empty arrays (one per row); PySpark parity.
array_agg - Collect to array (PySpark array_agg).
array_append - Append element to end of list (PySpark array_append).
array_compact - Remove null elements from list (PySpark array_compact).
array_contains - Check if list contains value (PySpark array_contains).
array_distinct - Distinct elements in list (PySpark array_distinct).
array_except - Elements in first array not in second (PySpark array_except).
array_exists - True if any list element satisfies the predicate (PySpark exists).
array_filter - Filter list elements by predicate (PySpark filter).
array_flatten - Flatten list of lists to one list (PySpark flatten). Not implemented.
array_forall - True if all list elements satisfy the predicate (PySpark forall).
array_insert - Insert element at 1-based position (PySpark array_insert).
array_intersect - Elements in both arrays (PySpark array_intersect).
array_join - Join list of strings with separator (PySpark array_join).
array_max - Maximum element in list (PySpark array_max).
array_mean - Mean of list elements (PySpark aggregate avg).
array_min - Minimum element in list (PySpark array_min).
array_position - 1-based index of first occurrence of value in list, or 0 if not found (PySpark array_position). Implemented via Polars list.eval with col("") as element.
array_prepend - Prepend element to start of list (PySpark array_prepend).
array_remove - New list with all elements equal to value removed (PySpark array_remove). Implemented via Polars list.eval + list.drop_nulls.
array_repeat - Repeat each element n times (PySpark array_repeat). Not implemented: would require list.eval with dynamic repeat.
array_size - Number of elements in list (PySpark size / array_size). Returns Int32.
array_slice - Slice list from 1-based start with optional length (PySpark slice).
array_sort - Sort list elements (PySpark array_sort).
array_sum - Sum of list elements (PySpark aggregate sum).
array_transform - Transform list elements by expression (PySpark transform).
array_union - Distinct elements from both arrays (PySpark array_union).
arrays_overlap - True if two arrays have any element in common (PySpark arrays_overlap).
arrays_zip - Zip arrays into array of structs (PySpark arrays_zip).
asc - Ascending sort, nulls first (Spark default for ASC).
asc_nulls_first - Ascending sort, nulls first.
asc_nulls_last - Ascending sort, nulls last.
ascii - ASCII value of first character (PySpark ascii). Returns Int32.
asin - Arc sine (PySpark asin).
asinh - Inverse hyperbolic sine (PySpark asinh).
assert_true - Assert that all boolean values are true; errors otherwise (PySpark assert_true). When err_msg is Some, it is used in the error message when the assertion fails.
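A minimal sketch of composing the array helpers above. The import path, the array(&[...]) signature, and Expr being Clone (as in Polars) are assumptions, not confirmed by this page:

```rust
// Illustrative only: path and signatures are assumed; Expr assumed Clone.
use robin_sparkless::functions::{array, array_contains, array_position, col, lit_i64};

fn array_exprs() {
    // Pack two columns into one list column per row: [a, b].
    let pair = array(&[col("a"), col("b")]);
    // True where the list contains 42.
    let has_42 = array_contains(pair.clone(), lit_i64(42));
    // 1-based index of the first 42, or 0 when absent (PySpark semantics).
    let pos_42 = array_position(pair, lit_i64(42));
    let _ = (has_42, pos_42);
}
```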
atan - Arc tangent (PySpark atan).
atan2 - Two-argument arc tangent atan2(y, x) in radians (PySpark atan2).
atanh - Inverse hyperbolic tangent (PySpark atanh).
avg - Average aggregation.
base64 - Base64 encode string bytes (PySpark base64).
bin - Convert integer to binary string (PySpark bin).
bit_and - Bitwise AND of two integer/boolean columns (PySpark bit_and).
bit_count - Count of set bits in the integer representation (PySpark bit_count).
bit_get - Alias for getbit (PySpark bit_get).
bit_length - Bit length of string (bytes * 8) (PySpark bit_length).
bit_or - Bitwise OR of two integer/boolean columns (PySpark bit_or).
bit_xor - Bitwise XOR of two integer/boolean columns (PySpark bit_xor).
bitmap_bit_position - Map integral value (0–32767) to bit position for bitmap aggregates (PySpark bitmap_bit_position).
bitmap_bucket_number - Bucket number for distributed bitmap (PySpark bitmap_bucket_number). value / 32768.
bitmap_construct_agg - Aggregate: bitwise OR of bit positions into one bitmap binary (PySpark bitmap_construct_agg). Use in group_by(...).agg([bitmap_construct_agg(col)]).
bitmap_count - Count set bits in a bitmap binary column (PySpark bitmap_count).
bitmap_or_agg - Aggregate: bitwise OR of bitmap binary column (PySpark bitmap_or_agg).
bitwise_not - Bitwise NOT of an integer/boolean column (PySpark bitwise_not / bitwiseNOT).
bool_and - Boolean AND across group (PySpark bool_and). Use in groupBy.agg(); column should be boolean.
bround - Banker's rounding: round half to even (PySpark bround).
btrim - Trim leading and trailing chars (PySpark btrim). trim_str defaults to whitespace.
call_udf - Call a registered UDF by name. PySpark: F.call_udf(udfName, *cols). Requires a session (set by get_or_create). Raises if UDF not found.
cardinality - Number of elements in array (PySpark cardinality). Alias for size/array_size.
cast - Cast column to the given type (PySpark cast). Fails on invalid conversion. String-to-boolean uses custom parsing ("true"/"false"/"1"/"0") since Polars does not support Utf8->Boolean. String-to-date accepts date and datetime strings (e.g. "2025-01-01 10:30:00" truncates to date) for Spark parity.
cbrt - Cube root (PySpark cbrt).
ceil - Ceiling (PySpark ceil).
ceiling - Alias for ceil (PySpark ceiling).
char - Int to single-character string (PySpark char). Valid codepoint only.
char_length - Length of string in characters (PySpark char_length). Alias of length().
character_length - Length of string in characters (PySpark character_length). Alias of length().
chr - Alias for char (PySpark chr).
coalesce - Returns the first non-null value from multiple columns.
col - Get a column by name.
collect_list - Collect column values into list per group (PySpark collect_list). Use in groupBy.agg().
collect_set - Collect distinct column values into list per group (PySpark collect_set). Use in groupBy.agg().
concat - Concatenate string columns without separator (PySpark concat).
concat_ws - Concatenate string columns with separator (PySpark concat_ws).
contains - True if string contains substring (literal) (PySpark contains).
conv - Base conversion (PySpark conv). num from from_base to to_base.
convert_timezone - Convert timestamp between timezones (PySpark convert_timezone).
corr - Pearson correlation aggregation (PySpark corr). Module-level; use in groupBy.agg() with two columns.
corr_expr - Pearson correlation aggregation (PySpark corr). Returns Expr for use in groupBy.agg().
cos - Cosine in radians (PySpark cos).
cosh - Hyperbolic cosine (PySpark cosh).
cot - Cotangent: 1/tan (PySpark cot).
count - Count aggregation.
count_distinct - Count distinct aggregation (PySpark countDistinct).
count_if - Count rows where condition is true (PySpark count_if). Use in groupBy.agg(); column should be boolean (true=1, false=0).
covar_pop - Population covariance aggregation (PySpark covar_pop). Module-level; use in groupBy.agg() with two columns.
covar_pop_expr - Population covariance aggregation (PySpark covar_pop). Returns Expr for use in groupBy.agg().
covar_samp_expr - Sample covariance aggregation (PySpark covar_samp). Returns Expr for use in groupBy.agg().
crc32 - CRC32 of string bytes (PySpark crc32). Not implemented: requires element-wise UDF.
create_map - Build a map column from alternating key/value expressions (PySpark create_map). Returns List(Struct{key, value}) using Polars as_struct and concat_list. With no args (or empty slice), returns a column of empty maps per row (PySpark parity #275).
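A hedged sketch of the aggregation helpers above (avg, collect_list, count_distinct) inside groupBy.agg(). The DataFrame export and the group_by(...).agg([...]) shape are inferred from the usage notes, not confirmed signatures:

```rust
// Assumed: DataFrame at the crate root; group_by(...).agg([...]) per the notes above.
use robin_sparkless::DataFrame;
use robin_sparkless::functions::{avg, col, collect_list, count_distinct};

fn per_department(df: DataFrame) -> DataFrame {
    df.group_by(["dept"]).agg([
        avg(col("salary")),          // mean salary per group
        collect_list(col("name")),   // every name in the group, as a list column
        count_distinct(col("name")), // distinct-name count per group
    ])
}
```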
csc - Cosecant: 1/sin (PySpark csc).
cume_dist - Cumulative distribution in partition: row_number / count. Window is applied.
curdate - Alias for current_date (PySpark curdate).
current_catalog - Current catalog name stub (PySpark current_catalog).
current_database - Current database/schema name stub (PySpark current_database).
current_date - Current date (evaluation time) (PySpark current_date).
current_schema - Current schema name stub (PySpark current_schema).
current_timestamp - Current timestamp (evaluation time) (PySpark current_timestamp).
current_timezone - Current session timezone (PySpark current_timezone). Default "UTC". Returns literal column.
current_user - Current user stub (PySpark current_user).
date_add - Add n days to date column (PySpark date_add).
date_diff - Alias for datediff (PySpark date_diff). date_diff(end, start).
date_format - Format date/datetime as string (PySpark date_format). Accepts PySpark/Java SimpleDateFormat style (e.g. "yyyy-MM") and converts to chrono strftime internally.
date_from_unix_date - Days since epoch to date (PySpark date_from_unix_date).
date_part - Alias for extract (PySpark date_part).
date_sub - Subtract n days from date column (PySpark date_sub).
date_trunc - Alias for trunc (PySpark date_trunc).
dateadd - Alias for date_add (PySpark dateadd).
datediff - Number of days between two date columns (PySpark datediff).
datepart - Alias for extract (PySpark datepart).
day - Extract day of month from datetime column (PySpark day).
dayname - Weekday name "Mon", "Tue", ... (PySpark dayname).
dayofmonth - Alias for day (PySpark dayofmonth).
dayofweek - Extract day of week: 1=Sunday..7=Saturday (PySpark dayofweek).
dayofyear - Extract day of year (1-366) (PySpark dayofyear).
days - Interval of n days (PySpark days). For use in date_add, timestampadd, etc.
decode - Decode binary (hex string) to string (PySpark decode). Charset: UTF-8.
degrees - Convert radians to degrees (PySpark degrees).
dense_rank - Dense rank window function (no gaps). Use with .over(partition_by).
desc - Descending sort, nulls last (Spark default for DESC).
desc_nulls_first - Descending sort, nulls first.
desc_nulls_last - Descending sort, nulls last.
e - Constant e = 2.718... (PySpark e).
element_at - Get element at 1-based index (PySpark element_at).
elt - Return column at 1-based index (PySpark elt). elt(2, a, b, c) returns b.
encode - Encode string to binary (PySpark encode). Charset: UTF-8. Returns hex string.
endswith - True if string ends with suffix (PySpark endswith).
equal_null - Null-safe equality: true if both null or both equal (PySpark equal_null). Alias for eq_null_safe.
every - Alias for bool_and (PySpark every). Use in groupBy.agg().
exp - Exponential (PySpark exp).
explode - Explode list into one row per element (PySpark explode).
explode_outer - Explode; null/empty yields one row with null (PySpark explode_outer).
expm1 - exp(x) - 1 (PySpark expm1).
extract - Extract field from date/datetime (PySpark extract). field: year, month, day, hour, minute, second, quarter, week, dayofweek, dayofyear.
factorial - Factorial n! (PySpark factorial). n in 0..=20; null for negative or overflow.
find_in_set - 1-based index of str in comma-delimited set (PySpark find_in_set). 0 if not found or str contains comma.
first - First value in group (PySpark first). Use in groupBy.agg(). ignorenulls: when true, first non-null; Polars 0.45 uses .first() only (ignorenulls reserved for API compatibility).
first_value - First value in partition (PySpark first_value). Use with .over(partition_by).
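A sketch of the window helpers (dense_rank, first_value, lag above) with .over(partition_by). The zero-argument dense_rank(), the lag(expr, n) shape, and Expr::over taking column names are guesses from the usage notes:

```rust
// Signatures guessed from "Use with .over(partition_by)"; not confirmed.
use robin_sparkless::functions::{col, dense_rank, first_value, lag};

fn window_exprs() {
    // Dense rank within each department (ties share a rank, no gaps).
    let rank_in_dept = dense_rank().over(["dept"]);
    // First salary in the partition.
    let first_salary = first_value(col("salary")).over(["dept"]);
    // Salary from one row earlier in the partition.
    let prev_salary = lag(col("salary"), 1).over(["dept"]);
    let _ = (rank_in_dept, first_salary, prev_salary);
}
```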
floor - Floor (PySpark floor).
format_number - Format numeric as string with fixed decimal places (PySpark format_number).
format_string - Printf-style format (PySpark format_string). Supports %s, %d, %i, %f, %g, %%.
from_csv - Parse CSV string to struct (PySpark from_csv). Minimal implementation.
from_json - Parse string column as JSON into struct (PySpark from_json).
from_unixtime - Convert seconds since epoch to formatted string (PySpark from_unixtime).
from_utc_timestamp - Interpret timestamp as UTC, convert to tz (PySpark from_utc_timestamp).
get - Get value for key from map, or null (PySpark get).
get_json_object - Extract JSON path from string column (PySpark get_json_object).
getbit - Get bit at 0-based position (PySpark getbit).
greatest - Greatest of the given columns per row (PySpark greatest). Uses element-wise UDF.
grouping - Grouping set marker (PySpark grouping). Stub: returns 0 (no GROUPING SETS in robin-sparkless).
grouping_id - Grouping set id (PySpark grouping_id). Stub: returns 0.
hash - Hash of column values (PySpark hash). Uses Murmur3 32-bit for parity with PySpark.
hex - Convert to hex string (PySpark hex).
hour - Extract hour from datetime column (PySpark hour).
hours - Interval of n hours (PySpark hours).
hypot - sqrt(x*x + y*y) (PySpark hypot).
ifnull - Alias for nvl (PySpark ifnull).
ilike - Case-insensitive LIKE (PySpark ilike). When escape_char is Some(esc), esc + char treats that char as literal.
initcap - Title case (PySpark initcap).
inline - Explode list of structs into rows; struct fields become columns after unnest (PySpark inline). Returns the exploded struct column; use unnest to expand struct fields to columns.
inline_outer - Like inline but null/empty yields one row of nulls (PySpark inline_outer).
input_file_name - Stub input file name: empty string (PySpark input_file_name).
instr - Find substring position 1-based; 0 if not found (PySpark instr).
isin - Check if column values are in the given list (PySpark isin). Uses Polars is_in.
isin_i64 - Check if column values are in the given i64 slice (PySpark isin with literal list).
isin_str - Check if column values are in the given string slice (PySpark isin with literal list).
isnan - True where the float value is NaN (PySpark isnan).
isnotnull - True if column is not null (PySpark isnotnull).
isnull - True if column is null (PySpark isnull).
json_array_length - Length of JSON array at path (PySpark json_array_length).
json_object_keys - Keys of JSON object (PySpark json_object_keys). Returns list of strings.
json_tuple - Extract keys from JSON as struct (PySpark json_tuple). keys: e.g. ["a", "b"].
kurtosis - Kurtosis aggregation (PySpark kurtosis). Fisher definition, bias=true. Use in groupBy.agg().
lag - Value from n rows before in partition (PySpark lag). Use with .over(partition_by).
last_day - Last day of month for date column (PySpark last_day).
last_value - Last value in partition (PySpark last_value). Use with .over(partition_by).
lcase - Alias for lower (PySpark lcase).
lead - Value from n rows after in partition (PySpark lead). Use with .over(partition_by).
least - Least of the given columns per row (PySpark least). Uses element-wise UDF.
left - Leftmost n characters (PySpark left).
length - String length in characters (PySpark length).
levenshtein - Levenshtein distance (PySpark levenshtein). Not implemented: requires element-wise UDF.
like - SQL LIKE pattern (% any, _ one char) (PySpark like). When escape_char is Some(esc), esc + char treats that char as literal.
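A sketch of like/ilike with the optional escape character described above; the (expr, pattern, Option<char>) shape is an assumption:

```rust
// Assumed shape: like(expr, pattern, escape_char: Option<char>).
use robin_sparkless::functions::{col, ilike, like};

fn like_exprs() {
    // Match "100% ..." lines: `\%` is a literal percent, the trailing % a wildcard.
    let pct = like(col("note"), r"100\% %", Some('\\'));
    // Case-insensitive prefix match, no escape character.
    let err = ilike(col("note"), "error:%", None);
    let _ = (pct, err);
}
```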
lit_bool - Create a literal column from a value (bool).
lit_f64 - Create a literal column from a value (f64).
lit_i32 - Create a literal column from a value (i32).
lit_i64 - Create a literal column from a value (i64).
lit_null - Typed null literal column. Returns Err on unknown type name. See parse_type_name for supported type strings (e.g. "boolean", "string", "bigint").
lit_str - Create a literal column from a value (string).
ln - Alias for log (natural log) (PySpark ln).
localtimestamp - Alias for current_timestamp (PySpark localtimestamp).
locate - Find substring position 1-based, starting at pos (PySpark locate). 0 if not found.
log - Natural logarithm (PySpark log with one arg).
log10 - Base-10 log (PySpark log10).
log1p - log(1 + x) (PySpark log1p).
log2 - Base-2 log (PySpark log2).
log_with_base - Logarithm with given base (PySpark log(col, base)). base must be positive and not 1.
lower - Convert string column to lowercase (PySpark lower).
lpad - Left-pad string to length with pad char (PySpark lpad).
ltrim - Trim leading whitespace (PySpark ltrim).
make_date - Build date from year, month, day columns (PySpark make_date).
make_dt_interval - Day-time interval: days, hours, minutes, seconds (PySpark make_dt_interval). All optional; 0 for omitted.
make_interval - Create interval duration (PySpark make_interval). Optional args; 0 for omitted.
make_timestamp - make_timestamp(year, month, day, hour, min, sec, timezone?): six columns to timestamp (PySpark make_timestamp). When timezone is Some(tz), components are interpreted as local time in that zone, then converted to UTC.
make_timestamp_ntz - Alias for make_timestamp (PySpark make_timestamp_ntz: no timezone).
make_ym_interval - Year-month interval (PySpark make_ym_interval). Polars has no native YM type; returns months as Int32 (years*12 + months).
map_concat - Merge two map columns (PySpark map_concat). Last value wins for duplicate keys.
map_contains_key - True if map contains key (PySpark map_contains_key).
map_entries - Return map as list of structs {key, value} (PySpark map_entries).
map_filter - Filter map entries by predicate (PySpark map_filter).
map_filter_value_gt - Convenience: map_filter with value > threshold predicate.
map_from_arrays - Build map from two array columns keys and values (PySpark map_from_arrays). Implemented via UDF.
map_from_entries - Array of structs {key, value} to map (PySpark map_from_entries).
map_keys - Extract keys from a map column (PySpark map_keys). Map is List(Struct{key, value}).
map_values - Extract values from a map column (PySpark map_values).
map_zip_with - Merge two maps by key with merge function (PySpark map_zip_with).
map_zip_with_coalesce - Convenience: map_zip_with with coalesce(value1, value2) merge.
mask - Mask string: replace upper/lower/digit/other with given chars (PySpark mask).
max - Maximum aggregation.
max_by - Value of value_col in the row where ord_col is maximum (PySpark max_by). Use in groupBy.agg().
md5 - MD5 hash of string bytes, returns hex string (PySpark md5).
mean - Alias for avg (PySpark mean).
median - Median aggregation (PySpark median).
min - Minimum aggregation.
min_by - Value of value_col in the row where ord_col is minimum (PySpark min_by). Use in groupBy.agg().
minute - Extract minute from datetime column (PySpark minute).
minutes - Interval of n minutes (PySpark minutes).
mode - Mode aggregation: most frequent value (PySpark mode).
monotonically_increasing_id - Stub: constant 0 (PySpark monotonically_increasing_id). Note: differs from PySpark, which is unique per row; see PYSPARK_DIFFERENCES.md.
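A sketch of building and probing a map (stored as List(Struct{key, value}), per create_map above). The alternating key/value slice and Expr: Clone are assumptions:

```rust
// Assumed shapes: create_map over an alternating key/value slice; Expr assumed Clone.
use robin_sparkless::functions::{col, create_map, lit_str, map_contains_key, map_keys};

fn map_exprs() {
    // {"a": x, "b": y} per row, stored as List(Struct{key, value}).
    let m = create_map(&[lit_str("a"), col("x"), lit_str("b"), col("y")]);
    // ["a", "b"]
    let keys = map_keys(m.clone());
    // True: the map has key "a".
    let has_a = map_contains_key(m, lit_str("a"));
    let _ = (keys, has_a);
}
```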
month - Extract month from datetime column (PySpark month).
months - Interval of n months (PySpark months). Approximated as 30*n days.
months_between - Months between end and start dates as fractional (PySpark months_between). When round_off is true, rounds to 8 decimal places (PySpark default).
named_struct - Create struct with explicit field names (PySpark named_struct). Pairs of (name, column).
nanvl - Replace NaN with value (PySpark nanvl).
negate - Unary minus / negate (PySpark negate, negative).
negative - Alias for negate (PySpark negative).
next_day - Next date that is the given weekday (e.g. "Mon") (PySpark next_day).
now - Alias for current_timestamp (PySpark now).
nth_value - Nth value in partition by order (1-based n). Window is applied; do not call .over() again.
ntile - Bucket 1..n by rank within partition (PySpark ntile). Window is applied.
nullif - Return null if column equals value, else column (PySpark nullif).
nvl - Alias for coalesce(col, value) (PySpark nvl / ifnull).
nvl2 - Three-arg null replacement: if col1 is not null then col2 else col3 (PySpark nvl2).
octet_length - Length of string in bytes (PySpark octet_length).
overlay - Replace substring at 1-based position (PySpark overlay). replace is literal.
parse_type_name - Parse PySpark-like type name to Polars DataType. Decimal(precision, scale) is mapped to Float64 for schema parity (Polars dtype-decimal not enabled).
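A sketch of the SQL-style null handling above (nvl, nvl2, nullif); signatures assumed:

```rust
// Assumed shapes: nvl(expr, fallback), nvl2(test, if_set, if_null), nullif(expr, sentinel).
use robin_sparkless::functions::{col, lit_f64, lit_i64, lit_str, nullif, nvl, nvl2};

fn null_exprs() {
    // bonus, or 0.0 where bonus is null.
    let bonus = nvl(col("bonus"), lit_f64(0.0));
    // "has email" where email is non-null, else "no email".
    let flag = nvl2(col("email"), lit_str("has email"), lit_str("no email"));
    // Turn the sentinel -1 into a proper null.
    let code = nullif(col("code"), lit_i64(-1));
    let _ = (bonus, flag, code);
}
```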
parse_url - Parse URL and extract part: PROTOCOL, HOST, PATH, etc. (PySpark parse_url). When key is Some(k) and part is QUERY/QUERYSTRING, returns the value for that query parameter only.
percent_rank - Percent rank in partition: (rank - 1) / (count - 1). Window is applied.
percentile_approx - Approximate percentile (PySpark percentile_approx). Alias for approx_percentile.
pi - Constant pi = 3.14159... (PySpark pi).
pmod - Positive modulus (PySpark pmod).
posexplode - Explode list with position (PySpark posexplode). Returns (pos_column, value_column). pos is 1-based; implemented via list.eval(cum_count()).explode() and explode().
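A sketch of parse_url with and without the query key described above; the (expr, part, Option<&str>) shape is an assumption:

```rust
// Assumed shape: parse_url(expr, part, key: Option<&str>).
use robin_sparkless::functions::{col, parse_url};

fn url_exprs() {
    // "https://example.com/a?b=1" -> "example.com"
    let host = parse_url(col("url"), "HOST", None);
    // "https://example.com/a?b=1" with key "b" -> "1"
    let b = parse_url(col("url"), "QUERY", Some("b"));
    let _ = (host, b);
}
```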
posexplode_outer - Posexplode with null preservation (PySpark posexplode_outer).
position - Position of substring in column (PySpark position). Same as instr; (substr, col) argument order.
positive - Unary plus: no-op, returns column as-is (PySpark positive).
pow - Power (PySpark pow).
power - Alias for pow (PySpark power).
printf - Alias for format_string (PySpark printf).
quarter - Extract quarter (1-4) from date/datetime (PySpark quarter).
radians - Convert degrees to radians (PySpark radians).
raise_error - Raise an error when evaluated (PySpark raise_error). Always fails with the given message.
rand - Random uniform [0, 1) per row, with optional seed (PySpark rand). When added via with_column, generates one distinct value per row (PySpark-like).
randn - Random standard normal per row, with optional seed (PySpark randn). When added via with_column, generates one distinct value per row (PySpark-like).
rank - Rank window function (ties same rank, gaps). Use with .over(partition_by).
regexp - Alias for rlike (PySpark regexp).
regexp_count - Count of non-overlapping regex matches (PySpark regexp_count).
regexp_extract - Extract first match of regex (PySpark regexp_extract). group_index 0 = full match.
regexp_extract_all - Extract all matches of regex (PySpark regexp_extract_all).
regexp_instr - 1-based position of first regex match (PySpark regexp_instr).
regexp_like - Check if string matches regex (PySpark regexp_like / rlike).
regexp_replace - Replace first match of regex (PySpark regexp_replace).
regexp_substr - First substring matching regex (PySpark regexp_substr). Null if no match.
regr_avgx_expr - Regression: average of x (PySpark regr_avgx).
regr_avgy_expr - Regression: average of y (PySpark regr_avgy).
regr_count_expr - Regression: count of (y, x) pairs where both non-null (PySpark regr_count).
regr_intercept_expr - Regression intercept: avg_y - slope*avg_x (PySpark regr_intercept).
regr_r2_expr - Regression R-squared (PySpark regr_r2).
regr_slope_expr - Regression slope: covar_samp(y,x)/var_samp(x) (PySpark regr_slope).
regr_sxx_expr - Regression: sum((x - avg_x)^2) (PySpark regr_sxx).
regr_sxy_expr - Regression: sum((x - avg_x)(y - avg_y)) (PySpark regr_sxy).
regr_syy_expr - Regression: sum((y - avg_y)^2) (PySpark regr_syy).
repeat - Repeat string n times (PySpark repeat).
replace - Replace all occurrences of search with replacement (literal) (PySpark replace).
reverse - Reverse string (PySpark reverse).
right - Rightmost n characters (PySpark right).
rint - Round to nearest integer (PySpark rint).
rlike - Alias for regexp_like (PySpark rlike / regexp).
round - Round (PySpark round).
row_number - Row number window function (1, 2, 3 by order within partition). Use with .over(partition_by) after ranking by an order column.
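A sketch of the regexp_* family above; signatures assumed, and note that regexp_replace here replaces the first match per its entry:

```rust
// Assumed shapes: regexp_like(expr, pattern), regexp_extract(expr, pattern, group),
// regexp_replace(expr, pattern, replacement).
use robin_sparkless::functions::{col, regexp_extract, regexp_like, regexp_replace};

fn regex_exprs() {
    // True where the line looks like "ERROR <digits>".
    let is_error = regexp_like(col("line"), r"^ERROR \d+");
    // Capture group 1: the numeric code.
    let code = regexp_extract(col("line"), r"^ERROR (\d+)", 1);
    // Per the entry above, replaces the first match only.
    let masked = regexp_replace(col("line"), r"\d+", "#");
    let _ = (is_error, code, masked);
}
```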
rpad - Right-pad string to length with pad char (PySpark rpad).
rtrim - Trim trailing whitespace (PySpark rtrim).
schema_of_csv - Schema of CSV string (PySpark schema_of_csv). Returns literal schema string; minimal stub.
schema_of_json - Schema of JSON string (PySpark schema_of_json). Returns literal schema string; minimal stub.
sec - Secant: 1/cos (PySpark sec).
second - Extract second from datetime column (PySpark second).
sequence - Generate array of numbers from start to stop (inclusive) with optional step (PySpark sequence). step defaults to 1.
sha1 - SHA1 hash of string bytes, returns hex string (PySpark sha1).
sha2 - SHA2 hash; bit_length 256, 384, or 512 (PySpark sha2).
shift_left - Bitwise left shift (PySpark shiftLeft). col << n.
shift_right - Bitwise signed right shift (PySpark shiftRight). col >> n.
shift_right_unsigned - Bitwise unsigned right shift (PySpark shiftRightUnsigned). Logical shift for Long.
shuffle - Random permutation of list elements (PySpark shuffle).
sign - Alias for signum (PySpark sign).
signum - Sign of the number: -1, 0, or 1 (PySpark signum).
sin - Sine in radians (PySpark sin).
sinh - Hyperbolic sine (PySpark sinh).
size - Alias for array_size (PySpark size).
skewness - Skewness aggregation (PySpark skewness). bias=true. Use in groupBy.agg().
soundex - Soundex code (PySpark soundex). Not implemented: requires element-wise UDF.
spark_partition_id - Stub partition id: always 0 (PySpark spark_partition_id).
split - Split string by delimiter (PySpark split). Optional limit: at most that many parts (remainder in last).
split_part - Split by delimiter and return 1-based part (PySpark split_part).
sqrt - Square root (PySpark sqrt).
stack - Stack columns into struct (PySpark stack). Alias for struct_.
startswith - True if string starts with prefix (PySpark startswith).
std - Alias for stddev (PySpark std).
stddev - Standard deviation (sample) aggregation (PySpark stddev / stddev_samp).
stddev_pop - Population standard deviation (ddof=0) (PySpark stddev_pop).
stddev_samp - Sample standard deviation (ddof=1). Alias for stddev (PySpark stddev_samp).
str_to_map - Parse string to map (PySpark str_to_map). Default delims: "," and ":".
struct_ - Create struct from columns using column names as field names (PySpark struct).
substr - Alias for substring (PySpark substr).
substring - Substring with 1-based start (PySpark substring semantics).
substring_index - Substring before/after nth delimiter (PySpark substring_index).
sum - Sum aggregation.
tan - Tangent in radians (PySpark tan).
tanh - Hyperbolic tangent (PySpark tanh).
timestamp_micros - Convert microseconds since epoch to timestamp (PySpark timestamp_micros).
timestamp_millis - Convert milliseconds since epoch to timestamp (PySpark timestamp_millis).
timestamp_seconds - Convert seconds since epoch to timestamp (PySpark timestamp_seconds).
timestampadd - Add amount of unit to timestamp (PySpark timestampadd).
timestampdiff - Difference between timestamps in unit (PySpark timestampdiff).
to_binary - Convert to binary (PySpark to_binary). fmt: 'utf-8', 'hex'.
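A sketch of the 1-based, delimiter-oriented string splitters above (split, split_part, substring_index); shapes assumed:

```rust
// Assumed shapes: split(expr, delim, limit: Option<i64>), split_part(expr, delim, n),
// substring_index(expr, delim, count).
use robin_sparkless::functions::{col, split, split_part, substring_index};

fn split_exprs() {
    // "a,b,c" -> ["a", "b", "c"]; no limit.
    let parts = split(col("csv"), ",", None);
    // "a,b,c" -> "b" (1-based part index).
    let second = split_part(col("csv"), ",", 2);
    // "a,b,c" with count 2 -> "a,b" (everything before the 2nd delimiter).
    let prefix = substring_index(col("csv"), ",", 2);
    let _ = (parts, second, prefix);
}
```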
to_char - Cast to string, optionally with format for datetime (PySpark to_char, to_varchar). When format is Some, uses date_format for datetime columns (PySpark format → chrono strftime); otherwise casts to string. Returns Err if the cast to string fails (invalid type name or unsupported column type).
to_csv - Format struct as CSV string (PySpark to_csv). Minimal implementation.
to_date - Cast or parse to date (PySpark to_date). When format is None: cast date/datetime to date, parse string with default formats. When format is Some: parse string with given format.
to_degrees - Alias for degrees (PySpark toDegrees).
to_json - Serialize struct column to JSON string (PySpark to_json).
to_number - Cast to numeric (PySpark to_number). Uses Double. Format parameter reserved for future use. Returns Err if the cast to double fails (invalid type name or unsupported column type).
to_radians - Alias for radians (PySpark toRadians).
to_timestamp - Cast to timestamp, or parse with format when provided (PySpark to_timestamp). When format is None, parses string columns with default format "%Y-%m-%d %H:%M:%S" (PySpark parity #273).
to_timestamp_ltz - Parse as timestamp in local timezone, return UTC (PySpark to_timestamp_ltz).
to_timestamp_ntz - Parse as timestamp without timezone (PySpark to_timestamp_ntz). Returns Datetime(_, None).
to_unix_timestamp - Alias for unix_timestamp.
to_utc_timestamp - Interpret timestamp as in tz, convert to UTC (PySpark to_utc_timestamp).
to_varchar - Alias for to_char (PySpark to_varchar).
transform_keys - Transform map keys by expr (PySpark transform_keys).
transform_values - Transform map values by expr (PySpark transform_values).
translate - Character-by-character translation (PySpark translate).
trim - Trim leading and trailing whitespace (PySpark trim).
trunc - Truncate date/datetime to unit (PySpark trunc).
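A sketch of the default-versus-explicit format behavior of to_date and to_timestamp described above; the Option<&str> format parameter is an assumption:

```rust
// Assumed shapes: to_date(expr, format: Option<&str>), to_timestamp(expr, Option<&str>).
use robin_sparkless::functions::{col, to_date, to_timestamp};

fn parse_exprs() {
    // Default formats: "2025-01-01" or "2025-01-01 10:30:00" both become a date.
    let d = to_date(col("day"), None);
    // Explicit PySpark-style format for non-default layouts.
    let d_us = to_date(col("day_us"), Some("MM/dd/yyyy"));
    // Strings parse with the default "%Y-%m-%d %H:%M:%S" (see entry above).
    let ts = to_timestamp(col("stamp"), None);
    let _ = (d, d_us, ts);
}
```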
try_add - Add that returns null on overflow (PySpark try_add). Uses checked arithmetic.
try_aes_decrypt - Try AES decrypt (PySpark try_aes_decrypt). Returns null on failure.
try_avg - Average aggregation; null on invalid (PySpark try_avg). Use in groupBy.agg(). Maps to mean; reserved for API.
try_cast - Cast column to the given type, returning null on invalid conversion (PySpark try_cast). String-to-boolean uses custom parsing ("true"/"false"/"1"/"0") since Polars does not support Utf8->Boolean. String-to-date accepts date and datetime strings; invalid strings become null.
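A sketch contrasting the try_* variants above with their failing counterparts. That try_cast returns a Result is inferred from the "Returns Err" notes on nearby entries, so this is an assumption:

```rust
// Assumed: try_cast(expr, type_name) -> Result<Expr, _> with an Error-typed failure.
use robin_sparkless::functions::{col, try_cast, try_divide};

fn try_exprs() -> Result<(), Box<dyn std::error::Error>> {
    // "123" -> 123, "abc" -> null (plain cast() would fail instead).
    let n = try_cast(col("raw"), "bigint")?;
    // Null (not an error) where the denominator is zero.
    let ratio = try_divide(col("num"), col("den"));
    let _ = (n, ratio);
    Ok(())
}
```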
try_divide - Division that returns null on divide-by-zero (PySpark try_divide).
try_element_at - Element at index, null if out of bounds (PySpark try_element_at). Same as element_at for lists.
try_multiply - Multiply that returns null on overflow (PySpark try_multiply).
try_subtract - Subtract that returns null on overflow (PySpark try_subtract).
try_sum - Sum aggregation; null on overflow (PySpark try_sum). Use in groupBy.agg(). Polars sum does not overflow; reserved for API.
try_to_binary - Try convert to binary; null on failure (PySpark try_to_binary).
try_to_number - Cast to numeric, null on invalid (PySpark try_to_number). Format parameter reserved for future use. Returns Err if the try_cast setup fails (invalid type name); column values that cannot be parsed become null.
try_to_timestamp - Cast to timestamp, null on invalid, or parse with format when provided (PySpark try_to_timestamp). When format is None, parses string columns with default format (null on invalid) (#273).
typeof_ - Data type of column as string (PySpark typeof). Constant per column from schema.
ucase - Alias for upper (PySpark ucase).
unbase64 - Base64 decode to string (PySpark unbase64). Invalid decode yields null.
unhex - Convert hex string to binary/string (PySpark unhex).
unix_date - Date to days since 1970-01-01 (PySpark unix_date).
unix_micros - Timestamp to microseconds since epoch (PySpark unix_micros).
unix_millis - Timestamp to milliseconds since epoch (PySpark unix_millis).
unix_seconds - Timestamp to seconds since epoch (PySpark unix_seconds).
unix_timestamp - Parse string timestamp to seconds since epoch (PySpark unix_timestamp). format defaults to yyyy-MM-dd HH:mm:ss.
unix_timestamp_now - Current Unix timestamp in seconds (PySpark unix_timestamp with no args).
upper - Convert string column to uppercase (PySpark upper).
url_decode - Percent-decode URL-encoded string (PySpark url_decode).
url_encode - Percent-encode string for URL (PySpark url_encode).
user - User stub (PySpark user).
var_pop - Population variance (ddof=0) (PySpark var_pop).
var_samp - Sample variance (ddof=1). Alias for variance (PySpark var_samp).
variance - Variance (sample) aggregation (PySpark variance / var_samp).
version - Session/library version string (PySpark version).
weekday - Weekday 0=Mon, 6=Sun (PySpark weekday).
weekofyear - Extract ISO week of year (1-53) (PySpark weekofyear).
when - PySpark-style conditional expression builder (see the sketch below).
when_then_otherwise_null - Two-arg when(condition, value): returns value where condition is true, null otherwise (PySpark when(cond, val)).
width_bucket - Assign value to histogram bucket (PySpark width_bucket). Returns 0 if v < min_val, num_bucket+1 if v >= max_val.
xxhash64 - XXH64 hash (PySpark xxhash64). Not implemented: requires element-wise UDF.
year - Extract year from datetime column (PySpark year).
years - Interval of n years (PySpark years). Approximated as 365*n days.
zip_with - Zip two arrays element-wise with merge function (PySpark zip_with).
zip_with_coalesce - Convenience: zip_with with coalesce(left, right) merge.
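A sketch of the when/then/otherwise builder chain documented by the structs at the top of this page (WhenBuilder, ThenBuilder, ChainedWhenBuilder). The comparison method gt on Expr is assumed from the underlying Polars API:

```rust
// when(cond) -> WhenBuilder; .then(expr) -> ThenBuilder; a further .when(cond)
// -> ChainedWhenBuilder; .otherwise(expr) finalizes the expression.
// Expr::gt is an assumption borrowed from Polars.
use robin_sparkless::functions::{col, lit_i64, lit_str, when};

fn bucket_expr() {
    let bucket = when(col("score").gt(lit_i64(90)))
        .then(lit_str("high"))
        .when(col("score").gt(lit_i64(50)))
        .then(lit_str("mid"))
        .otherwise(lit_str("low"));
    let _ = bucket;
}
```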